KVM Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v2 00/17] follow_pfn and other iomap races
@ 2020-10-09  7:59 Daniel Vetter
  2020-10-09  7:59 ` [PATCH v2 01/17] drm/exynos: Stop using frame_vector helpers Daniel Vetter
                   ` (16 more replies)
  0 siblings, 17 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter

Hi all,

Round two of my patch series to clamp down a bunch of races and gaps
around follow_pfn and other access to iomem mmaps. Previous version:

v1: https://lore.kernel.org/dri-devel/20201007164426.1812530-1-daniel.vetter@ffwll.ch/

And the discussion that sparked this journey:

https://lore.kernel.org/dri-devel/20201007164426.1812530-1-daniel.vetter@ffwll.ch/

Changes in v2:
- tons of small polish&fixes all over, thanks to all the reviewers who
  spotted issues
- I managed to test at least the generic_access_phys and pci mmap revoke
  stuff with a few gdb sessions using our i915 debug tools (hence now also
  the drm/i915 patch to properly request all the pci bar regions)
- reworked approach for the pci mmap revoke: Infrastructure moved into
  kernel/resource.c, address_space mapping is now set up at open time for
  everyone (which required some sysfs changes). Does indeed look a lot
  cleaner and a lot less invasive than I feared at first.

The big thing I can't test are all the frame_vector changes in habanalbas,
exynos and media. Gerald has given the s390 patch a spin already.

Review, testing, feedback all very much welcome.

Cheers, Daniel

Daniel Vetter (17):
  drm/exynos: Stop using frame_vector helpers
  drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
  misc/habana: Stop using frame_vector helpers
  misc/habana: Use FOLL_LONGTERM for userptr
  mm/frame-vector: Use FOLL_LONGTERM
  media: videobuf2: Move frame_vector into media subsystem
  mm: Close race in generic_access_phys
  s390/pci: Remove races against pte updates
  mm: Add unsafe_follow_pfn
  media/videbuf1|2: Mark follow_pfn usage as unsafe
  vfio/type1: Mark follow_pfn as unsafe
  PCI: Obey iomem restrictions for procfs mmap
  /dev/mem: Only set filp->f_mapping
  resource: Move devmem revoke code to resource framework
  sysfs: Support zapping of binary attr mmaps
  PCI: Revoke mappings like devmem
  drm/i915: Properly request PCI BARs

 arch/s390/pci/pci_mmio.c                      | 98 +++++++++++--------
 drivers/char/mem.c                            | 86 +---------------
 drivers/gpu/drm/exynos/Kconfig                |  1 -
 drivers/gpu/drm/exynos/exynos_drm_g2d.c       | 48 ++++-----
 drivers/gpu/drm/i915/intel_uncore.c           | 25 ++++-
 drivers/media/common/videobuf2/Kconfig        |  1 -
 drivers/media/common/videobuf2/Makefile       |  1 +
 .../media/common/videobuf2}/frame_vector.c    | 54 ++++------
 drivers/media/platform/omap/Kconfig           |  1 -
 drivers/media/v4l2-core/videobuf-dma-contig.c |  2 +-
 drivers/misc/habanalabs/Kconfig               |  1 -
 drivers/misc/habanalabs/common/habanalabs.h   |  3 +-
 drivers/misc/habanalabs/common/memory.c       | 50 ++++------
 drivers/pci/pci-sysfs.c                       |  4 +
 drivers/pci/proc.c                            |  6 ++
 drivers/vfio/vfio_iommu_type1.c               |  4 +-
 fs/sysfs/file.c                               | 11 +++
 include/linux/ioport.h                        |  6 +-
 include/linux/mm.h                            | 47 +--------
 include/linux/sysfs.h                         |  2 +
 include/media/videobuf2-core.h                | 42 ++++++++
 kernel/resource.c                             | 95 +++++++++++++++++-
 mm/Kconfig                                    |  3 -
 mm/Makefile                                   |  1 -
 mm/memory.c                                   | 76 +++++++++++++-
 mm/nommu.c                                    | 17 ++++
 security/Kconfig                              | 13 +++
 27 files changed, 412 insertions(+), 286 deletions(-)
 rename {mm => drivers/media/common/videobuf2}/frame_vector.c (85%)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 01/17] drm/exynos: Stop using frame_vector helpers
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-16  7:42   ` John Hubbard
  2020-10-09  7:59 ` [PATCH v2 02/17] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists Daniel Vetter
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Inki Dae, Joonyoung Shim, Seung-Woo Kim, Kyungmin Park,
	Kukjin Kim, Krzysztof Kozlowski, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams

All we need are a pages array, pin_user_pages_fast can give us that
directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
--
v2: Use unpin_user_pages_dirty_lock (John)
---
 drivers/gpu/drm/exynos/Kconfig          |  1 -
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 47 +++++++++++--------------
 2 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
index 6417f374b923..43257ef3c09d 100644
--- a/drivers/gpu/drm/exynos/Kconfig
+++ b/drivers/gpu/drm/exynos/Kconfig
@@ -88,7 +88,6 @@ comment "Sub-drivers"
 config DRM_EXYNOS_G2D
 	bool "G2D"
 	depends on VIDEO_SAMSUNG_S5P_G2D=n || COMPILE_TEST
-	select FRAME_VECTOR
 	help
 	  Choose this option if you want to use Exynos G2D for DRM.
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index 967a5cdc120e..ecede41af9b9 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -205,7 +205,8 @@ struct g2d_cmdlist_userptr {
 	dma_addr_t		dma_addr;
 	unsigned long		userptr;
 	unsigned long		size;
-	struct frame_vector	*vec;
+	struct page		**pages;
+	unsigned int		npages;
 	struct sg_table		*sgt;
 	atomic_t		refcount;
 	bool			in_pool;
@@ -378,7 +379,6 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
 					bool force)
 {
 	struct g2d_cmdlist_userptr *g2d_userptr = obj;
-	struct page **pages;
 
 	if (!obj)
 		return;
@@ -398,15 +398,9 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
 	dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
 			  DMA_BIDIRECTIONAL, 0);
 
-	pages = frame_vector_pages(g2d_userptr->vec);
-	if (!IS_ERR(pages)) {
-		int i;
-
-		for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
-			set_page_dirty_lock(pages[i]);
-	}
-	put_vaddr_frames(g2d_userptr->vec);
-	frame_vector_destroy(g2d_userptr->vec);
+	unpin_user_pages_dirty_lock(g2d_userptr->pages, g2d_userptr->npages,
+				    true);
+	kvfree(g2d_userptr->pages);
 
 	if (!g2d_userptr->out_of_list)
 		list_del_init(&g2d_userptr->list);
@@ -474,35 +468,34 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 	offset = userptr & ~PAGE_MASK;
 	end = PAGE_ALIGN(userptr + size);
 	npages = (end - start) >> PAGE_SHIFT;
-	g2d_userptr->vec = frame_vector_create(npages);
-	if (!g2d_userptr->vec) {
+	g2d_userptr->pages = kvmalloc_array(npages, sizeof(*g2d_userptr->pages),
+					    GFP_KERNEL);
+	if (!g2d_userptr->pages) {
 		ret = -ENOMEM;
 		goto err_free;
 	}
 
-	ret = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
-		g2d_userptr->vec);
+	ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+				  g2d_userptr->pages);
 	if (ret != npages) {
 		DRM_DEV_ERROR(g2d->dev,
 			      "failed to get user pages from userptr.\n");
 		if (ret < 0)
-			goto err_destroy_framevec;
-		ret = -EFAULT;
-		goto err_put_framevec;
-	}
-	if (frame_vector_to_pages(g2d_userptr->vec) < 0) {
+			goto err_destroy_pages;
+		npages = ret;
 		ret = -EFAULT;
-		goto err_put_framevec;
+		goto err_unpin_pages;
 	}
+	g2d_userptr->npages = npages;
 
 	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
 	if (!sgt) {
 		ret = -ENOMEM;
-		goto err_put_framevec;
+		goto err_unpin_pages;
 	}
 
 	ret = sg_alloc_table_from_pages(sgt,
-					frame_vector_pages(g2d_userptr->vec),
+					g2d_userptr->pages,
 					npages, offset, size, GFP_KERNEL);
 	if (ret < 0) {
 		DRM_DEV_ERROR(g2d->dev, "failed to get sgt from pages.\n");
@@ -538,11 +531,11 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 err_free_sgt:
 	kfree(sgt);
 
-err_put_framevec:
-	put_vaddr_frames(g2d_userptr->vec);
+err_unpin_pages:
+	unpin_user_pages(g2d_userptr->pages, npages);
 
-err_destroy_framevec:
-	frame_vector_destroy(g2d_userptr->vec);
+err_destroy_pages:
+	kvfree(g2d_userptr->pages);
 
 err_free:
 	kfree(g2d_userptr);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 02/17] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
  2020-10-09  7:59 ` [PATCH v2 01/17] drm/exynos: Stop using frame_vector helpers Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-09  7:59 ` [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers Daniel Vetter
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Inki Dae, Joonyoung Shim, Seung-Woo Kim, Kyungmin Park,
	Kukjin Kim, Krzysztof Kozlowski, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams

The exynos g2d interface is very unusual, but it looks like the
userptr objects are persistent. Hence they need FOLL_LONGTERM.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
---
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index ecede41af9b9..1e0c5a7f206e 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -475,7 +475,8 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 		goto err_free;
 	}
 
-	ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+	ret = pin_user_pages_fast(start, npages,
+				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
 				  g2d_userptr->pages);
 	if (ret != npages) {
 		DRM_DEV_ERROR(g2d->dev,
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
  2020-10-09  7:59 ` [PATCH v2 01/17] drm/exynos: Stop using frame_vector helpers Daniel Vetter
  2020-10-09  7:59 ` [PATCH v2 02/17] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-10 20:26   ` Oded Gabbay
  2020-10-16  7:45   ` John Hubbard
  2020-10-09  7:59 ` [PATCH v2 04/17] misc/habana: Use FOLL_LONGTERM for userptr Daniel Vetter
                   ` (13 subsequent siblings)
  16 siblings, 2 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Dan Williams, Oded Gabbay, Omer Shpigelman, Ofir Bitton,
	Tomer Tayar, Moti Haimovski, Greg Kroah-Hartman, Pawel Piskorski

All we need are a pages array, pin_user_pages_fast can give us that
directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Omer Shpigelman <oshpigelman@habana.ai>
Cc: Ofir Bitton <obitton@habana.ai>
Cc: Tomer Tayar <ttayar@habana.ai>
Cc: Moti Haimovski <mhaimovski@habana.ai>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Pawel Piskorski <ppiskorski@habana.ai>
--
v2: Use unpin_user_pages_dirty_lock (John)
---
 drivers/misc/habanalabs/Kconfig             |  1 -
 drivers/misc/habanalabs/common/habanalabs.h |  3 +-
 drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
 3 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
index 8eb5d38c618e..2f04187f7167 100644
--- a/drivers/misc/habanalabs/Kconfig
+++ b/drivers/misc/habanalabs/Kconfig
@@ -6,7 +6,6 @@
 config HABANA_AI
 	tristate "HabanaAI accelerators (habanalabs)"
 	depends on PCI && HAS_IOMEM
-	select FRAME_VECTOR
 	select DMA_SHARED_BUFFER
 	select GENERIC_ALLOCATOR
 	select HWMON
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index edbd627b29d2..c1b3ad613b15 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -881,7 +881,8 @@ struct hl_ctx_mgr {
 struct hl_userptr {
 	enum vm_type_t		vm_type; /* must be first */
 	struct list_head	job_node;
-	struct frame_vector	*vec;
+	struct page		**pages;
+	unsigned int		npages;
 	struct sg_table		*sgt;
 	enum dma_data_direction dir;
 	struct list_head	debugfs_list;
diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 5ff4688683fd..327b64479f97 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
 		return -EFAULT;
 	}
 
-	userptr->vec = frame_vector_create(npages);
-	if (!userptr->vec) {
+	userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
+					GFP_KERNEL);
+	if (!userptr->pages) {
 		dev_err(hdev->dev, "Failed to create frame vector\n");
 		return -ENOMEM;
 	}
 
-	rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
-				userptr->vec);
+	rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+				 userptr->pages);
 
 	if (rc != npages) {
 		dev_err(hdev->dev,
 			"Failed to map host memory, user ptr probably wrong\n");
 		if (rc < 0)
-			goto destroy_framevec;
+			goto destroy_pages;
+		npages = rc;
 		rc = -EFAULT;
-		goto put_framevec;
-	}
-
-	if (frame_vector_to_pages(userptr->vec) < 0) {
-		dev_err(hdev->dev,
-			"Failed to translate frame vector to pages\n");
-		rc = -EFAULT;
-		goto put_framevec;
+		goto put_pages;
 	}
+	userptr->npages = npages;
 
 	rc = sg_alloc_table_from_pages(userptr->sgt,
-					frame_vector_pages(userptr->vec),
-					npages, offset, size, GFP_ATOMIC);
+				       userptr->pages,
+				       npages, offset, size, GFP_ATOMIC);
 	if (rc < 0) {
 		dev_err(hdev->dev, "failed to create SG table from pages\n");
-		goto put_framevec;
+		goto put_pages;
 	}
 
 	return 0;
 
-put_framevec:
-	put_vaddr_frames(userptr->vec);
-destroy_framevec:
-	frame_vector_destroy(userptr->vec);
+put_pages:
+	unpin_user_pages(userptr->pages, npages);
+destroy_pages:
+	kvfree(userptr->pages);
 	return rc;
 }
 
@@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
  */
 void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
 {
-	struct page **pages;
-
 	hl_debugfs_remove_userptr(hdev, userptr);
 
 	if (userptr->dma_mapped)
@@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
 							userptr->sgt->nents,
 							userptr->dir);
 
-	pages = frame_vector_pages(userptr->vec);
-	if (!IS_ERR(pages)) {
-		int i;
-
-		for (i = 0; i < frame_vector_count(userptr->vec); i++)
-			set_page_dirty_lock(pages[i]);
-	}
-	put_vaddr_frames(userptr->vec);
-	frame_vector_destroy(userptr->vec);
+	unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
+	kvfree(userptr->pages);
 
 	list_del(&userptr->job_node);
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 04/17] misc/habana: Use FOLL_LONGTERM for userptr
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
                   ` (2 preceding siblings ...)
  2020-10-09  7:59 ` [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-09  7:59 ` [PATCH v2 05/17] mm/frame-vector: Use FOLL_LONGTERM Daniel Vetter
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Dan Williams, Oded Gabbay, Omer Shpigelman, Ofir Bitton,
	Tomer Tayar, Moti Haimovski, Greg Kroah-Hartman, Pawel Piskorski

These are persistent, not just for the duration of a dma operation.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Omer Shpigelman <oshpigelman@habana.ai>
Cc: Ofir Bitton <obitton@habana.ai>
Cc: Tomer Tayar <ttayar@habana.ai>
Cc: Moti Haimovski <mhaimovski@habana.ai>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Pawel Piskorski <ppiskorski@habana.ai>
---
 drivers/misc/habanalabs/common/memory.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 327b64479f97..767d3644c033 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1288,7 +1288,8 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
 		return -ENOMEM;
 	}
 
-	rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+	rc = pin_user_pages_fast(start, npages,
+				 FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
 				 userptr->pages);
 
 	if (rc != npages) {
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 05/17] mm/frame-vector: Use FOLL_LONGTERM
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
                   ` (3 preceding siblings ...)
  2020-10-09  7:59 ` [PATCH v2 04/17] misc/habana: Use FOLL_LONGTERM for userptr Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-16  7:54   ` John Hubbard
  2020-10-09  7:59 ` [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem Daniel Vetter
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, Tomasz Figa,
	Mauro Carvalho Chehab, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams

This is used by media/videbuf2 for persistent dma mappings, not just
for a single dma operation and then freed again, so needs
FOLL_LONGTERM.

Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
locking issues. Rework the code to pull the pup path out from the
mmap_sem critical section as suggested by Jason.

By relying entirely on the vma checks in pin_user_pages and follow_pfn
(for vm_flags and vma_is_fsdax) we can also streamline the code a lot.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
--
v2: Streamline the code and further simplify the loop checks (Jason)
---
 mm/frame_vector.c | 50 ++++++++++++++---------------------------------
 1 file changed, 15 insertions(+), 35 deletions(-)

diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index 10f82d5643b6..d44779e56313 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 	struct vm_area_struct *vma;
 	int ret = 0;
 	int err;
-	int locked;
 
 	if (nr_frames == 0)
 		return 0;
@@ -48,40 +47,25 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 
 	start = untagged_addr(start);
 
-	mmap_read_lock(mm);
-	locked = 1;
-	vma = find_vma_intersection(mm, start, start + 1);
-	if (!vma) {
-		ret = -EFAULT;
-		goto out;
-	}
-
-	/*
-	 * While get_vaddr_frames() could be used for transient (kernel
-	 * controlled lifetime) pinning of memory pages all current
-	 * users establish long term (userspace controlled lifetime)
-	 * page pinning. Treat get_vaddr_frames() like
-	 * get_user_pages_longterm() and disallow it for filesystem-dax
-	 * mappings.
-	 */
-	if (vma_is_fsdax(vma)) {
-		ret = -EOPNOTSUPP;
-		goto out;
-	}
-
-	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
+	ret = pin_user_pages_fast(start, nr_frames,
+				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
+				  (struct page **)(vec->ptrs));
+	if (ret > 0) {
 		vec->got_ref = true;
 		vec->is_pfns = false;
-		ret = pin_user_pages_locked(start, nr_frames,
-			gup_flags, (struct page **)(vec->ptrs), &locked);
-		goto out;
+		goto out_unlocked;
 	}
 
+	mmap_read_lock(mm);
 	vec->got_ref = false;
 	vec->is_pfns = true;
 	do {
 		unsigned long *nums = frame_vector_pfns(vec);
 
+		vma = find_vma_intersection(mm, start, start + 1);
+		if (!vma)
+			break;
+
 		while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
 			err = follow_pfn(vma, start, &nums[ret]);
 			if (err) {
@@ -92,17 +76,13 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 			start += PAGE_SIZE;
 			ret++;
 		}
-		/*
-		 * We stop if we have enough pages or if VMA doesn't completely
-		 * cover the tail page.
-		 */
-		if (ret >= nr_frames || start < vma->vm_end)
+		/* Bail out if VMA doesn't completely cover the tail page. */
+		if (start < vma->vm_end)
 			break;
-		vma = find_vma_intersection(mm, start, start + 1);
-	} while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
+	} while (ret < nr_frames);
 out:
-	if (locked)
-		mmap_read_unlock(mm);
+	mmap_read_unlock(mm);
+out_unlocked:
 	if (!ret)
 		ret = -EFAULT;
 	if (ret > 0)
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
                   ` (4 preceding siblings ...)
  2020-10-09  7:59 ` [PATCH v2 05/17] mm/frame-vector: Use FOLL_LONGTERM Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-09 10:14   ` Mauro Carvalho Chehab
  2020-10-09  7:59 ` [PATCH v2 07/17] mm: Close race in generic_access_phys Daniel Vetter
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, John Hubbard, Daniel Vetter,
	Jason Gunthorpe, Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Tomasz Figa, Mauro Carvalho Chehab, Andrew Morton,
	Jérôme Glisse, Jan Kara, Dan Williams

It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
symbol from all over the tree (well just one place, somehow omap media
driver still had this in its Kconfig, despite not using it).

Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/media/common/videobuf2/Kconfig        |  1 -
 drivers/media/common/videobuf2/Makefile       |  1 +
 .../media/common/videobuf2}/frame_vector.c    |  2 +
 drivers/media/platform/omap/Kconfig           |  1 -
 include/linux/mm.h                            | 42 -------------------
 include/media/videobuf2-core.h                | 42 +++++++++++++++++++
 mm/Kconfig                                    |  3 --
 mm/Makefile                                   |  1 -
 8 files changed, 45 insertions(+), 48 deletions(-)
 rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)

diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
index edbc99ebba87..d2223a12c95f 100644
--- a/drivers/media/common/videobuf2/Kconfig
+++ b/drivers/media/common/videobuf2/Kconfig
@@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
 
 config VIDEOBUF2_MEMOPS
 	tristate
-	select FRAME_VECTOR
 
 config VIDEOBUF2_DMA_CONTIG
 	tristate
diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
index 77bebe8b202f..54306f8d096c 100644
--- a/drivers/media/common/videobuf2/Makefile
+++ b/drivers/media/common/videobuf2/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 videobuf2-common-objs := videobuf2-core.o
+videobuf2-common-objs += frame_vector.o
 
 ifeq ($(CONFIG_TRACEPOINTS),y)
   videobuf2-common-objs += vb2-trace.o
diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
similarity index 99%
rename from mm/frame_vector.c
rename to drivers/media/common/videobuf2/frame_vector.c
index d44779e56313..2b0b97761d15 100644
--- a/mm/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -8,6 +8,8 @@
 #include <linux/pagemap.h>
 #include <linux/sched.h>
 
+#include <media/videobuf2-core.h>
+
 /**
  * get_vaddr_frames() - map virtual addresses to pfns
  * @start:	starting user address
diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
index f73b5893220d..de16de46c0f4 100644
--- a/drivers/media/platform/omap/Kconfig
+++ b/drivers/media/platform/omap/Kconfig
@@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
 	depends on VIDEO_V4L2
 	select VIDEOBUF2_DMA_CONTIG
 	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
-	select FRAME_VECTOR
 	help
 	  V4L2 Display driver support for OMAP2/3 based boards.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 16b799a0522c..acd60fbf1a5a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
 int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
 			struct task_struct *task, bool bypass_rlim);
 
-/* Container for pinned pfns / pages */
-struct frame_vector {
-	unsigned int nr_allocated;	/* Number of frames we have space for */
-	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
-	bool got_ref;		/* Did we pin pages by getting page ref? */
-	bool is_pfns;		/* Does array contain pages or pfns? */
-	void *ptrs[];		/* Array of pinned pfns / pages. Use
-				 * pfns_vector_pages() or pfns_vector_pfns()
-				 * for access */
-};
-
-struct frame_vector *frame_vector_create(unsigned int nr_frames);
-void frame_vector_destroy(struct frame_vector *vec);
-int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
-		     unsigned int gup_flags, struct frame_vector *vec);
-void put_vaddr_frames(struct frame_vector *vec);
-int frame_vector_to_pages(struct frame_vector *vec);
-void frame_vector_to_pfns(struct frame_vector *vec);
-
-static inline unsigned int frame_vector_count(struct frame_vector *vec)
-{
-	return vec->nr_frames;
-}
-
-static inline struct page **frame_vector_pages(struct frame_vector *vec)
-{
-	if (vec->is_pfns) {
-		int err = frame_vector_to_pages(vec);
-
-		if (err)
-			return ERR_PTR(err);
-	}
-	return (struct page **)(vec->ptrs);
-}
-
-static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
-{
-	if (!vec->is_pfns)
-		frame_vector_to_pfns(vec);
-	return (unsigned long *)(vec->ptrs);
-}
-
 struct kvec;
 int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
 			struct page **pages);
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index bbb3f26fbde9..a2e75ca0334f 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
  */
 unsigned int vb2_request_buffer_cnt(struct media_request *req);
 
+/* Container for pinned pfns / pages in frame_vector.c */
+struct frame_vector {
+	unsigned int nr_allocated;	/* Number of frames we have space for */
+	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
+	bool got_ref;		/* Did we pin pages by getting page ref? */
+	bool is_pfns;		/* Does array contain pages or pfns? */
+	void *ptrs[];		/* Array of pinned pfns / pages. Use
+				 * pfns_vector_pages() or pfns_vector_pfns()
+				 * for access */
+};
+
+struct frame_vector *frame_vector_create(unsigned int nr_frames);
+void frame_vector_destroy(struct frame_vector *vec);
+int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
+		     unsigned int gup_flags, struct frame_vector *vec);
+void put_vaddr_frames(struct frame_vector *vec);
+int frame_vector_to_pages(struct frame_vector *vec);
+void frame_vector_to_pfns(struct frame_vector *vec);
+
+static inline unsigned int frame_vector_count(struct frame_vector *vec)
+{
+	return vec->nr_frames;
+}
+
+static inline struct page **frame_vector_pages(struct frame_vector *vec)
+{
+	if (vec->is_pfns) {
+		int err = frame_vector_to_pages(vec);
+
+		if (err)
+			return ERR_PTR(err);
+	}
+	return (struct page **)(vec->ptrs);
+}
+
+static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
+{
+	if (!vec->is_pfns)
+		frame_vector_to_pfns(vec);
+	return (unsigned long *)(vec->ptrs);
+}
+
 #endif /* _MEDIA_VIDEOBUF2_CORE_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 6c974888f86f..da6c943fe9f1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -815,9 +815,6 @@ config DEVICE_PRIVATE
 	  memory; i.e., memory that is only accessible from the device (or
 	  group of devices). You likely also want to select HMM_MIRROR.
 
-config FRAME_VECTOR
-	bool
-
 config ARCH_USES_HIGH_VMA_FLAGS
 	bool
 config ARCH_HAS_PKEYS
diff --git a/mm/Makefile b/mm/Makefile
index d5649f1c12c0..a025fd6c6afd 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
 obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
 obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
-obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
 obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
 obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 07/17] mm: Close race in generic_access_phys
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
                   ` (5 preceding siblings ...)
  2020-10-09  7:59 ` [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-09  7:59 ` [PATCH v2 08/17] s390/pci: Remove races against pte updates Daniel Vetter
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Dan Williams,
	Kees Cook, Rik van Riel, Benjamin Herrensmidt, Dave Airlie,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Daniel Vetter

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
  ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
  cma regions. This means if we miss the unmap the pfn might contain
  pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
  iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
  ("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.

Since ioremap might need to manipulate pagetables too we need to drop
the pt lock and have a retry loop if we raced.

While at it, also add kerneldoc and improve the comment for the
vma_ops->access function. It's for accessing, not for moving the
memory from iomem to system memory, as the old comment seemed to
suggest.

References: 28b2ee20c7cb ("access_process_vm device memory infrastructure")
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Benjamin Herrensmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
--
v2: Fix inversion in the retry check (John).
---
 include/linux/mm.h |  3 ++-
 mm/memory.c        | 44 ++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index acd60fbf1a5a..2a16631c1fda 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -566,7 +566,8 @@ struct vm_operations_struct {
 	vm_fault_t (*pfn_mkwrite)(struct vm_fault *vmf);
 
 	/* called by access_process_vm when get_user_pages() fails, typically
-	 * for use by special VMAs that can switch between memory and hardware
+	 * for use by special VMAs. See also generic_access_phys() for a generic
+	 * implementation useful for any iomem mapping.
 	 */
 	int (*access)(struct vm_area_struct *vma, unsigned long addr,
 		      void *buf, int len, int write);
diff --git a/mm/memory.c b/mm/memory.c
index fcfc4ca36eba..f7cbc4dde0ef 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4873,28 +4873,68 @@ int follow_phys(struct vm_area_struct *vma,
 	return ret;
 }
 
+/**
+ * generic_access_phys - generic implementation for iomem mmap access
+ * @vma: the vma to access
+ * @addr: userspace addres, not relative offset within @vma
+ * @buf: buffer to read/write
+ * @len: length of transfer
+ * @write: set to FOLL_WRITE when writing, otherwise reading
+ *
+ * This is a generic implementation for &vm_operations_struct.access for an
+ * iomem mapping. This callback is used by access_process_vm() when the @vma is
+ * not page based.
+ */
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
 			void *buf, int len, int write)
 {
 	resource_size_t phys_addr;
 	unsigned long prot = 0;
 	void __iomem *maddr;
+	pte_t *ptep, pte;
+	spinlock_t *ptl;
 	int offset = addr & (PAGE_SIZE-1);
+	int ret = -EINVAL;
+
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		return -EINVAL;
+
+retry:
+	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+		return -EINVAL;
+	pte = *ptep;
+	pte_unmap_unlock(ptep, ptl);
 
-	if (follow_phys(vma, addr, write, &prot, &phys_addr))
+	prot = pgprot_val(pte_pgprot(pte));
+	phys_addr = (resource_size_t)pte_pfn(pte) << PAGE_SHIFT;
+
+	if ((write & FOLL_WRITE) && !pte_write(pte))
 		return -EINVAL;
 
 	maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);
 	if (!maddr)
 		return -ENOMEM;
 
+	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+		goto out_unmap;
+
+	if (!pte_same(pte, *ptep)) {
+		pte_unmap_unlock(ptep, ptl);
+		iounmap(maddr);
+
+		goto retry;
+	}
+
 	if (write)
 		memcpy_toio(maddr + offset, buf, len);
 	else
 		memcpy_fromio(buf, maddr + offset, len);
+	ret = len;
+	pte_unmap_unlock(ptep, ptl);
+out_unmap:
 	iounmap(maddr);
 
-	return len;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(generic_access_phys);
 #endif
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 08/17] s390/pci: Remove races against pte updates
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
                   ` (6 preceding siblings ...)
  2020-10-09  7:59 ` [PATCH v2 07/17] mm: Close race in generic_access_phys Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-12 14:03   ` Niklas Schnelle
  2020-10-09  7:59 ` [PATCH v2 09/17] mm: Add unsafe_follow_pfn Daniel Vetter
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Gerald Schaefer, Daniel Vetter,
	Jason Gunthorpe, Dan Williams, Kees Cook, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Niklas Schnelle

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.

Since zpci_memcpy_from|toio seems to not do anything nefarious with
locks we just need to open code get_pfn and follow_pfn and make sure
we drop the locks only after we've done. The write function also needs
the copy_from_user move, since we can't take userspace faults while
holding the mmap sem.

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
--
v2: Move VM_IO | VM_PFNMAP checks around so they keep returning EINVAL
like before (Gerard)
---
 arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
 1 file changed, 57 insertions(+), 41 deletions(-)

diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
index 401cf670a243..1a6adbc68ee8 100644
--- a/arch/s390/pci/pci_mmio.c
+++ b/arch/s390/pci/pci_mmio.c
@@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
 	return rc;
 }
 
-static long get_pfn(unsigned long user_addr, unsigned long access,
-		    unsigned long *pfn)
-{
-	struct vm_area_struct *vma;
-	long ret;
-
-	mmap_read_lock(current->mm);
-	ret = -EINVAL;
-	vma = find_vma(current->mm, user_addr);
-	if (!vma)
-		goto out;
-	ret = -EACCES;
-	if (!(vma->vm_flags & access))
-		goto out;
-	ret = follow_pfn(vma, user_addr, pfn);
-out:
-	mmap_read_unlock(current->mm);
-	return ret;
-}
-
 SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 		const void __user *, user_buffer, size_t, length)
 {
 	u8 local_buf[64];
 	void __iomem *io_addr;
 	void *buf;
-	unsigned long pfn;
+	struct vm_area_struct *vma;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	long ret;
 
 	if (!zpci_is_enabled())
@@ -158,7 +140,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 	 * We only support write access to MIO capable devices if we are on
 	 * a MIO enabled system. Otherwise we would have to check for every
 	 * address if it is a special ZPCI_ADDR and would have to do
-	 * a get_pfn() which we don't need for MIO capable devices.  Currently
+	 * a pfn lookup which we don't need for MIO capable devices.  Currently
 	 * ISM devices are the only devices without MIO support and there is no
 	 * known need for accessing these from userspace.
 	 */
@@ -176,21 +158,37 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 	} else
 		buf = local_buf;
 
-	ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
+	ret = -EFAULT;
+	if (copy_from_user(buf, user_buffer, length))
+		goto out_free;
+
+	mmap_read_lock(current->mm);
+	ret = -EINVAL;
+	vma = find_vma(current->mm, mmio_addr);
+	if (!vma)
+		goto out_unlock_mmap;
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		goto out_unlock_mmap;
+	ret = -EACCES;
+	if (!(vma->vm_flags & VM_WRITE))
+		goto out_unlock_mmap;
+
+	ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
 	if (ret)
-		goto out;
-	io_addr = (void __iomem *)((pfn << PAGE_SHIFT) |
+		goto out_unlock_mmap;
+
+	io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
 			(mmio_addr & ~PAGE_MASK));
 
-	ret = -EFAULT;
 	if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE)
-		goto out;
-
-	if (copy_from_user(buf, user_buffer, length))
-		goto out;
+		goto out_unlock_pt;
 
 	ret = zpci_memcpy_toio(io_addr, buf, length);
-out:
+out_unlock_pt:
+	pte_unmap_unlock(ptep, ptl);
+out_unlock_mmap:
+	mmap_read_unlock(current->mm);
+out_free:
 	if (buf != local_buf)
 		kfree(buf);
 	return ret;
@@ -274,7 +272,9 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 	u8 local_buf[64];
 	void __iomem *io_addr;
 	void *buf;
-	unsigned long pfn;
+	struct vm_area_struct *vma;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	long ret;
 
 	if (!zpci_is_enabled())
@@ -287,7 +287,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 	 * We only support read access to MIO capable devices if we are on
 	 * a MIO enabled system. Otherwise we would have to check for every
 	 * address if it is a special ZPCI_ADDR and would have to do
-	 * a get_pfn() which we don't need for MIO capable devices.  Currently
+	 * a pfn lookup which we don't need for MIO capable devices.  Currently
 	 * ISM devices are the only devices without MIO support and there is no
 	 * known need for accessing these from userspace.
 	 */
@@ -306,22 +306,38 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 		buf = local_buf;
 	}
 
-	ret = get_pfn(mmio_addr, VM_READ, &pfn);
+	mmap_read_lock(current->mm);
+	ret = -EINVAL;
+	vma = find_vma(current->mm, mmio_addr);
+	if (!vma)
+		goto out_unlock_mmap;
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		goto out_unlock_mmap;
+	ret = -EACCES;
+	if (!(vma->vm_flags & VM_WRITE))
+		goto out_unlock_mmap;
+
+	ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
 	if (ret)
-		goto out;
-	io_addr = (void __iomem *)((pfn << PAGE_SHIFT) | (mmio_addr & ~PAGE_MASK));
+		goto out_unlock_mmap;
+
+	io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
+			(mmio_addr & ~PAGE_MASK));
 
 	if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE) {
 		ret = -EFAULT;
-		goto out;
+		goto out_unlock_pt;
 	}
 	ret = zpci_memcpy_fromio(buf, io_addr, length);
-	if (ret)
-		goto out;
-	if (copy_to_user(user_buffer, buf, length))
+
+out_unlock_pt:
+	pte_unmap_unlock(ptep, ptl);
+out_unlock_mmap:
+	mmap_read_unlock(current->mm);
+
+	if (!ret && copy_to_user(user_buffer, buf, length))
 		ret = -EFAULT;
 
-out:
 	if (buf != local_buf)
 		kfree(buf);
 	return ret;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
                   ` (7 preceding siblings ...)
  2020-10-09  7:59 ` [PATCH v2 08/17] s390/pci: Remove races against pte updates Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-09 10:34   ` Mauro Carvalho Chehab
  2020-10-09  7:59 ` [PATCH v2 10/17] media/videbuf1|2: Mark follow_pfn usage as unsafe Daniel Vetter
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea.

Unfortunately there's some users where this is not fixable (like v4l
userptr of iomem mappings) or involves a pile of work (vfio type1
iommu). For now annotate these as unsafe and splat appropriately.

This patch adds an unsafe_follow_pfn, which later patches will then
roll out to all appropriate places.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: kvm@vger.kernel.org
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
 mm/nommu.c         | 17 +++++++++++++++++
 security/Kconfig   | 13 +++++++++++++
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2a16631c1fda..ec8c90928fc9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
 		   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
 int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	unsigned long *pfn);
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+		      unsigned long *pfn);
 int follow_phys(struct vm_area_struct *vma, unsigned long address,
 		unsigned int flags, unsigned long *prot, resource_size_t *phys);
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index f7cbc4dde0ef..7c7b234ffb24 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4821,7 +4821,12 @@ EXPORT_SYMBOL(follow_pte_pmd);
  * @address: user virtual address
  * @pfn: location to store found PFN
  *
- * Only IO mappings and raw PFN mappings are allowed.
+ * Only IO mappings and raw PFN mappings are allowed. Note that callers must
+ * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
+ * If this is not feasible, or the access to the @pfn is only very short term,
+ * use follow_pte_pmd() instead and hold the pagetable lock for the duration of
+ * the access instead. Any caller not following these requirements must use
+ * unsafe_follow_pfn() instead.
  *
  * Return: zero and the pfn at @pfn on success, -ve otherwise.
  */
@@ -4844,6 +4849,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+#ifdef CONFIG_STRICT_FOLLOW_PFN
+	pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
+	return -EINVAL;
+#else
+	WARN_ONCE(1, "unsafe follow_pfn usage\n");
+	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+
+	return follow_pfn(vma, address, pfn);
+#endif
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 #ifdef CONFIG_HAVE_IOREMAP_PROT
 int follow_phys(struct vm_area_struct *vma,
 		unsigned long address, unsigned int flags,
diff --git a/mm/nommu.c b/mm/nommu.c
index 75a327149af1..3db2910f0d64 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+	return follow_pfn(vma, address, pfn);
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 LIST_HEAD(vmap_area_list);
 
 void vfree(const void *addr)
diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f99f1d..48945402e103 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
 	  If you wish for all usermode helper programs to be disabled,
 	  specify an empty string here (i.e. "").
 
+config STRICT_FOLLOW_PFN
+	bool "Disable unsafe use of follow_pfn"
+	depends on MMU
+	help
+	  Some functionality in the kernel follows userspace mappings to iomem
+	  ranges in an unsafe matter. Examples include v4l userptr for zero-copy
+	  buffers sharing.
+
+	  If this option is switched on, such access is rejected. Only enable
+	  this option when you must run userspace which requires this.
+
+	  If in doubt, say Y.
+
 source "security/selinux/Kconfig"
 source "security/smack/Kconfig"
 source "security/tomoyo/Kconfig"
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 10/17] media/videbuf1|2: Mark follow_pfn usage as unsafe
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
                   ` (8 preceding siblings ...)
  2020-10-09  7:59 ` [PATCH v2 09/17] mm: Add unsafe_follow_pfn Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-10  9:24   ` Mauro Carvalho Chehab
  2020-10-09  7:59 ` [PATCH v2 11/17] vfio/type1: Mark follow_pfn " Daniel Vetter
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Tomasz Figa, Laurent Dufour,
	Vlastimil Babka, Daniel Jordan, Michel Lespinasse

The media model assumes that buffers are all preallocated, so that
when a media pipeline is running we never miss a deadline because the
buffers aren't allocated or available.

This means we cannot fix the v4l follow_pfn usage through
mmu_notifier, without breaking how this all works. The only real fix
is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
tell everyone to cut over to dma-buf memory sharing for zerocopy.

userptr for normal memory will keep working as-is.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michel Lespinasse <walken@google.com>
---
 drivers/media/common/videobuf2/frame_vector.c | 2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
index 2b0b97761d15..a1b85fe9e7c1 100644
--- a/drivers/media/common/videobuf2/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -69,7 +69,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 			break;
 
 		while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
-			err = follow_pfn(vma, start, &nums[ret]);
+			err = unsafe_follow_pfn(vma, start, &nums[ret]);
 			if (err) {
 				if (ret == 0)
 					ret = err;
diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
index 52312ce2ba05..821c4a76ab96 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
 	user_address = untagged_baddr;
 
 	while (pages_done < (mem->size >> PAGE_SHIFT)) {
-		ret = follow_pfn(vma, user_address, &this_pfn);
+		ret = unsafe_follow_pfn(vma, user_address, &this_pfn);
 		if (ret)
 			break;
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 11/17] vfio/type1: Mark follow_pfn as unsafe
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
                   ` (9 preceding siblings ...)
  2020-10-09  7:59 ` [PATCH v2 10/17] media/videbuf1|2: Mark follow_pfn usage as unsafe Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-09  7:59 ` [PATCH v2 12/17] PCI: Obey iomem restrictions for procfs mmap Daniel Vetter
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Alex Williamson,
	Cornelia Huck

The code seems to stuff these pfns into iommu pts (or something like
that, I didn't follow), but there's no mmu_notifier to ensure that
access is synchronized with pte updates.

Hence mark these as unsafe. This means that with
CONFIG_STRICT_FOLLOW_PFN, these will be rejected.

Real fix is to wire up an mmu_notifier ... somehow. Probably means any
invalidate is a fatal fault for this vfio device, but then this
shouldn't ever happen if userspace is reasonable.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
---
 drivers/vfio/vfio_iommu_type1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5fbf0c1f7433..a4d53f3d0a35 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -421,7 +421,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 {
 	int ret;
 
-	ret = follow_pfn(vma, vaddr, pfn);
+	ret = unsafe_follow_pfn(vma, vaddr, pfn);
 	if (ret) {
 		bool unlocked = false;
 
@@ -435,7 +435,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 		if (ret)
 			return ret;
 
-		ret = follow_pfn(vma, vaddr, pfn);
+		ret = unsafe_follow_pfn(vma, vaddr, pfn);
 	}
 
 	return ret;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 12/17] PCI: Obey iomem restrictions for procfs mmap
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
                   ` (10 preceding siblings ...)
  2020-10-09  7:59 ` [PATCH v2 11/17] vfio/type1: Mark follow_pfn " Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-09  7:59 ` [PATCH v2 13/17] /dev/mem: Only set filp->f_mapping Daniel Vetter
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Bjorn Helgaas, linux-pci

There's three ways to access PCI BARs from userspace: /dev/mem, sysfs
files, and the old proc interface. Two check against
iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
this starts to matter, since we don't want random userspace having
access to PCI BARs while a driver is loaded and using it.

Fix this by adding the same iomem_is_exclusive() check we already have
on the sysfs side in pci_mmap_resource().

References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
--
v2: Improve commit message (Bjorn)
---
 drivers/pci/proc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index d35186b01d98..3a2f90beb4cb 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -274,6 +274,11 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
 		else
 			return -EINVAL;
 	}
+
+	if (dev->resource[i].flags & IORESOURCE_MEM &&
+	    iomem_is_exclusive(dev->resource[i].start))
+		return -EINVAL;
+
 	ret = pci_mmap_page_range(dev, i, vma,
 				  fpriv->mmap_state, write_combine);
 	if (ret < 0)
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 13/17] /dev/mem: Only set filp->f_mapping
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
                   ` (11 preceding siblings ...)
  2020-10-09  7:59 ` [PATCH v2 12/17] PCI: Obey iomem restrictions for procfs mmap Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-09  7:59 ` [PATCH v2 14/17] resource: Move devmem revoke code to resource framework Daniel Vetter
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara

When we care about pagecache maintenance, we need to make sure that
both f_mapping and i_mapping point at the right mapping.

But for iomem mappings we only care about the virtual/pte side of
things, so f_mapping is enough. Also setting inode->i_mapping was
confusing me as a driver maintainer, since in e.g. drivers/gpu we
don't do that. Per Dan this seems to be copypasta from places which do
care about pagecache consistency, but not needed. Hence remove it for
slightly less confusion.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
---
 drivers/char/mem.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index abd4ffdc8cde..5502f56f3655 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -864,7 +864,6 @@ static int open_port(struct inode *inode, struct file *filp)
 	 * revocations when drivers want to take over a /dev/mem mapped
 	 * range.
 	 */
-	inode->i_mapping = devmem_inode->i_mapping;
 	filp->f_mapping = inode->i_mapping;
 
 	return 0;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
                   ` (12 preceding siblings ...)
  2020-10-09  7:59 ` [PATCH v2 13/17] /dev/mem: Only set filp->f_mapping Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-09 10:59   ` Greg Kroah-Hartman
  2020-10-09 12:31   ` Jason Gunthorpe
  2020-10-09  7:59 ` [PATCH v2 15/17] sysfs: Support zapping of binary attr mmaps Daniel Vetter
                   ` (2 subsequent siblings)
  16 siblings, 2 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Arnd Bergmann,
	Greg Kroah-Hartman, David Hildenbrand, Rafael J. Wysocki

We want all iomem mmaps to consistently revoke ptes when the kernel
takes over and CONFIG_IO_STRICT_DEVMEM is enabled. This includes the
pci bar mmaps available through procfs and sysfs, which currently do
not revoke mappings.

To prepare for this, move the code from the /dev/kmem driver to
kernel/resource.c.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
---
 drivers/char/mem.c     | 85 +------------------------------------
 include/linux/ioport.h |  6 +--
 kernel/resource.c      | 95 +++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 96 insertions(+), 90 deletions(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 5502f56f3655..53338aad8d28 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -31,9 +31,6 @@
 #include <linux/uio.h>
 #include <linux/uaccess.h>
 #include <linux/security.h>
-#include <linux/pseudo_fs.h>
-#include <uapi/linux/magic.h>
-#include <linux/mount.h>
 
 #ifdef CONFIG_IA64
 # include <linux/efi.h>
@@ -809,42 +806,6 @@ static loff_t memory_lseek(struct file *file, loff_t offset, int orig)
 	return ret;
 }
 
-static struct inode *devmem_inode;
-
-#ifdef CONFIG_IO_STRICT_DEVMEM
-void revoke_devmem(struct resource *res)
-{
-	/* pairs with smp_store_release() in devmem_init_inode() */
-	struct inode *inode = smp_load_acquire(&devmem_inode);
-
-	/*
-	 * Check that the initialization has completed. Losing the race
-	 * is ok because it means drivers are claiming resources before
-	 * the fs_initcall level of init and prevent /dev/mem from
-	 * establishing mappings.
-	 */
-	if (!inode)
-		return;
-
-	/*
-	 * The expectation is that the driver has successfully marked
-	 * the resource busy by this point, so devmem_is_allowed()
-	 * should start returning false, however for performance this
-	 * does not iterate the entire resource range.
-	 */
-	if (devmem_is_allowed(PHYS_PFN(res->start)) &&
-	    devmem_is_allowed(PHYS_PFN(res->end))) {
-		/*
-		 * *cringe* iomem=relaxed says "go ahead, what's the
-		 * worst that can happen?"
-		 */
-		return;
-	}
-
-	unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
-}
-#endif
-
 static int open_port(struct inode *inode, struct file *filp)
 {
 	int rc;
@@ -864,7 +825,7 @@ static int open_port(struct inode *inode, struct file *filp)
 	 * revocations when drivers want to take over a /dev/mem mapped
 	 * range.
 	 */
-	filp->f_mapping = inode->i_mapping;
+	filp->f_mapping = iomem_get_mapping();
 
 	return 0;
 }
@@ -995,48 +956,6 @@ static char *mem_devnode(struct device *dev, umode_t *mode)
 
 static struct class *mem_class;
 
-static int devmem_fs_init_fs_context(struct fs_context *fc)
-{
-	return init_pseudo(fc, DEVMEM_MAGIC) ? 0 : -ENOMEM;
-}
-
-static struct file_system_type devmem_fs_type = {
-	.name		= "devmem",
-	.owner		= THIS_MODULE,
-	.init_fs_context = devmem_fs_init_fs_context,
-	.kill_sb	= kill_anon_super,
-};
-
-static int devmem_init_inode(void)
-{
-	static struct vfsmount *devmem_vfs_mount;
-	static int devmem_fs_cnt;
-	struct inode *inode;
-	int rc;
-
-	rc = simple_pin_fs(&devmem_fs_type, &devmem_vfs_mount, &devmem_fs_cnt);
-	if (rc < 0) {
-		pr_err("Cannot mount /dev/mem pseudo filesystem: %d\n", rc);
-		return rc;
-	}
-
-	inode = alloc_anon_inode(devmem_vfs_mount->mnt_sb);
-	if (IS_ERR(inode)) {
-		rc = PTR_ERR(inode);
-		pr_err("Cannot allocate inode for /dev/mem: %d\n", rc);
-		simple_release_fs(&devmem_vfs_mount, &devmem_fs_cnt);
-		return rc;
-	}
-
-	/*
-	 * Publish /dev/mem initialized.
-	 * Pairs with smp_load_acquire() in revoke_devmem().
-	 */
-	smp_store_release(&devmem_inode, inode);
-
-	return 0;
-}
-
 static int __init chr_dev_init(void)
 {
 	int minor;
@@ -1058,8 +977,6 @@ static int __init chr_dev_init(void)
 		 */
 		if ((minor == DEVPORT_MINOR) && !arch_has_dev_port())
 			continue;
-		if ((minor == DEVMEM_MINOR) && devmem_init_inode() != 0)
-			continue;
 
 		device_create(mem_class, NULL, MKDEV(MEM_MAJOR, minor),
 			      NULL, devlist[minor].name);
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6c2b06fe8beb..8ffb61b36606 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -302,11 +302,7 @@ struct resource *devm_request_free_mem_region(struct device *dev,
 struct resource *request_free_mem_region(struct resource *base,
 		unsigned long size, const char *name);
 
-#ifdef CONFIG_IO_STRICT_DEVMEM
-void revoke_devmem(struct resource *res);
-#else
-static inline void revoke_devmem(struct resource *res) { };
-#endif
+extern struct address_space *iomem_get_mapping(void);
 
 #endif /* __ASSEMBLY__ */
 #endif	/* _LINUX_IOPORT_H */
diff --git a/kernel/resource.c b/kernel/resource.c
index 841737bbda9e..22153fdec4f5 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -18,12 +18,15 @@
 #include <linux/spinlock.h>
 #include <linux/fs.h>
 #include <linux/proc_fs.h>
+#include <linux/pseudo_fs.h>
 #include <linux/sched.h>
 #include <linux/seq_file.h>
 #include <linux/device.h>
 #include <linux/pfn.h>
 #include <linux/mm.h>
+#include <linux/mount.h>
 #include <linux/resource_ext.h>
+#include <uapi/linux/magic.h>
 #include <asm/io.h>
 
 
@@ -1112,6 +1115,52 @@ resource_size_t resource_alignment(struct resource *res)
 
 static DECLARE_WAIT_QUEUE_HEAD(muxed_resource_wait);
 
+static struct inode *iomem_inode;
+
+#ifdef CONFIG_IO_STRICT_DEVMEM
+static void revoke_iomem(struct resource *res)
+{
+	/* pairs with smp_store_release() in iomem_init_inode() */
+	struct inode *inode = smp_load_acquire(&iomem_inode);
+
+	/*
+	 * Check that the initialization has completed. Losing the race
+	 * is ok because it means drivers are claiming resources before
+	 * the fs_initcall level of init and prevent /dev/mem from
+	 * establishing mappings.
+	 */
+	if (!inode)
+		return;
+
+	/*
+	 * The expectation is that the driver has successfully marked
+	 * the resource busy by this point, so devmem_is_allowed()
+	 * should start returning false, however for performance this
+	 * does not iterate the entire resource range.
+	 */
+	if (devmem_is_allowed(PHYS_PFN(res->start)) &&
+	    devmem_is_allowed(PHYS_PFN(res->end))) {
+		/*
+		 * *cringe* iomem=relaxed says "go ahead, what's the
+		 * worst that can happen?"
+		 */
+		return;
+	}
+
+	unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
+}
+struct address_space *iomem_get_mapping(void)
+{
+	return iomem_inode->i_mapping;
+}
+#else
+static void revoke_iomem(struct resource *res) {}
+struct address_space *iomem_get_mapping(void)
+{
+	return NULL;
+}
+#endif
+
 /**
  * __request_region - create a new busy resource region
  * @parent: parent resource descriptor
@@ -1179,7 +1228,7 @@ struct resource * __request_region(struct resource *parent,
 	write_unlock(&resource_lock);
 
 	if (res && orig_parent == &iomem_resource)
-		revoke_devmem(res);
+		revoke_iomem(res);
 
 	return res;
 }
@@ -1713,4 +1762,48 @@ static int __init strict_iomem(char *str)
 	return 1;
 }
 
+static int iomem_fs_init_fs_context(struct fs_context *fc)
+{
+	return init_pseudo(fc, DEVMEM_MAGIC) ? 0 : -ENOMEM;
+}
+
+static struct file_system_type iomem_fs_type = {
+	.name		= "iomem",
+	.owner		= THIS_MODULE,
+	.init_fs_context = iomem_fs_init_fs_context,
+	.kill_sb	= kill_anon_super,
+};
+
+static int __init iomem_init_inode(void)
+{
+	static struct vfsmount *iomem_vfs_mount;
+	static int iomem_fs_cnt;
+	struct inode *inode;
+	int rc;
+
+	rc = simple_pin_fs(&iomem_fs_type, &iomem_vfs_mount, &iomem_fs_cnt);
+	if (rc < 0) {
+		pr_err("Cannot mount iomem pseudo filesystem: %d\n", rc);
+		return rc;
+	}
+
+	inode = alloc_anon_inode(iomem_vfs_mount->mnt_sb);
+	if (IS_ERR(inode)) {
+		rc = PTR_ERR(inode);
+		pr_err("Cannot allocate inode for iomem: %d\n", rc);
+		simple_release_fs(&iomem_vfs_mount, &iomem_fs_cnt);
+		return rc;
+	}
+
+	/*
+	 * Publish /dev/mem initialized.
+	 * Pairs with smp_load_acquire() in revoke_iomem().
+	 */
+	smp_store_release(&iomem_inode, inode);
+
+	return 0;
+}
+
+fs_initcall(iomem_init_inode);
+
 __setup("iomem=", strict_iomem);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 15/17] sysfs: Support zapping of binary attr mmaps
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
                   ` (13 preceding siblings ...)
  2020-10-09  7:59 ` [PATCH v2 14/17] resource: Move devmem revoke code to resource framework Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-09 10:58   ` Greg Kroah-Hartman
  2020-10-09  7:59 ` [PATCH v2 16/17] PCI: Revoke mappings like devmem Daniel Vetter
  2020-10-09  7:59 ` [PATCH v2 17/17] drm/i915: Properly request PCI BARs Daniel Vetter
  16 siblings, 1 reply; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Bjorn Helgaas, linux-pci,
	Greg Kroah-Hartman, Rafael J. Wysocki, Christian Brauner,
	David S. Miller, Michael Ellerman, Sourabh Jain,
	Mauro Carvalho Chehab, Nayna Jain

We want to be able to revoke pci mmaps so that the same access rules
applies as for /dev/kmem. Revoke support for devmem was added in
3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the
region").

The simplest way to achieve this is by having the same filp->f_mapping
for all mappings, so that unmap_mapping_range can find them all, no
matter through which file they've been created. Since this must be set
at open time we need sysfs support for this.

Add an optional mapping parameter bin_attr, which is only consulted
when there's also an mmap callback, since without mmap support
allowing to adjust the ->f_mapping makes no sense.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Nayna Jain <nayna@linux.ibm.com>
---
 fs/sysfs/file.c       | 11 +++++++++++
 include/linux/sysfs.h |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index eb6897ab78e7..9d8ccdb000e3 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -169,6 +169,16 @@ static int sysfs_kf_bin_mmap(struct kernfs_open_file *of,
 	return battr->mmap(of->file, kobj, battr, vma);
 }
 
+static int sysfs_kf_bin_open(struct kernfs_open_file *of)
+{
+	struct bin_attribute *battr = of->kn->priv;
+
+	if (battr->mapping)
+		of->file->f_mapping = battr->mapping;
+
+	return 0;
+}
+
 void sysfs_notify(struct kobject *kobj, const char *dir, const char *attr)
 {
 	struct kernfs_node *kn = kobj->sd, *tmp;
@@ -240,6 +250,7 @@ static const struct kernfs_ops sysfs_bin_kfops_mmap = {
 	.read		= sysfs_kf_bin_read,
 	.write		= sysfs_kf_bin_write,
 	.mmap		= sysfs_kf_bin_mmap,
+	.open		= sysfs_kf_bin_open,
 };
 
 int sysfs_add_file_mode_ns(struct kernfs_node *parent,
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 34e84122f635..a17a474d1601 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -164,11 +164,13 @@ __ATTRIBUTE_GROUPS(_name)
 
 struct file;
 struct vm_area_struct;
+struct address_space;
 
 struct bin_attribute {
 	struct attribute	attr;
 	size_t			size;
 	void			*private;
+	struct address_space	*mapping;
 	ssize_t (*read)(struct file *, struct kobject *, struct bin_attribute *,
 			char *, loff_t, size_t);
 	ssize_t (*write)(struct file *, struct kobject *, struct bin_attribute *,
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 16/17] PCI: Revoke mappings like devmem
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
                   ` (14 preceding siblings ...)
  2020-10-09  7:59 ` [PATCH v2 15/17] sysfs: Support zapping of binary attr mmaps Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-09  7:59 ` [PATCH v2 17/17] drm/i915: Properly request PCI BARs Daniel Vetter
  16 siblings, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Greg Kroah-Hartman,
	Bjorn Helgaas, linux-pci

Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
the region") /dev/kmem zaps ptes when the kernel requests exclusive
acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
the default for all driver uses.

Except there's two more ways to access PCI BARs: sysfs and proc mmap
support. Let's plug that hole.

For revoke_devmem() to work we need to link our vma into the same
address_space, with consistent vma->vm_pgoff. ->pgoff is already
adjusted, because that's how (io_)remap_pfn_range works, but for the
mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is
to adjust this at at ->open time:

- for sysfs this is easy, now that binary attributes support this. We
  just set bin_attr->mapping when mmap is supported
- for procfs it's a bit more tricky, since procfs pci access has only
  one file per device, and access to a specific resources first needs
  to be set up with some ioctl calls. But mmap is only supported for
  the same resources as sysfs exposes with mmap support, and otherwise
  rejected, so we can set the mapping unconditionally at open time
  without harm.

A special consideration is for arch_can_pci_mmap_io() - we need to
make sure that the ->f_mapping doesn't alias between ioport and iomem
space. There's only 2 ways in-tree to support mmap of ioports: generic
pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single
architecture hand-rolling. Both approach support ioport mmap through a
special pfn range and not through magic pte attributes. Aliasing is
therefore not a problem.

The only difference in access checks left is that sysfs PCI mmap does
not check for CAP_RAWIO. I'm not really sure whether that should be
added or not.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
--
v2:
- Totally new approach: Adjust filp->f_mapping at open time. Note that
  this now works on all architectures, not just those support
  ARCH_GENERIC_PCI_MMAP_RESOURCE
---
 drivers/pci/pci-sysfs.c | 4 ++++
 drivers/pci/proc.c      | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 6d78df981d41..cee38fcb4a86 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -928,6 +928,7 @@ void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_io->read = pci_read_legacy_io;
 	b->legacy_io->write = pci_write_legacy_io;
 	b->legacy_io->mmap = pci_mmap_legacy_io;
+	b->legacy_io->mapping = iomem_get_mapping();
 	pci_adjust_legacy_attr(b, pci_mmap_io);
 	error = device_create_bin_file(&b->dev, b->legacy_io);
 	if (error)
@@ -940,6 +941,7 @@ void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_mem->size = 1024*1024;
 	b->legacy_mem->attr.mode = 0600;
 	b->legacy_mem->mmap = pci_mmap_legacy_mem;
+	b->legacy_io->mapping = iomem_get_mapping();
 	pci_adjust_legacy_attr(b, pci_mmap_mem);
 	error = device_create_bin_file(&b->dev, b->legacy_mem);
 	if (error)
@@ -1155,6 +1157,8 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
 			res_attr->mmap = pci_mmap_resource_uc;
 		}
 	}
+	if (res_attr->mmap)
+		res_attr->mapping = iomem_get_mapping();
 	res_attr->attr.name = res_attr_name;
 	res_attr->attr.mode = 0600;
 	res_attr->size = pci_resource_len(pdev, num);
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index 3a2f90beb4cb..9bab07302bbf 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct file *file)
 	fpriv->write_combine = 0;
 
 	file->private_data = fpriv;
+	file->f_mapping = iomem_get_mapping();
 
 	return 0;
 }
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 17/17] drm/i915: Properly request PCI BARs
  2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
                   ` (15 preceding siblings ...)
  2020-10-09  7:59 ` [PATCH v2 16/17] PCI: Revoke mappings like devmem Daniel Vetter
@ 2020-10-09  7:59 ` Daniel Vetter
  2020-10-09  9:47   ` Ville Syrjälä
  16 siblings, 1 reply; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Bjorn Helgaas, linux-pci

When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
do nothing for i915. Because i915 doesn't request any regions, like
pretty much all drm pci drivers. I guess this is some very old
remnants from the userspace modesetting days, when we wanted to
co-exist with the fbdev driver. Which usually requested these
resources.

But makes me wonder why the pci subsystem doesn't just request
resource automatically when we map a bar and a pci driver is bound?

Knowledge about which pci bars we need kludged together from
intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
fake agp driver.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
---
 drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 54e201fdeba4..ce39049d8919 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 	struct pci_dev *pdev = i915->drm.pdev;
 	int mmio_bar;
 	int mmio_size;
+	int bar_selection;
+	int ret;
 
 	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
+	bar_selection = BIT (2) | BIT(mmio_bar);
 	/*
-	 * Before gen4, the registers and the GTT are behind different BARs.
+	 * On gen3 the registers and the GTT are behind different BARs.
 	 * However, from gen4 onwards, the registers and the GTT are shared
 	 * in the same BAR, so we want to restrict this ioremap from
 	 * clobbering the GTT which we want ioremap_wc instead. Fortunately,
@@ -1703,6 +1706,8 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 	 * generations up to Ironlake.
 	 * For dgfx chips register range is expanded to 4MB.
 	 */
+	if (INTEL_GEN(i915) == 3)
+		bar_selection |= BIT(3);
 	if (INTEL_GEN(i915) < 5)
 		mmio_size = 512 * 1024;
 	else if (IS_DGFX(i915))
@@ -1710,8 +1715,15 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 	else
 		mmio_size = 2 * 1024 * 1024;
 
+	ret = pci_request_selected_regions(pdev, bar_selection, "i915");
+	if (ret < 0) {
+		drm_err(&i915->drm, "failed to request pci bars\n");
+		return ret;
+	}
+
 	uncore->regs = pci_iomap(pdev, mmio_bar, mmio_size);
 	if (uncore->regs == NULL) {
+		pci_release_selected_regions(pdev, bar_selection);
 		drm_err(&i915->drm, "failed to map registers\n");
 		return -EIO;
 	}
@@ -1721,9 +1733,18 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 
 static void uncore_mmio_cleanup(struct intel_uncore *uncore)
 {
-	struct pci_dev *pdev = uncore->i915->drm.pdev;
+	struct drm_i915_private *i915 = uncore->i915;
+	struct pci_dev *pdev = i915->drm.pdev;
+	int mmio_bar;
+	int bar_selection;
+
+	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
+	bar_selection = BIT (2) | BIT(mmio_bar);
+	if (INTEL_GEN(i915) == 3)
+		bar_selection |= BIT(3);
 
 	pci_iounmap(pdev, uncore->regs);
+	pci_release_selected_regions(pdev, bar_selection);
 }
 
 void intel_uncore_init_early(struct intel_uncore *uncore,
-- 
2.28.0


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
  2020-10-09  7:59 ` [PATCH v2 17/17] drm/i915: Properly request PCI BARs Daniel Vetter
@ 2020-10-09  9:47   ` Ville Syrjälä
  2020-10-09 10:01     ` Daniel Vetter
  0 siblings, 1 reply; 69+ messages in thread
From: Ville Syrjälä @ 2020-10-09  9:47 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, linux-s390, linux-samsung-soc, Jan Kara,
	Kees Cook, kvm, Jason Gunthorpe, linux-pci, linux-mm,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, linux-arm-kernel,
	linux-media

On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> do nothing for i915. Because i915 doesn't request any regions, like
> pretty much all drm pci drivers. I guess this is some very old
> remnants from the userspace modesetting days, when we wanted to
> co-exist with the fbdev driver. Which usually requested these
> resources.
> 
> But makes me wonder why the pci subsystem doesn't just request
> resource automatically when we map a bar and a pci driver is bound?
> 
> Knowledge about which pci bars we need kludged together from
> intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> fake agp driver.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 54e201fdeba4..ce39049d8919 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  	struct pci_dev *pdev = i915->drm.pdev;
>  	int mmio_bar;
>  	int mmio_size;
> +	int bar_selection;

Signed bitmasks always make me uneasy. But looks like
that's what it is in the pci api. So meh.

> +	int ret;
>  
>  	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> +	bar_selection = BIT (2) | BIT(mmio_bar);
                           ^
spurious space			   

That's also not correct for gen2 I think.

gen2:
0 = GMADR
1 = MMADR
2 = IOBAR

gen3:
0 = MMADR
1 = IOBAR
2 = GMADR
3 = GTTADR

gen4+:
0+1 = GTTMMADR
2+3 = GMADR
4 = IOBAR

Maybe we should just have an explicit list of bars like that in a
comment?

I'd also suggest sucking this bitmask calculation into a small helper
so you can reuse it for the release.

>  	/*
> -	 * Before gen4, the registers and the GTT are behind different BARs.
> +	 * On gen3 the registers and the GTT are behind different BARs.
>  	 * However, from gen4 onwards, the registers and the GTT are shared
>  	 * in the same BAR, so we want to restrict this ioremap from
>  	 * clobbering the GTT which we want ioremap_wc instead. Fortunately,
> @@ -1703,6 +1706,8 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  	 * generations up to Ironlake.
>  	 * For dgfx chips register range is expanded to 4MB.
>  	 */
> +	if (INTEL_GEN(i915) == 3)
> +		bar_selection |= BIT(3);
>  	if (INTEL_GEN(i915) < 5)
>  		mmio_size = 512 * 1024;
>  	else if (IS_DGFX(i915))
> @@ -1710,8 +1715,15 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  	else
>  		mmio_size = 2 * 1024 * 1024;
>  
> +	ret = pci_request_selected_regions(pdev, bar_selection, "i915");
> +	if (ret < 0) {
> +		drm_err(&i915->drm, "failed to request pci bars\n");
> +		return ret;
> +	}
> +
>  	uncore->regs = pci_iomap(pdev, mmio_bar, mmio_size);
>  	if (uncore->regs == NULL) {
> +		pci_release_selected_regions(pdev, bar_selection);
>  		drm_err(&i915->drm, "failed to map registers\n");
>  		return -EIO;
>  	}
> @@ -1721,9 +1733,18 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  
>  static void uncore_mmio_cleanup(struct intel_uncore *uncore)
>  {
> -	struct pci_dev *pdev = uncore->i915->drm.pdev;
> +	struct drm_i915_private *i915 = uncore->i915;
> +	struct pci_dev *pdev = i915->drm.pdev;
> +	int mmio_bar;
> +	int bar_selection;
> +
> +	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> +	bar_selection = BIT (2) | BIT(mmio_bar);
> +	if (INTEL_GEN(i915) == 3)
> +		bar_selection |= BIT(3);
>  
>  	pci_iounmap(pdev, uncore->regs);
> +	pci_release_selected_regions(pdev, bar_selection);
>  }
>  
>  void intel_uncore_init_early(struct intel_uncore *uncore,
> -- 
> 2.28.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
  2020-10-09  9:47   ` Ville Syrjälä
@ 2020-10-09 10:01     ` Daniel Vetter
  2020-10-09 10:41       ` Ville Syrjälä
  0 siblings, 1 reply; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09 10:01 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: DRI Development, LKML, linux-s390, linux-samsung-soc, Jan Kara,
	Kees Cook, KVM list, Jason Gunthorpe, Linux PCI, Linux MM,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 11:47 AM Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> > When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> > do nothing for i915. Because i915 doesn't request any regions, like
> > pretty much all drm pci drivers. I guess this is some very old
> > remnants from the userspace modesetting days, when we wanted to
> > co-exist with the fbdev driver. Which usually requested these
> > resources.
> >
> > But makes me wonder why the pci subsystem doesn't just request
> > resource automatically when we map a bar and a pci driver is bound?
> >
> > Knowledge about which pci bars we need kludged together from
> > intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> > fake agp driver.
> >
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: linux-pci@vger.kernel.org
> > ---
> >  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
> >  1 file changed, 23 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > index 54e201fdeba4..ce39049d8919 100644
> > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >       struct pci_dev *pdev = i915->drm.pdev;
> >       int mmio_bar;
> >       int mmio_size;
> > +     int bar_selection;
>
> Signed bitmasks always make me uneasy. But looks like
> that's what it is in the pci api. So meh.

Yeah it's surprising.

> > +     int ret;
> >
> >       mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > +     bar_selection = BIT (2) | BIT(mmio_bar);
>                            ^
> spurious space
>
> That's also not correct for gen2 I think.
>
> gen2:
> 0 = GMADR
> 1 = MMADR
> 2 = IOBAR
>
> gen3:
> 0 = MMADR
> 1 = IOBAR
> 2 = GMADR
> 3 = GTTADR
>
> gen4+:
> 0+1 = GTTMMADR
> 2+3 = GMADR
> 4 = IOBAR
>
> Maybe we should just have an explicit list of bars like that in a
> comment?
>
> I'd also suggest sucking this bitmask calculation into a small helper
> so you can reuse it for the release.

tbh I just hacked this up for testing. Given how almost no other drm
driver does this, I'm wondering whether we should or not.

Also the only reason why I didn't just use the pci_request_regions
helper is to avoid the vga ioport range, since that's managed by
vgaarbiter.

So I think if we go for this for real we should:
- register the vga ioport range in the vgaarbiter
- have a pci_request_iomem_regions helper that grabs all mem bars
- roll that out to all drm pci drivers

Or something like that. The other complication is when we resize the
iobar. So not really sure what to do here.
-Daniel

>
> >       /*
> > -      * Before gen4, the registers and the GTT are behind different BARs.
> > +      * On gen3 the registers and the GTT are behind different BARs.
> >        * However, from gen4 onwards, the registers and the GTT are shared
> >        * in the same BAR, so we want to restrict this ioremap from
> >        * clobbering the GTT which we want ioremap_wc instead. Fortunately,
> > @@ -1703,6 +1706,8 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >        * generations up to Ironlake.
> >        * For dgfx chips register range is expanded to 4MB.
> >        */
> > +     if (INTEL_GEN(i915) == 3)
> > +             bar_selection |= BIT(3);
> >       if (INTEL_GEN(i915) < 5)
> >               mmio_size = 512 * 1024;
> >       else if (IS_DGFX(i915))
> > @@ -1710,8 +1715,15 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >       else
> >               mmio_size = 2 * 1024 * 1024;
> >
> > +     ret = pci_request_selected_regions(pdev, bar_selection, "i915");
> > +     if (ret < 0) {
> > +             drm_err(&i915->drm, "failed to request pci bars\n");
> > +             return ret;
> > +     }
> > +
> >       uncore->regs = pci_iomap(pdev, mmio_bar, mmio_size);
> >       if (uncore->regs == NULL) {
> > +             pci_release_selected_regions(pdev, bar_selection);
> >               drm_err(&i915->drm, "failed to map registers\n");
> >               return -EIO;
> >       }
> > @@ -1721,9 +1733,18 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >
> >  static void uncore_mmio_cleanup(struct intel_uncore *uncore)
> >  {
> > -     struct pci_dev *pdev = uncore->i915->drm.pdev;
> > +     struct drm_i915_private *i915 = uncore->i915;
> > +     struct pci_dev *pdev = i915->drm.pdev;
> > +     int mmio_bar;
> > +     int bar_selection;
> > +
> > +     mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > +     bar_selection = BIT (2) | BIT(mmio_bar);
> > +     if (INTEL_GEN(i915) == 3)
> > +             bar_selection |= BIT(3);
> >
> >       pci_iounmap(pdev, uncore->regs);
> > +     pci_release_selected_regions(pdev, bar_selection);
> >  }
> >
> >  void intel_uncore_init_early(struct intel_uncore *uncore,
> > --
> > 2.28.0
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> --
> Ville Syrjälä
> Intel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem
  2020-10-09  7:59 ` [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem Daniel Vetter
@ 2020-10-09 10:14   ` Mauro Carvalho Chehab
  2020-10-09 16:57     ` Daniel Vetter
  0 siblings, 1 reply; 69+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 10:14 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, John Hubbard,
	Daniel Vetter, Jason Gunthorpe, Pawel Osciak, Marek Szyprowski,
	Kyungmin Park, Tomasz Figa, Andrew Morton,
	Jérôme Glisse, Jan Kara, Dan Williams

Em Fri,  9 Oct 2020 09:59:23 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
> symbol from all over the tree (well just one place, somehow omap media
> driver still had this in its Kconfig, despite not using it).
> 
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Pawel Osciak <pawel@osciak.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Tomasz Figa <tfiga@chromium.org>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/media/common/videobuf2/Kconfig        |  1 -
>  drivers/media/common/videobuf2/Makefile       |  1 +
>  .../media/common/videobuf2}/frame_vector.c    |  2 +
>  drivers/media/platform/omap/Kconfig           |  1 -
>  include/linux/mm.h                            | 42 -------------------
>  include/media/videobuf2-core.h                | 42 +++++++++++++++++++
>  mm/Kconfig                                    |  3 --
>  mm/Makefile                                   |  1 -
>  8 files changed, 45 insertions(+), 48 deletions(-)
>  rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)
> 
> diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
> index edbc99ebba87..d2223a12c95f 100644
> --- a/drivers/media/common/videobuf2/Kconfig
> +++ b/drivers/media/common/videobuf2/Kconfig
> @@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
>  
>  config VIDEOBUF2_MEMOPS
>  	tristate
> -	select FRAME_VECTOR
>  
>  config VIDEOBUF2_DMA_CONTIG
>  	tristate
> diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
> index 77bebe8b202f..54306f8d096c 100644
> --- a/drivers/media/common/videobuf2/Makefile
> +++ b/drivers/media/common/videobuf2/Makefile
> @@ -1,5 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  videobuf2-common-objs := videobuf2-core.o
> +videobuf2-common-objs += frame_vector.o
>  
>  ifeq ($(CONFIG_TRACEPOINTS),y)
>    videobuf2-common-objs += vb2-trace.o
> diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> similarity index 99%
> rename from mm/frame_vector.c
> rename to drivers/media/common/videobuf2/frame_vector.c
> index d44779e56313..2b0b97761d15 100644
> --- a/mm/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -8,6 +8,8 @@
>  #include <linux/pagemap.h>
>  #include <linux/sched.h>
>  
> +#include <media/videobuf2-core.h>
> +

See my comment below...

>  /**
>   * get_vaddr_frames() - map virtual addresses to pfns
>   * @start:	starting user address
> diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
> index f73b5893220d..de16de46c0f4 100644
> --- a/drivers/media/platform/omap/Kconfig
> +++ b/drivers/media/platform/omap/Kconfig
> @@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
>  	depends on VIDEO_V4L2
>  	select VIDEOBUF2_DMA_CONTIG
>  	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
> -	select FRAME_VECTOR
>  	help
>  	  V4L2 Display driver support for OMAP2/3 based boards.
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 16b799a0522c..acd60fbf1a5a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
>  int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
>  			struct task_struct *task, bool bypass_rlim);
>  
> -/* Container for pinned pfns / pages */
> -struct frame_vector {
> -	unsigned int nr_allocated;	/* Number of frames we have space for */
> -	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> -	bool got_ref;		/* Did we pin pages by getting page ref? */
> -	bool is_pfns;		/* Does array contain pages or pfns? */
> -	void *ptrs[];		/* Array of pinned pfns / pages. Use
> -				 * pfns_vector_pages() or pfns_vector_pfns()
> -				 * for access */
> -};
> -
> -struct frame_vector *frame_vector_create(unsigned int nr_frames);
> -void frame_vector_destroy(struct frame_vector *vec);
> -int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> -		     unsigned int gup_flags, struct frame_vector *vec);
> -void put_vaddr_frames(struct frame_vector *vec);
> -int frame_vector_to_pages(struct frame_vector *vec);
> -void frame_vector_to_pfns(struct frame_vector *vec);
> -
> -static inline unsigned int frame_vector_count(struct frame_vector *vec)
> -{
> -	return vec->nr_frames;
> -}
> -
> -static inline struct page **frame_vector_pages(struct frame_vector *vec)
> -{
> -	if (vec->is_pfns) {
> -		int err = frame_vector_to_pages(vec);
> -
> -		if (err)
> -			return ERR_PTR(err);
> -	}
> -	return (struct page **)(vec->ptrs);
> -}
> -
> -static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> -{
> -	if (!vec->is_pfns)
> -		frame_vector_to_pfns(vec);
> -	return (unsigned long *)(vec->ptrs);
> -}
> -
>  struct kvec;
>  int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
>  			struct page **pages);
> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> index bbb3f26fbde9..a2e75ca0334f 100644
> --- a/include/media/videobuf2-core.h
> +++ b/include/media/videobuf2-core.h
> @@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
>   */
>  unsigned int vb2_request_buffer_cnt(struct media_request *req);
>  
> +/* Container for pinned pfns / pages in frame_vector.c */
> +struct frame_vector {
> +	unsigned int nr_allocated;	/* Number of frames we have space for */
> +	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> +	bool got_ref;		/* Did we pin pages by getting page ref? */
> +	bool is_pfns;		/* Does array contain pages or pfns? */
> +	void *ptrs[];		/* Array of pinned pfns / pages. Use
> +				 * pfns_vector_pages() or pfns_vector_pfns()
> +				 * for access */
> +};
> +
> +struct frame_vector *frame_vector_create(unsigned int nr_frames);
> +void frame_vector_destroy(struct frame_vector *vec);
> +int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> +		     unsigned int gup_flags, struct frame_vector *vec);
> +void put_vaddr_frames(struct frame_vector *vec);
> +int frame_vector_to_pages(struct frame_vector *vec);
> +void frame_vector_to_pfns(struct frame_vector *vec);
> +
> +static inline unsigned int frame_vector_count(struct frame_vector *vec)
> +{
> +	return vec->nr_frames;
> +}
> +
> +static inline struct page **frame_vector_pages(struct frame_vector *vec)
> +{
> +	if (vec->is_pfns) {
> +		int err = frame_vector_to_pages(vec);
> +
> +		if (err)
> +			return ERR_PTR(err);
> +	}
> +	return (struct page **)(vec->ptrs);
> +}
> +
> +static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> +{
> +	if (!vec->is_pfns)
> +		frame_vector_to_pfns(vec);
> +	return (unsigned long *)(vec->ptrs);
> +}
> +
>  #endif /* _MEDIA_VIDEOBUF2_CORE_H */

Please place those into a include/media/frame_vector.h file, instead of
merging it directly at vb2 core header.

Then include the new header at videobuf2-core.h and at frame_vector.c.

With such changes:

Acked-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

> diff --git a/mm/Kconfig b/mm/Kconfig
> index 6c974888f86f..da6c943fe9f1 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -815,9 +815,6 @@ config DEVICE_PRIVATE
>  	  memory; i.e., memory that is only accessible from the device (or
>  	  group of devices). You likely also want to select HMM_MIRROR.
>  
> -config FRAME_VECTOR
> -	bool
> -
>  config ARCH_USES_HIGH_VMA_FLAGS
>  	bool
>  config ARCH_HAS_PKEYS
> diff --git a/mm/Makefile b/mm/Makefile
> index d5649f1c12c0..a025fd6c6afd 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
>  obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
>  obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
>  obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
> -obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
>  obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
>  obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
>  obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09  7:59 ` [PATCH v2 09/17] mm: Add unsafe_follow_pfn Daniel Vetter
@ 2020-10-09 10:34   ` Mauro Carvalho Chehab
  2020-10-09 12:21     ` Jason Gunthorpe
  0 siblings, 1 reply; 69+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 10:34 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Jason Gunthorpe, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Hi,

Em Fri,  9 Oct 2020 09:59:26 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> Way back it was a reasonable assumptions that iomem mappings never
> change the pfn range they point at. But this has changed:
> 
> - gpu drivers dynamically manage their memory nowadays, invalidating
> ptes with unmap_mapping_range when buffers get moved
> 
> - contiguous dma allocations have moved from dedicated carvetouts to
> cma regions. This means if we miss the unmap the pfn might contain
> pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> 
> - even /dev/mem now invalidates mappings when the kernel requests that
> iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> ("/dev/mem: Revoke mappings when a driver claims the region")
> 
> Accessing pfns obtained from ptes without holding all the locks is
> therefore no longer a good idea.
> 
> Unfortunately there's some users where this is not fixable (like v4l
> userptr of iomem mappings) or involves a pile of work (vfio type1
> iommu). For now annotate these as unsafe and splat appropriately.
> 
> This patch adds an unsafe_follow_pfn, which later patches will then
> roll out to all appropriate places.

NACK, as this breaks an existing userspace API on media.

While I agree that using the userptr on media is something that
new drivers may not support, as DMABUF is a better way of
handling it, changing this for existing ones is a big no, 
as it may break usersapace.

The right approach here would be to be able to fine-tune
support for it on a per-driver basis, e. g. disabling such
feature only for drivers that would use a movable page.

The media subsystem has already a way to disable USERPTR
support from VB2. the right approach would be to ensure
that newer drivers will only set this if they won't use
movable pages.

Regards,
Mauro

> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: kvm@vger.kernel.org
> ---
>  include/linux/mm.h |  2 ++
>  mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
>  mm/nommu.c         | 17 +++++++++++++++++
>  security/Kconfig   | 13 +++++++++++++
>  4 files changed, 63 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2a16631c1fda..ec8c90928fc9 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
>  		   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
>  int follow_pfn(struct vm_area_struct *vma, unsigned long address,
>  	unsigned long *pfn);
> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +		      unsigned long *pfn);
>  int follow_phys(struct vm_area_struct *vma, unsigned long address,
>  		unsigned int flags, unsigned long *prot, resource_size_t *phys);
>  int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
> diff --git a/mm/memory.c b/mm/memory.c
> index f7cbc4dde0ef..7c7b234ffb24 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4821,7 +4821,12 @@ EXPORT_SYMBOL(follow_pte_pmd);
>   * @address: user virtual address
>   * @pfn: location to store found PFN
>   *
> - * Only IO mappings and raw PFN mappings are allowed.
> + * Only IO mappings and raw PFN mappings are allowed. Note that callers must
> + * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
> + * If this is not feasible, or the access to the @pfn is only very short term,
> + * use follow_pte_pmd() instead and hold the pagetable lock for the duration of
> + * the access instead. Any caller not following these requirements must use
> + * unsafe_follow_pfn() instead.
>   *
>   * Return: zero and the pfn at @pfn on success, -ve otherwise.
>   */
> @@ -4844,6 +4849,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
>  }
>  EXPORT_SYMBOL(follow_pfn);
>  
> +/**
> + * unsafe_follow_pfn - look up PFN at a user virtual address
> + * @vma: memory mapping
> + * @address: user virtual address
> + * @pfn: location to store found PFN
> + *
> + * Only IO mappings and raw PFN mappings are allowed.
> + *
> + * Returns zero and the pfn at @pfn on success, -ve otherwise.
> + */
> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +	unsigned long *pfn)
> +{
> +#ifdef CONFIG_STRICT_FOLLOW_PFN
> +	pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
> +	return -EINVAL;
> +#else
> +	WARN_ONCE(1, "unsafe follow_pfn usage\n");
> +	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> +
> +	return follow_pfn(vma, address, pfn);
> +#endif
> +}
> +EXPORT_SYMBOL(unsafe_follow_pfn);
> +
>  #ifdef CONFIG_HAVE_IOREMAP_PROT
>  int follow_phys(struct vm_area_struct *vma,
>  		unsigned long address, unsigned int flags,
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 75a327149af1..3db2910f0d64 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
>  }
>  EXPORT_SYMBOL(follow_pfn);
>  
> +/**
> + * unsafe_follow_pfn - look up PFN at a user virtual address
> + * @vma: memory mapping
> + * @address: user virtual address
> + * @pfn: location to store found PFN
> + *
> + * Only IO mappings and raw PFN mappings are allowed.
> + *
> + * Returns zero and the pfn at @pfn on success, -ve otherwise.
> + */
> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +	unsigned long *pfn)
> +{
> +	return follow_pfn(vma, address, pfn);
> +}
> +EXPORT_SYMBOL(unsafe_follow_pfn);
> +
>  LIST_HEAD(vmap_area_list);
>  
>  void vfree(const void *addr)
> diff --git a/security/Kconfig b/security/Kconfig
> index 7561f6f99f1d..48945402e103 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
>  	  If you wish for all usermode helper programs to be disabled,
>  	  specify an empty string here (i.e. "").
>  
> +config STRICT_FOLLOW_PFN
> +	bool "Disable unsafe use of follow_pfn"
> +	depends on MMU
> +	help
> +	  Some functionality in the kernel follows userspace mappings to iomem
> +	  ranges in an unsafe matter. Examples include v4l userptr for zero-copy
> +	  buffers sharing.
> +
> +	  If this option is switched on, such access is rejected. Only enable
> +	  this option when you must run userspace which requires this.
> +
> +	  If in doubt, say Y.
> +
>  source "security/selinux/Kconfig"
>  source "security/smack/Kconfig"
>  source "security/tomoyo/Kconfig"



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
  2020-10-09 10:01     ` Daniel Vetter
@ 2020-10-09 10:41       ` Ville Syrjälä
  2020-10-09 14:18         ` Daniel Vetter
  0 siblings, 1 reply; 69+ messages in thread
From: Ville Syrjälä @ 2020-10-09 10:41 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, linux-s390, linux-samsung-soc, Jan Kara,
	Kees Cook, KVM list, Jason Gunthorpe, Linux PCI, Linux MM,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 09, 2020 at 12:01:39PM +0200, Daniel Vetter wrote:
> On Fri, Oct 9, 2020 at 11:47 AM Ville Syrjälä
> <ville.syrjala@linux.intel.com> wrote:
> >
> > On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> > > When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> > > do nothing for i915. Because i915 doesn't request any regions, like
> > > pretty much all drm pci drivers. I guess this is some very old
> > > remnants from the userspace modesetting days, when we wanted to
> > > co-exist with the fbdev driver. Which usually requested these
> > > resources.
> > >
> > > But makes me wonder why the pci subsystem doesn't just request
> > > resource automatically when we map a bar and a pci driver is bound?
> > >
> > > Knowledge about which pci bars we need kludged together from
> > > intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> > > fake agp driver.
> > >
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-arm-kernel@lists.infradead.org
> > > Cc: linux-samsung-soc@vger.kernel.org
> > > Cc: linux-media@vger.kernel.org
> > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > Cc: linux-pci@vger.kernel.org
> > > ---
> > >  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
> > >  1 file changed, 23 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > > index 54e201fdeba4..ce39049d8919 100644
> > > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > > @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> > >       struct pci_dev *pdev = i915->drm.pdev;
> > >       int mmio_bar;
> > >       int mmio_size;
> > > +     int bar_selection;
> >
> > Signed bitmasks always make me uneasy. But looks like
> > that's what it is in the pci api. So meh.
> 
> Yeah it's surprising.
> 
> > > +     int ret;
> > >
> > >       mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > > +     bar_selection = BIT (2) | BIT(mmio_bar);
> >                            ^
> > spurious space
> >
> > That's also not correct for gen2 I think.
> >
> > gen2:
> > 0 = GMADR
> > 1 = MMADR
> > 2 = IOBAR
> >
> > gen3:
> > 0 = MMADR
> > 1 = IOBAR
> > 2 = GMADR
> > 3 = GTTADR
> >
> > gen4+:
> > 0+1 = GTTMMADR
> > 2+3 = GMADR
> > 4 = IOBAR
> >
> > Maybe we should just have an explicit list of bars like that in a
> > comment?
> >
> > I'd also suggest sucking this bitmask calculation into a small helper
> > so you can reuse it for the release.
> 
> tbh I just hacked this up for testing. Given how almost no other drm
> driver does this, I'm wondering whether we should or not.
> 
> Also the only reason why I didn't just use the pci_request_regions
> helper is to avoid the vga ioport range, since that's managed by
> vgaarbiter.

VGA io range isn't part of any bar. Or do you mean just the io decode
enable bit in the pci command register? That should be just a matter
or pci_enable_device() vs. pci_enable_device_mem() I think. So nothing
to do with which bars we've requested IIRC.

> 
> So I think if we go for this for real we should:
> - register the vga ioport range in the vgaarbiter
> - have a pci_request_iomem_regions helper that grabs all mem bars
> - roll that out to all drm pci drivers
> 
> Or something like that. The other complication is when we resize the
> iobar. So not really sure what to do here.

We resize it?

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 15/17] sysfs: Support zapping of binary attr mmaps
  2020-10-09  7:59 ` [PATCH v2 15/17] sysfs: Support zapping of binary attr mmaps Daniel Vetter
@ 2020-10-09 10:58   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-09 10:58 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Jason Gunthorpe, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	linux-pci, Rafael J. Wysocki, Christian Brauner, David S. Miller,
	Michael Ellerman, Sourabh Jain, Mauro Carvalho Chehab,
	Nayna Jain

On Fri, Oct 09, 2020 at 09:59:32AM +0200, Daniel Vetter wrote:
> We want to be able to revoke pci mmaps so that the same access rules
> applies as for /dev/kmem. Revoke support for devmem was added in
> 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the
> region").
> 
> The simplest way to achieve this is by having the same filp->f_mapping
> for all mappings, so that unmap_mapping_range can find them all, no
> matter through which file they've been created. Since this must be set
> at open time we need sysfs support for this.
> 
> Add an optional mapping parameter bin_attr, which is only consulted
> when there's also an mmap callback, since without mmap support
> allowing to adjust the ->f_mapping makes no sense.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Christian Brauner <christian.brauner@ubuntu.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: Nayna Jain <nayna@linux.ibm.com>
> ---
>  fs/sysfs/file.c       | 11 +++++++++++
>  include/linux/sysfs.h |  2 ++
>  2 files changed, 13 insertions(+)

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-09  7:59 ` [PATCH v2 14/17] resource: Move devmem revoke code to resource framework Daniel Vetter
@ 2020-10-09 10:59   ` Greg Kroah-Hartman
  2020-10-09 12:31   ` Jason Gunthorpe
  1 sibling, 0 replies; 69+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-09 10:59 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Jason Gunthorpe, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Arnd Bergmann,
	David Hildenbrand, Rafael J. Wysocki

On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> We want all iomem mmaps to consistently revoke ptes when the kernel
> takes over and CONFIG_IO_STRICT_DEVMEM is enabled. This includes the
> pci bar mmaps available through procfs and sysfs, which currently do
> not revoke mappings.
> 
> To prepare for this, move the code from the /dev/kmem driver to
> kernel/resource.c.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 10:34   ` Mauro Carvalho Chehab
@ 2020-10-09 12:21     ` Jason Gunthorpe
  2020-10-09 12:37       ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 69+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 12:21 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Daniel Vetter, DRI Development, LKML, kvm, linux-mm,
	linux-arm-kernel, linux-samsung-soc, linux-media, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> Hi,
> 
> Em Fri,  9 Oct 2020 09:59:26 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> 
> > Way back it was a reasonable assumptions that iomem mappings never
> > change the pfn range they point at. But this has changed:
> > 
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> > ptes with unmap_mapping_range when buffers get moved
> > 
> > - contiguous dma allocations have moved from dedicated carvetouts to
> > cma regions. This means if we miss the unmap the pfn might contain
> > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > 
> > - even /dev/mem now invalidates mappings when the kernel requests that
> > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > ("/dev/mem: Revoke mappings when a driver claims the region")
> > 
> > Accessing pfns obtained from ptes without holding all the locks is
> > therefore no longer a good idea.
> > 
> > Unfortunately there's some users where this is not fixable (like v4l
> > userptr of iomem mappings) or involves a pile of work (vfio type1
> > iommu). For now annotate these as unsafe and splat appropriately.
> > 
> > This patch adds an unsafe_follow_pfn, which later patches will then
> > roll out to all appropriate places.
> 
> NACK, as this breaks an existing userspace API on media.

It doesn't break it. You get a big warning the thing is broken and it
keeps working.

We can't leave such a huge security hole open - it impacts other
subsystems, distros need to be able to run in a secure mode.

> While I agree that using the userptr on media is something that
> new drivers may not support, as DMABUF is a better way of
> handling it, changing this for existing ones is a big no, 
> as it may break usersapace.

media community needs to work to fix this, not pretend it is OK to
keep going as-is.

Dealing with security issues is the one case where an uABI break might
be acceptable.

If you want to NAK it then you need to come up with the work to do
something here correctly that will support the old drivers without the
kernel taint.

Unfortunately making things uncomfortable for the subsystem is the big
hammer the core kernel needs to use to actually get this security work
done by those responsible.

Jason

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-09  7:59 ` [PATCH v2 14/17] resource: Move devmem revoke code to resource framework Daniel Vetter
  2020-10-09 10:59   ` Greg Kroah-Hartman
@ 2020-10-09 12:31   ` Jason Gunthorpe
  2020-10-09 14:24     ` Daniel Vetter
  1 sibling, 1 reply; 69+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 12:31 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Arnd Bergmann,
	Greg Kroah-Hartman, David Hildenbrand, Rafael J. Wysocki

On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:

> +struct address_space *iomem_get_mapping(void)
> +{
> +	return iomem_inode->i_mapping;

This should pair an acquire with the release below

> +	/*
> +	 * Publish /dev/mem initialized.
> +	 * Pairs with smp_load_acquire() in revoke_iomem().
> +	 */
> +	smp_store_release(&iomem_inode, inode);

However, this seems abnormal, initcalls rarely do this kind of stuff
with global data..

The kernel crashes if this fs_initcall is raced with
iomem_get_mapping() due to the unconditional dereference, so I think
it can be safely switched to a simple assignment.

Jason

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 12:21     ` Jason Gunthorpe
@ 2020-10-09 12:37       ` Mauro Carvalho Chehab
  2020-10-09 12:39         ` Mauro Carvalho Chehab
                           ` (2 more replies)
  0 siblings, 3 replies; 69+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 12:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Vetter, DRI Development, LKML, kvm, linux-mm,
	linux-arm-kernel, linux-samsung-soc, linux-media, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Em Fri, 9 Oct 2020 09:21:11 -0300
Jason Gunthorpe <jgg@ziepe.ca> escreveu:

> On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> > Hi,
> > 
> > Em Fri,  9 Oct 2020 09:59:26 +0200
> > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> >   
> > > Way back it was a reasonable assumptions that iomem mappings never
> > > change the pfn range they point at. But this has changed:
> > > 
> > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > ptes with unmap_mapping_range when buffers get moved
> > > 
> > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > cma regions. This means if we miss the unmap the pfn might contain
> > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > 
> > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > 
> > > Accessing pfns obtained from ptes without holding all the locks is
> > > therefore no longer a good idea.
> > > 
> > > Unfortunately there's some users where this is not fixable (like v4l
> > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > iommu). For now annotate these as unsafe and splat appropriately.
> > > 
> > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > roll out to all appropriate places.  
> > 
> > NACK, as this breaks an existing userspace API on media.  
> 
> It doesn't break it. You get a big warning the thing is broken and it
> keeps working.
> 
> We can't leave such a huge security hole open - it impacts other
> subsystems, distros need to be able to run in a secure mode.

Well, if distros disable it, then apps will break.

> > While I agree that using the userptr on media is something that
> > new drivers may not support, as DMABUF is a better way of
> > handling it, changing this for existing ones is a big no, 
> > as it may break usersapace.  
> 
> media community needs to work to fix this, not pretend it is OK to
> keep going as-is.

> Dealing with security issues is the one case where an uABI break might
> be acceptable.
> 
> If you want to NAK it then you need to come up with the work to do
> something here correctly that will support the old drivers without the
> kernel taint.
> 
> Unfortunately making things uncomfortable for the subsystem is the big
> hammer the core kernel needs to use to actually get this security work
> done by those responsible.


I'm not pretending that this is ok. Just pointing that the approach
taken is NOT OK.

I'm not a mm/ expert, but, from what I understood from Daniel's patch
description is that this is unsafe *only if*  __GFP_MOVABLE is used.

Well, no drivers inside the media subsystem uses such flag, although
they may rely on some infrastructure that could be using it behind
the bars.

If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE 
flag that it would be denying the core mm code to set __GFP_MOVABLE.

Please let address the issue on this way, instead of broken an
userspace API that it is there since 1991.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 12:37       ` Mauro Carvalho Chehab
@ 2020-10-09 12:39         ` Mauro Carvalho Chehab
  2020-10-09 12:48         ` Jason Gunthorpe
  2020-10-10 17:30         ` Tomasz Figa
  2 siblings, 0 replies; 69+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 12:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Vetter, DRI Development, LKML, kvm, linux-mm,
	linux-arm-kernel, linux-samsung-soc, linux-media, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Em Fri, 9 Oct 2020 14:37:23 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:

> Em Fri, 9 Oct 2020 09:21:11 -0300
> Jason Gunthorpe <jgg@ziepe.ca> escreveu:
> 
> > On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:  
> > > Hi,
> > > 
> > > Em Fri,  9 Oct 2020 09:59:26 +0200
> > > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> > >     
> > > > Way back it was a reasonable assumptions that iomem mappings never
> > > > change the pfn range they point at. But this has changed:
> > > > 
> > > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > > ptes with unmap_mapping_range when buffers get moved
> > > > 
> > > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > > cma regions. This means if we miss the unmap the pfn might contain
> > > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > > 
> > > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > > 
> > > > Accessing pfns obtained from ptes without holding all the locks is
> > > > therefore no longer a good idea.
> > > > 
> > > > Unfortunately there's some users where this is not fixable (like v4l
> > > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > > iommu). For now annotate these as unsafe and splat appropriately.
> > > > 
> > > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > > roll out to all appropriate places.    
> > > 
> > > NACK, as this breaks an existing userspace API on media.    
> > 
> > It doesn't break it. You get a big warning the thing is broken and it
> > keeps working.
> > 
> > We can't leave such a huge security hole open - it impacts other
> > subsystems, distros need to be able to run in a secure mode.  
> 
> Well, if distros disable it, then apps will break.
> 
> > > While I agree that using the userptr on media is something that
> > > new drivers may not support, as DMABUF is a better way of
> > > handling it, changing this for existing ones is a big no, 
> > > as it may break usersapace.    
> > 
> > media community needs to work to fix this, not pretend it is OK to
> > keep going as-is.  
> 
> > Dealing with security issues is the one case where an uABI break might
> > be acceptable.
> > 
> > If you want to NAK it then you need to come up with the work to do
> > something here correctly that will support the old drivers without the
> > kernel taint.
> > 
> > Unfortunately making things uncomfortable for the subsystem is the big
> > hammer the core kernel needs to use to actually get this security work
> > done by those responsible.  
> 
> 
> I'm not pretending that this is ok. Just pointing that the approach
> taken is NOT OK.
> 
> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> 
> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.
> 
> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE 
> flag that it would be denying the core mm code to set __GFP_MOVABLE.
> 
> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

In time: I meant to say 1998.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 12:37       ` Mauro Carvalho Chehab
  2020-10-09 12:39         ` Mauro Carvalho Chehab
@ 2020-10-09 12:48         ` Jason Gunthorpe
  2020-10-09 17:52           ` Daniel Vetter
  2020-10-12 10:46           ` Marek Szyprowski
  2020-10-10 17:30         ` Tomasz Figa
  2 siblings, 2 replies; 69+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 12:48 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Daniel Vetter, DRI Development, LKML, kvm, linux-mm,
	linux-arm-kernel, linux-samsung-soc, linux-media, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:

> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.

No, it is unconditionally unsafe. The CMA movable mappings are
specific VMAs that will have bad issues here, but there are other
types too.

The only way to do something at a VMA level is to have a list of OK
VMAs, eg because they were creatd via a special mmap helper from the
media subsystem.

> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.

It doesn't matter, nothing prevents the user from calling media APIs
on mmaps it gets from other subsystems.

> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> flag that it would be denying the core mm code to set __GFP_MOVABLE.

We can't tell from the VMA these kinds of details..

It has to go the other direction, evey mmap that might be used as a
userptr here has to be found and the VMA specially created to allow
its use. At least that is a kernel only change, but will need people
with the HW to do this work.

> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

It has happened before :( It took 4 years for RDMA to undo the uAPI
breakage caused by a security fix for something that was a 15 years
old bug. 

Jason

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
  2020-10-09 10:41       ` Ville Syrjälä
@ 2020-10-09 14:18         ` Daniel Vetter
  0 siblings, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09 14:18 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: DRI Development, LKML, linux-s390, linux-samsung-soc, Jan Kara,
	Kees Cook, KVM list, Jason Gunthorpe, Linux PCI, Linux MM,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 12:42 PM Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Fri, Oct 09, 2020 at 12:01:39PM +0200, Daniel Vetter wrote:
> > On Fri, Oct 9, 2020 at 11:47 AM Ville Syrjälä
> > <ville.syrjala@linux.intel.com> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> > > > When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> > > > do nothing for i915. Because i915 doesn't request any regions, like
> > > > pretty much all drm pci drivers. I guess this is some very old
> > > > remnants from the userspace modesetting days, when we wanted to
> > > > co-exist with the fbdev driver. Which usually requested these
> > > > resources.
> > > >
> > > > But makes me wonder why the pci subsystem doesn't just request
> > > > resource automatically when we map a bar and a pci driver is bound?
> > > >
> > > > Knowledge about which pci bars we need kludged together from
> > > > intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> > > > fake agp driver.
> > > >
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > > Cc: Kees Cook <keescook@chromium.org>
> > > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > > Cc: Jan Kara <jack@suse.cz>
> > > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > > Cc: linux-mm@kvack.org
> > > > Cc: linux-arm-kernel@lists.infradead.org
> > > > Cc: linux-samsung-soc@vger.kernel.org
> > > > Cc: linux-media@vger.kernel.org
> > > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > > Cc: linux-pci@vger.kernel.org
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
> > > >  1 file changed, 23 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > > > index 54e201fdeba4..ce39049d8919 100644
> > > > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > > > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > > > @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> > > >       struct pci_dev *pdev = i915->drm.pdev;
> > > >       int mmio_bar;
> > > >       int mmio_size;
> > > > +     int bar_selection;
> > >
> > > Signed bitmasks always make me uneasy. But looks like
> > > that's what it is in the pci api. So meh.
> >
> > Yeah it's surprising.
> >
> > > > +     int ret;
> > > >
> > > >       mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > > > +     bar_selection = BIT (2) | BIT(mmio_bar);
> > >                            ^
> > > spurious space
> > >
> > > That's also not correct for gen2 I think.
> > >
> > > gen2:
> > > 0 = GMADR
> > > 1 = MMADR
> > > 2 = IOBAR
> > >
> > > gen3:
> > > 0 = MMADR
> > > 1 = IOBAR
> > > 2 = GMADR
> > > 3 = GTTADR
> > >
> > > gen4+:
> > > 0+1 = GTTMMADR
> > > 2+3 = GMADR
> > > 4 = IOBAR
> > >
> > > Maybe we should just have an explicit list of bars like that in a
> > > comment?
> > >
> > > I'd also suggest sucking this bitmask calculation into a small helper
> > > so you can reuse it for the release.
> >
> > tbh I just hacked this up for testing. Given how almost no other drm
> > driver does this, I'm wondering whether we should or not.
> >
> > Also the only reason why I didn't just use the pci_request_regions
> > helper is to avoid the vga ioport range, since that's managed by
> > vgaarbiter.
>
> VGA io range isn't part of any bar. Or do you mean just the io decode
> enable bit in the pci command register? That should be just a matter
> or pci_enable_device() vs. pci_enable_device_mem() I think. So nothing
> to do with which bars we've requested IIRC.
>
> >
> > So I think if we go for this for real we should:
> > - register the vga ioport range in the vgaarbiter
> > - have a pci_request_iomem_regions helper that grabs all mem bars
> > - roll that out to all drm pci drivers
> >
> > Or something like that. The other complication is when we resize the
> > iobar. So not really sure what to do here.
>
> We resize it?

By default I thought firmware puts everything (well, squeezes) into
the lower 32bit. Maybe they stopped doing that. So when we want the
full bar (for discrete at least) we need to resize it and put it
somewhere in the 64bit range above end of system memory.

So I guess correct phrasing is "we will resize it" :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-09 12:31   ` Jason Gunthorpe
@ 2020-10-09 14:24     ` Daniel Vetter
  2020-10-09 14:32       ` Jason Gunthorpe
  0 siblings, 1 reply; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09 14:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Arnd Bergmann, Greg Kroah-Hartman, David Hildenbrand,
	Rafael J. Wysocki

On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
>
> > +struct address_space *iomem_get_mapping(void)
> > +{
> > +     return iomem_inode->i_mapping;
>
> This should pair an acquire with the release below
>
> > +     /*
> > +      * Publish /dev/mem initialized.
> > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > +      */
> > +     smp_store_release(&iomem_inode, inode);
>
> However, this seems abnormal, initcalls rarely do this kind of stuff
> with global data..
>
> The kernel crashes if this fs_initcall is raced with
> iomem_get_mapping() due to the unconditional dereference, so I think
> it can be safely switched to a simple assignment.

Ah yes I checked this all, but forgot to correctly annotate the
iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
Add missing memory barriers for devmem_inode").

The reasons for the annotations is that iomem requests can happen
fairly early, way before fs_initcalls happen. That means revoke_iomem
needs to check for that and bail out if we race - nothing bad can
happen since userspace isn't running at this point anyway. And
apparently it needs to be a full acquire fence since we don't just
write a value, but need a barrier for the struct stuff.

Now iomem_get_mapping otoh can only be called after userspace is up &
running, so way after all the fs_initcalls are guaranteed to have
fininshed. Hence we don't really need anything there. But I expect the
kernel race checker thing to complain, plus that then gives me a good
spot to explain why we can't race and don't have to check for a NULL
iomem_inode.

I'll add that in v3.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-09 14:24     ` Daniel Vetter
@ 2020-10-09 14:32       ` Jason Gunthorpe
  2020-10-09 18:28         ` Dan Williams
  0 siblings, 1 reply; 69+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 14:32 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Arnd Bergmann, Greg Kroah-Hartman, David Hildenbrand,
	Rafael J. Wysocki

On Fri, Oct 09, 2020 at 04:24:45PM +0200, Daniel Vetter wrote:
> On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> >
> > > +struct address_space *iomem_get_mapping(void)
> > > +{
> > > +     return iomem_inode->i_mapping;
> >
> > This should pair an acquire with the release below
> >
> > > +     /*
> > > +      * Publish /dev/mem initialized.
> > > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > > +      */
> > > +     smp_store_release(&iomem_inode, inode);
> >
> > However, this seems abnormal, initcalls rarely do this kind of stuff
> > with global data..
> >
> > The kernel crashes if this fs_initcall is raced with
> > iomem_get_mapping() due to the unconditional dereference, so I think
> > it can be safely switched to a simple assignment.
> 
> Ah yes I checked this all, but forgot to correctly annotate the
> iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
> Add missing memory barriers for devmem_inode").

Oh yikes, so revoke_iomem can run concurrently during early boot,
tricky.

> The reasons for the annotations is that iomem requests can happen
> fairly early, way before fs_initcalls happen. That means revoke_iomem
> needs to check for that and bail out if we race - nothing bad can
> happen since userspace isn't running at this point anyway. And
> apparently it needs to be a full acquire fence since we don't just
> write a value, but need a barrier for the struct stuff.

Yes, if that is what is happening it release/acquire is needed.

Jason

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem
  2020-10-09 10:14   ` Mauro Carvalho Chehab
@ 2020-10-09 16:57     ` Daniel Vetter
  0 siblings, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09 16:57 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Daniel Vetter, DRI Development, LKML, kvm, linux-mm,
	linux-arm-kernel, linux-samsung-soc, linux-media, linux-s390,
	John Hubbard, Daniel Vetter, Jason Gunthorpe, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Tomasz Figa, Andrew Morton,
	Jérôme Glisse, Jan Kara, Dan Williams

On Fri, Oct 09, 2020 at 12:14:17PM +0200, Mauro Carvalho Chehab wrote:
> Em Fri,  9 Oct 2020 09:59:23 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> 
> > It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
> > symbol from all over the tree (well just one place, somehow omap media
> > driver still had this in its Kconfig, despite not using it).
> > 
> > Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Pawel Osciak <pawel@osciak.com>
> > Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> > Cc: Kyungmin Park <kyungmin.park@samsung.com>
> > Cc: Tomasz Figa <tfiga@chromium.org>
> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  drivers/media/common/videobuf2/Kconfig        |  1 -
> >  drivers/media/common/videobuf2/Makefile       |  1 +
> >  .../media/common/videobuf2}/frame_vector.c    |  2 +
> >  drivers/media/platform/omap/Kconfig           |  1 -
> >  include/linux/mm.h                            | 42 -------------------
> >  include/media/videobuf2-core.h                | 42 +++++++++++++++++++
> >  mm/Kconfig                                    |  3 --
> >  mm/Makefile                                   |  1 -
> >  8 files changed, 45 insertions(+), 48 deletions(-)
> >  rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)
> > 
> > diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
> > index edbc99ebba87..d2223a12c95f 100644
> > --- a/drivers/media/common/videobuf2/Kconfig
> > +++ b/drivers/media/common/videobuf2/Kconfig
> > @@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
> >  
> >  config VIDEOBUF2_MEMOPS
> >  	tristate
> > -	select FRAME_VECTOR
> >  
> >  config VIDEOBUF2_DMA_CONTIG
> >  	tristate
> > diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
> > index 77bebe8b202f..54306f8d096c 100644
> > --- a/drivers/media/common/videobuf2/Makefile
> > +++ b/drivers/media/common/videobuf2/Makefile
> > @@ -1,5 +1,6 @@
> >  # SPDX-License-Identifier: GPL-2.0
> >  videobuf2-common-objs := videobuf2-core.o
> > +videobuf2-common-objs += frame_vector.o
> >  
> >  ifeq ($(CONFIG_TRACEPOINTS),y)
> >    videobuf2-common-objs += vb2-trace.o
> > diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> > similarity index 99%
> > rename from mm/frame_vector.c
> > rename to drivers/media/common/videobuf2/frame_vector.c
> > index d44779e56313..2b0b97761d15 100644
> > --- a/mm/frame_vector.c
> > +++ b/drivers/media/common/videobuf2/frame_vector.c
> > @@ -8,6 +8,8 @@
> >  #include <linux/pagemap.h>
> >  #include <linux/sched.h>
> >  
> > +#include <media/videobuf2-core.h>
> > +
> 
> See my comment below...
> 
> >  /**
> >   * get_vaddr_frames() - map virtual addresses to pfns
> >   * @start:	starting user address
> > diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
> > index f73b5893220d..de16de46c0f4 100644
> > --- a/drivers/media/platform/omap/Kconfig
> > +++ b/drivers/media/platform/omap/Kconfig
> > @@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
> >  	depends on VIDEO_V4L2
> >  	select VIDEOBUF2_DMA_CONTIG
> >  	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
> > -	select FRAME_VECTOR
> >  	help
> >  	  V4L2 Display driver support for OMAP2/3 based boards.
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 16b799a0522c..acd60fbf1a5a 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
> >  int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
> >  			struct task_struct *task, bool bypass_rlim);
> >  
> > -/* Container for pinned pfns / pages */
> > -struct frame_vector {
> > -	unsigned int nr_allocated;	/* Number of frames we have space for */
> > -	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> > -	bool got_ref;		/* Did we pin pages by getting page ref? */
> > -	bool is_pfns;		/* Does array contain pages or pfns? */
> > -	void *ptrs[];		/* Array of pinned pfns / pages. Use
> > -				 * pfns_vector_pages() or pfns_vector_pfns()
> > -				 * for access */
> > -};
> > -
> > -struct frame_vector *frame_vector_create(unsigned int nr_frames);
> > -void frame_vector_destroy(struct frame_vector *vec);
> > -int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> > -		     unsigned int gup_flags, struct frame_vector *vec);
> > -void put_vaddr_frames(struct frame_vector *vec);
> > -int frame_vector_to_pages(struct frame_vector *vec);
> > -void frame_vector_to_pfns(struct frame_vector *vec);
> > -
> > -static inline unsigned int frame_vector_count(struct frame_vector *vec)
> > -{
> > -	return vec->nr_frames;
> > -}
> > -
> > -static inline struct page **frame_vector_pages(struct frame_vector *vec)
> > -{
> > -	if (vec->is_pfns) {
> > -		int err = frame_vector_to_pages(vec);
> > -
> > -		if (err)
> > -			return ERR_PTR(err);
> > -	}
> > -	return (struct page **)(vec->ptrs);
> > -}
> > -
> > -static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> > -{
> > -	if (!vec->is_pfns)
> > -		frame_vector_to_pfns(vec);
> > -	return (unsigned long *)(vec->ptrs);
> > -}
> > -
> >  struct kvec;
> >  int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
> >  			struct page **pages);
> > diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> > index bbb3f26fbde9..a2e75ca0334f 100644
> > --- a/include/media/videobuf2-core.h
> > +++ b/include/media/videobuf2-core.h
> > @@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
> >   */
> >  unsigned int vb2_request_buffer_cnt(struct media_request *req);
> >  
> > +/* Container for pinned pfns / pages in frame_vector.c */
> > +struct frame_vector {
> > +	unsigned int nr_allocated;	/* Number of frames we have space for */
> > +	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> > +	bool got_ref;		/* Did we pin pages by getting page ref? */
> > +	bool is_pfns;		/* Does array contain pages or pfns? */
> > +	void *ptrs[];		/* Array of pinned pfns / pages. Use
> > +				 * pfns_vector_pages() or pfns_vector_pfns()
> > +				 * for access */
> > +};
> > +
> > +struct frame_vector *frame_vector_create(unsigned int nr_frames);
> > +void frame_vector_destroy(struct frame_vector *vec);
> > +int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> > +		     unsigned int gup_flags, struct frame_vector *vec);
> > +void put_vaddr_frames(struct frame_vector *vec);
> > +int frame_vector_to_pages(struct frame_vector *vec);
> > +void frame_vector_to_pfns(struct frame_vector *vec);
> > +
> > +static inline unsigned int frame_vector_count(struct frame_vector *vec)
> > +{
> > +	return vec->nr_frames;
> > +}
> > +
> > +static inline struct page **frame_vector_pages(struct frame_vector *vec)
> > +{
> > +	if (vec->is_pfns) {
> > +		int err = frame_vector_to_pages(vec);
> > +
> > +		if (err)
> > +			return ERR_PTR(err);
> > +	}
> > +	return (struct page **)(vec->ptrs);
> > +}
> > +
> > +static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> > +{
> > +	if (!vec->is_pfns)
> > +		frame_vector_to_pfns(vec);
> > +	return (unsigned long *)(vec->ptrs);
> > +}
> > +
> >  #endif /* _MEDIA_VIDEOBUF2_CORE_H */
> 
> Please place those into a include/media/frame_vector.h file, instead of
> merging it directly at vb2 core header.
> 
> Then include the new header at videobuf2-core.h and at frame_vector.c.

Makes sense, I'll do that for v3.

> With such changes:
> 
> Acked-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Thanks for taking a look.
-Daniel

> 
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 6c974888f86f..da6c943fe9f1 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -815,9 +815,6 @@ config DEVICE_PRIVATE
> >  	  memory; i.e., memory that is only accessible from the device (or
> >  	  group of devices). You likely also want to select HMM_MIRROR.
> >  
> > -config FRAME_VECTOR
> > -	bool
> > -
> >  config ARCH_USES_HIGH_VMA_FLAGS
> >  	bool
> >  config ARCH_HAS_PKEYS
> > diff --git a/mm/Makefile b/mm/Makefile
> > index d5649f1c12c0..a025fd6c6afd 100644
> > --- a/mm/Makefile
> > +++ b/mm/Makefile
> > @@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
> >  obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
> >  obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
> >  obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
> > -obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
> >  obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
> >  obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
> >  obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
> 
> 
> 
> Thanks,
> Mauro

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 12:48         ` Jason Gunthorpe
@ 2020-10-09 17:52           ` Daniel Vetter
  2020-10-09 18:01             ` Jason Gunthorpe
                               ` (3 more replies)
  2020-10-12 10:46           ` Marek Szyprowski
  1 sibling, 4 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09 17:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mauro Carvalho Chehab, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
>
> > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
>
> No, it is unconditionally unsafe. The CMA movable mappings are
> specific VMAs that will have bad issues here, but there are other
> types too.
>
> The only way to do something at a VMA level is to have a list of OK
> VMAs, eg because they were creatd via a special mmap helper from the
> media subsystem.
>
> > Well, no drivers inside the media subsystem uses such flag, although
> > they may rely on some infrastructure that could be using it behind
> > the bars.
>
> It doesn't matter, nothing prevents the user from calling media APIs
> on mmaps it gets from other subsystems.

I think a good first step would be to disable userptr of non struct
page backed storage going forward for any new hw support. Even on
existing drivers. dma-buf sharing has been around for long enough now
that this shouldn't be a problem. Unfortunately right now this doesn't
seem to exist, so the entire problem keeps getting perpetuated.

> > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > flag that it would be denying the core mm code to set __GFP_MOVABLE.
>
> We can't tell from the VMA these kinds of details..
>
> It has to go the other direction, evey mmap that might be used as a
> userptr here has to be found and the VMA specially created to allow
> its use. At least that is a kernel only change, but will need people
> with the HW to do this work.

I think the only reasonable way to keep this working is:
- add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
- add dma-buf export support to fbdev and v4l
- roll this out everywhere we still need it.

Realistically this just isn't going to happen. And anything else just
reimplements half of dma-buf, which is kinda pointless (you need
minimally refcounting and some way to get at a promise of a permanent
sg list for dma. Plus probably the vmap for kernel cpu access.

> > Please let address the issue on this way, instead of broken an
> > userspace API that it is there since 1991.
>
> It has happened before :( It took 4 years for RDMA to undo the uAPI
> breakage caused by a security fix for something that was a 15 years
> old bug.

Yeah we have a bunch of these on the drm side too. Some of them are
really just "you have to upgrade userspace", and there's no real fix
for the security nightmare without that.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 17:52           ` Daniel Vetter
@ 2020-10-09 18:01             ` Jason Gunthorpe
  2020-10-09 19:31               ` Daniel Vetter
  2020-10-10  9:21             ` Mauro Carvalho Chehab
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 69+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 18:01 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Mauro Carvalho Chehab, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote:

> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l
> - roll this out everywhere we still need it.

It seems to me there is a technical way forward to restore user
compat, so it is really no different than RDMA/DRM pain we both
suffered before.

Thus no justification to NAK it. If media wants things to keep working
they have to do the technical path like you outline above.

> Realistically this just isn't going to happen. 

If your series goes ahead it will get solved. Someone will take on the
huge project to either add DMA buf to the drivers people still care
about, or do the work above to transparently handle in kernel.

If we allow things to keep working without consequence then nobody
will do it.

The only reason we did the 4 years of work in RDMA was because Linus
went in and broke the uABI for a security fix. It was hundreds of
patches to fix it, so I don't have much sympathy for "it is too hard"
here.

Jason

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-09 14:32       ` Jason Gunthorpe
@ 2020-10-09 18:28         ` Dan Williams
  2020-10-15  0:09           ` Jason Gunthorpe
  0 siblings, 1 reply; 69+ messages in thread
From: Dan Williams @ 2020-10-09 18:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Vetter, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Arnd Bergmann,
	Greg Kroah-Hartman, David Hildenbrand, Rafael J. Wysocki

On Fri, Oct 9, 2020 at 7:32 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 04:24:45PM +0200, Daniel Vetter wrote:
> > On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> > >
> > > > +struct address_space *iomem_get_mapping(void)
> > > > +{
> > > > +     return iomem_inode->i_mapping;
> > >
> > > This should pair an acquire with the release below
> > >
> > > > +     /*
> > > > +      * Publish /dev/mem initialized.
> > > > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > > > +      */
> > > > +     smp_store_release(&iomem_inode, inode);
> > >
> > > However, this seems abnormal, initcalls rarely do this kind of stuff
> > > with global data..
> > >
> > > The kernel crashes if this fs_initcall is raced with
> > > iomem_get_mapping() due to the unconditional dereference, so I think
> > > it can be safely switched to a simple assignment.
> >
> > Ah yes I checked this all, but forgot to correctly annotate the
> > iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
> > Add missing memory barriers for devmem_inode").
>
> Oh yikes, so revoke_iomem can run concurrently during early boot,
> tricky.

It runs early because request_mem_region() can run before fs_initcall.
Rather than add an unnecessary lock just arrange for the revoke to be
skipped before the inode is initialized. The expectation is that any
early resource reservations will block future userspace mapping
attempts.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 18:01             ` Jason Gunthorpe
@ 2020-10-09 19:31               ` Daniel Vetter
  0 siblings, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-09 19:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mauro Carvalho Chehab, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

On Fri, Oct 9, 2020 at 8:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote:
>
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> > - add dma-buf export support to fbdev and v4l
> > - roll this out everywhere we still need it.
>
> It seems to me there is a technical way forward to restore user
> compat, so it is really no different than RDMA/DRM pain we both
> suffered before.
>
> Thus no justification to NAK it. If media wants things to keep working
> they have to do the technical path like you outline above.
>
> > Realistically this just isn't going to happen.
>
> If your series goes ahead it will get solved. Someone will take on the
> huge project to either add DMA buf to the drivers people still care
> about, or do the work above to transparently handle in kernel.
>
> If we allow things to keep working without consequence then nobody
> will do it.
>
> The only reason we did the 4 years of work in RDMA was because Linus
> went in and broke the uABI for a security fix. It was hundreds of
> patches to fix it, so I don't have much sympathy for "it is too hard"
> here.

Oh fully agreeing with you here, I just wanted to lay out that a)
there is a solid plan to fix it and b) it's way too much work for me
to just type it as a part of a "learn me some core mm semantics"
project :-)

I was hoping that we could get away with a special marker for
problematic vma, and filter those out. But after all the digging I've
noticed that on anything remotely modern, there's just nothing left.
Device memory management has become massively more dynamic in the past
10 years.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 17:52           ` Daniel Vetter
  2020-10-09 18:01             ` Jason Gunthorpe
@ 2020-10-10  9:21             ` Mauro Carvalho Chehab
  2020-10-10 10:53               ` Daniel Vetter
  2020-10-10 17:22             ` Tomasz Figa
  2020-10-10 21:11             ` Laurent Pinchart
  3 siblings, 1 reply; 69+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-10  9:21 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jason Gunthorpe, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Em Fri, 9 Oct 2020 19:52:05 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >  
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.  
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.

I didn't check the mm dirty details, but I strongly suspect that the mm
code has a way to prevent moving a mmapped page while it is still in usage.

If not, then the entire movable pages concept sounds broken to me, and
has to be fixed at mm subsystem.

> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.

I'm not sure if I'm following you on that. The media API can work with
different ways of sending buffers to userspace:

	- read();

	- using the overlay mode. This interface is deprecated in
	  practice, being replaced by DMABUF. Only a few older hardware
	  supports it, and it depends on an special X11 helper driver
	  for it to work.

	- via DMABUF:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html

	- via mmap, using a mmap helper:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/mmap.html

	- via mmap, using userspace-provided pointers:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html

The existing open-source programs usually chose one or more of the above
modes. if the Kernel driver returns -EINVAL when an mmapped streaming I/O
mode is not supported, userspace has to select a different method.

Most userspace open source programs have fallback support: if one
mmap I/O method fails, it selects another one, although this is not
a mandatory requirement. I found (and fixed) a few ones that were
relying exclusively on userptr support, but I didn't make a 
comprehensive check.

Also there are a number of relevant closed-source apps that we have no 
idea about what methods they use, like Skype, and other similar
videoconferencing programs. Breaking support for those, specially at
a time where people are relying on it in order to participate on
conferences and doing internal meetings is a **very bad** idea.

So, whatever solution is taken, it should not be dumping warning
messages at the system and tainting the Kernel, but, instead, checking
if the userspace request is valid or not. If it is invalid, return the
proper error code via the right V4L2 ioctl, in order to let userspace
choose a different method. I the request is valid, refcount the pages 
for them to not be moved while they're still in usage.

-

Let me provide some background about how things work at the media
subsytem. If you want to know more, the userspace-provided memory 
mapped pointers work is described here:

	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html#userp

Basically, userspace calls either one of those ioctls:

	VIDIOC_CREATE_BUFS:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-create-bufs.html

Which is translated into a videobuf2 call to: vb2_ioctl_create_bufs()

	VIDIOC_REQBUFS
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-reqbufs.html#vidioc-reqbufs

Which is translated into a videobuf2 call to: vb2_ioctl_reqbufs()

Both internally calls vb2_verify_memory_type(), which is responsible
for checking if the provided pointers are OK for the usage and/or do
all necessary page allocations, and taking care of any special 
requirements. This could easily have some additional checks to
verify if the requested VMA address has pages that are movable or
not, ensuring that ensure that the VMA is OK, and locking them in
order to prevent the mm code to move such pages while they are in
usage by the media subsystem.

Now, as I said before, I don't know the dirty details about how
to lock those pages at the mm subsystem in order to avoid it
to move the used pages. Yet, when vb2_create_framevec()
is called, the check if VMA is OK should already be happened
at vb2_verify_memory_type().

-

It should be noticed that the dirty hack added by patch 09/17
and 10/17 affects *all* types of memory allocations at V4L2, 
as this kAPI is called by the 3 different memory models
supported at the media subsystem:

	drivers/media/common/videobuf2/videobuf2-vmalloc.c
	drivers/media/common/videobuf2/videobuf2-dma-contig.c
	drivers/media/common/videobuf2/videobuf2-dma-sg.c

In other words, with this code:

	int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
		unsigned long *pfn)
	{
	#ifdef CONFIG_STRICT_FOLLOW_PFN
		pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
		return -EINVAL;
	#else
		WARN_ONCE(1, "unsafe follow_pfn usage\n");
		add_taint(TAINT_USER, LOCKDEP_STILL_OK);

		return follow_pfn(vma, address, pfn);
	#endif

you're unconditionally breaking the media userspace API support not
only for embedded systems that could be using userptr instead of
DMA_BUF, but also for *all* video devices, including USB cameras.

This is **NOT** an acceptable solution. 

So, I stand my NACK to this approach.

> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.  
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.  
> 
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.

Well, the media uAPI does support DMABUF (both import and export):

	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-expbuf.html#vidioc-expbuf

And I fully agree that newer programs should use DMABUF when sharing
buffers with DRM subsystem, but that's not my main concern.

What I do want is to not break userspace support nor to taint the Kernel
due to a valid uAPI call.

A valid uAPI call should check if the parameters passed though it are
valid. If they are, it should handle. Otherwise, it should return
-EINVAL, without tainting the Kernel or printing warning messages.

The approach took by patches 09/17 and 10/17 doesn't do that.
Instead, they just unconditionally breaks the media uAPI.

What should be done, instead, is to drop patch 10/17, and work on
a way for the code inside vb2_create_framevec() to ensure that, if
USERPTR is used, the memory pages will be properly locked while the
driver is using, returning -EINVAL only if there's no way to proceed,
without tainting the Kernel.

> 
> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.  
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.  
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);

Not sure how an userspace buffer could be mapped to be using it,
specially since the buffer may not even be imported/exported
from the DRM subsystem, but it could simply be allocated
via glibc calloc()/malloc().

> - add dma-buf export support to fbdev and v4l

V4L has support for it already.

> - roll this out everywhere we still need it.

That's where things are hard. This is not like DRM, where the APIs
are called via some open source libraries that are also managed
by DRM upstream developers.

In the case of the media subsystem, while we added a libv4l sometime
ago, not all userspace apps use it, as a large part of them used
to exist before the addition of the libraries. Also, we're currently 
trying to deprecate libv4l, at least for embedded systems, in favor
of libcamera.

On media, there are lots of closed source apps that uses the media API
directly. Even talking about open source ones, there are lots of
different apps, including not only media-specific apps, but also
generic ones, like web browsers, which don't use the libraries we
wrote.

An userspace API breakage would take *huge* efforts and will take
lots of time to have it merged everywhere. It will cause lots of
troubles everywhere.

> Realistically this just isn't going to happen. 

Why not? Any app developer could already use DMA-BUF if required,
as the upstream support was added several years ago.

> And anything else just
> reimplements half of dma-buf, 

It is just the opposite: those uAPIs came years before dma-buf.
In other words, it was dma-buf that re-implemented it ;-)

Now, I agree with you that dma-buf is a way cooler than the past
alternatives.

-

It sounds to me that you're considering on only one use case of 
USERPTR: to pass a buffer created from DRM. As far as I'm aware,
only embedded userspace applications actually do that.

Yet, there are a number of other applications that do something like
the userptr_capture() function on this code:

	https://git.linuxtv.org/v4l-utils.git/tree/contrib/test/v4l2grab.c

E. g. using glibc alloc functions like calloc() to allocate memory, 
passing the user-allocated data to the Kernel via something like this:

	struct v4l2_requestbuffers req;
	struct v4l2_buffer buf;
	int n_buffers = 2;

	req.count  = 2;
	req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	req.memory = V4L2_MEMORY_USERPTR;
	if (ioctl(fd, VIDIOC_REQBUFS, &req))
		return errno;

	for (i = 0; i < req.count; ++i) {
		buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
		buf.memory = V4L2_MEMORY_USERPTR;
		buf.index = i;
		buf.m.userptr = (unsigned long)buffers[i].start;
		buf.length = buffers[i].length;
		if (ioctl(fd, VIDIOC_QBUF, &buf))
			return errno;
	}
	if (ioctl(fd, VIDIOC_STREAMON, &req.type))
		return errno;

	/* some capture loop */

	ioctl(fd, VIDIOC_STREAMOFF, &req.type);

I can't possibly see *any* security issues with the above code.

As I said before, VIDIOC_REQBUFS should be checking if the userspace
buffers are OK and ensure that their refcounts will be incremented,
in order to avoid mm to move the pages used there, freeing the 
refconts when VIDIOC_STREAMOFF - or close(fd) - is called.

> which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.

Yeah, refcounting needs to happen. 

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 10/17] media/videbuf1|2: Mark follow_pfn usage as unsafe
  2020-10-09  7:59 ` [PATCH v2 10/17] media/videbuf1|2: Mark follow_pfn usage as unsafe Daniel Vetter
@ 2020-10-10  9:24   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 69+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-10  9:24 UTC (permalink / raw)
  To: Daniel Vetter, Linus Torvalds
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Jason Gunthorpe, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Tomasz Figa, Laurent Dufour,
	Vlastimil Babka, Daniel Jordan, Michel Lespinasse

Em Fri,  9 Oct 2020 09:59:27 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> The media model assumes that buffers are all preallocated, so that
> when a media pipeline is running we never miss a deadline because the
> buffers aren't allocated or available.
> 
> This means we cannot fix the v4l follow_pfn usage through
> mmu_notifier, without breaking how this all works. The only real fix
> is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
> tell everyone to cut over to dma-buf memory sharing for zerocopy.
> 
> userptr for normal memory will keep working as-is.

I won't repeat here the discussions for patch 09/17, but
just to be clear about this one:

NACK.

We need a better alternative to avoid breaking existing
media applications.

> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Pawel Osciak <pawel@osciak.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Tomasz Figa <tfiga@chromium.org>
> Cc: Laurent Dufour <ldufour@linux.ibm.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Michel Lespinasse <walken@google.com>
> ---
>  drivers/media/common/videobuf2/frame_vector.c | 2 +-
>  drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> index 2b0b97761d15..a1b85fe9e7c1 100644
> --- a/drivers/media/common/videobuf2/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -69,7 +69,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
>  			break;
>  
>  		while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
> -			err = follow_pfn(vma, start, &nums[ret]);
> +			err = unsafe_follow_pfn(vma, start, &nums[ret]);
>  			if (err) {
>  				if (ret == 0)
>  					ret = err;
> diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
> index 52312ce2ba05..821c4a76ab96 100644
> --- a/drivers/media/v4l2-core/videobuf-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
> @@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
>  	user_address = untagged_baddr;
>  
>  	while (pages_done < (mem->size >> PAGE_SHIFT)) {
> -		ret = follow_pfn(vma, user_address, &this_pfn);
> +		ret = unsafe_follow_pfn(vma, user_address, &this_pfn);
>  		if (ret)
>  			break;
>  



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-10  9:21             ` Mauro Carvalho Chehab
@ 2020-10-10 10:53               ` Daniel Vetter
  2020-10-10 11:39                 ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 69+ messages in thread
From: Daniel Vetter @ 2020-10-10 10:53 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jason Gunthorpe, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Hi Mauro,

You might want to read the patches more carefully, because what you're
demanding is what my patches do. Short summary:

- if STRICT_FOLLOW_PFN is set:
a) normal memory is handled as-is (i.e. your example works) through
the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
approach you're demanding
b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
is the only use-case for this) it is correctly rejected with -EINVAL

- if you do have blobby userspace which requires the zero-copy using
userptr to work, and doesn't have any of the fallbacks implemented
that you describe, this would be a regression. That's why
STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
in this usage, Marek also confirmed that the removal of the vma copy
code a few years ago essentially broke even the weak assumptions that
made the code work 10+ years ago when it was merged.

so tdlr; Everything you described will keep working even with the new
flag set, and everything you demand must be implemented _is_
implemented in this patch series.

Also please keep in mind that we are _not_ talking about the general
userptr support that was merge ~20 years ago. This patch series here
is _only_ about the zerocpy userptr support merged with 50ac952d2263
("[media] videobuf2-dma-sg: Support io userptr operations on io
memory") in 2013.

Why this hack was merged in 2013 when we merged dma-buf almost 2 years
before that I have no idea about. Imo that patch simply should never
have landed, and instead dma-buf support prioritized.

Cheers, Daniel


On Sat, Oct 10, 2020 at 11:21 AM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Fri, 9 Oct 2020 19:52:05 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
>
> > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> > >
> > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> > >
> > > No, it is unconditionally unsafe. The CMA movable mappings are
> > > specific VMAs that will have bad issues here, but there are other
> > > types too.
>
> I didn't check the mm dirty details, but I strongly suspect that the mm
> code has a way to prevent moving a mmapped page while it is still in usage.
>
> If not, then the entire movable pages concept sounds broken to me, and
> has to be fixed at mm subsystem.
>
> > >
> > > The only way to do something at a VMA level is to have a list of OK
> > > VMAs, eg because they were creatd via a special mmap helper from the
> > > media subsystem.
>
> I'm not sure if I'm following you on that. The media API can work with
> different ways of sending buffers to userspace:
>
>         - read();
>
>         - using the overlay mode. This interface is deprecated in
>           practice, being replaced by DMABUF. Only a few older hardware
>           supports it, and it depends on an special X11 helper driver
>           for it to work.
>
>         - via DMABUF:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
>
>         - via mmap, using a mmap helper:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/mmap.html
>
>         - via mmap, using userspace-provided pointers:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html
>
> The existing open-source programs usually chose one or more of the above
> modes. if the Kernel driver returns -EINVAL when an mmapped streaming I/O
> mode is not supported, userspace has to select a different method.
>
> Most userspace open source programs have fallback support: if one
> mmap I/O method fails, it selects another one, although this is not
> a mandatory requirement. I found (and fixed) a few ones that were
> relying exclusively on userptr support, but I didn't make a
> comprehensive check.
>
> Also there are a number of relevant closed-source apps that we have no
> idea about what methods they use, like Skype, and other similar
> videoconferencing programs. Breaking support for those, specially at
> a time where people are relying on it in order to participate on
> conferences and doing internal meetings is a **very bad** idea.
>
> So, whatever solution is taken, it should not be dumping warning
> messages at the system and tainting the Kernel, but, instead, checking
> if the userspace request is valid or not. If it is invalid, return the
> proper error code via the right V4L2 ioctl, in order to let userspace
> choose a different method. I the request is valid, refcount the pages
> for them to not be moved while they're still in usage.
>
> -
>
> Let me provide some background about how things work at the media
> subsytem. If you want to know more, the userspace-provided memory
> mapped pointers work is described here:
>
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html#userp
>
> Basically, userspace calls either one of those ioctls:
>
>         VIDIOC_CREATE_BUFS:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-create-bufs.html
>
> Which is translated into a videobuf2 call to: vb2_ioctl_create_bufs()
>
>         VIDIOC_REQBUFS
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-reqbufs.html#vidioc-reqbufs
>
> Which is translated into a videobuf2 call to: vb2_ioctl_reqbufs()
>
> Both internally calls vb2_verify_memory_type(), which is responsible
> for checking if the provided pointers are OK for the usage and/or do
> all necessary page allocations, and taking care of any special
> requirements. This could easily have some additional checks to
> verify if the requested VMA address has pages that are movable or
> not, ensuring that ensure that the VMA is OK, and locking them in
> order to prevent the mm code to move such pages while they are in
> usage by the media subsystem.
>
> Now, as I said before, I don't know the dirty details about how
> to lock those pages at the mm subsystem in order to avoid it
> to move the used pages. Yet, when vb2_create_framevec()
> is called, the check if VMA is OK should already be happened
> at vb2_verify_memory_type().
>
> -
>
> It should be noticed that the dirty hack added by patch 09/17
> and 10/17 affects *all* types of memory allocations at V4L2,
> as this kAPI is called by the 3 different memory models
> supported at the media subsystem:
>
>         drivers/media/common/videobuf2/videobuf2-vmalloc.c
>         drivers/media/common/videobuf2/videobuf2-dma-contig.c
>         drivers/media/common/videobuf2/videobuf2-dma-sg.c
>
> In other words, with this code:
>
>         int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
>                 unsigned long *pfn)
>         {
>         #ifdef CONFIG_STRICT_FOLLOW_PFN
>                 pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
>                 return -EINVAL;
>         #else
>                 WARN_ONCE(1, "unsafe follow_pfn usage\n");
>                 add_taint(TAINT_USER, LOCKDEP_STILL_OK);
>
>                 return follow_pfn(vma, address, pfn);
>         #endif
>
> you're unconditionally breaking the media userspace API support not
> only for embedded systems that could be using userptr instead of
> DMA_BUF, but also for *all* video devices, including USB cameras.
>
> This is **NOT** an acceptable solution.
>
> So, I stand my NACK to this approach.
>
> > > > Well, no drivers inside the media subsystem uses such flag, although
> > > > they may rely on some infrastructure that could be using it behind
> > > > the bars.
> > >
> > > It doesn't matter, nothing prevents the user from calling media APIs
> > > on mmaps it gets from other subsystems.
> >
> > I think a good first step would be to disable userptr of non struct
> > page backed storage going forward for any new hw support. Even on
> > existing drivers. dma-buf sharing has been around for long enough now
> > that this shouldn't be a problem. Unfortunately right now this doesn't
> > seem to exist, so the entire problem keeps getting perpetuated.
>
> Well, the media uAPI does support DMABUF (both import and export):
>
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-expbuf.html#vidioc-expbuf
>
> And I fully agree that newer programs should use DMABUF when sharing
> buffers with DRM subsystem, but that's not my main concern.
>
> What I do want is to not break userspace support nor to taint the Kernel
> due to a valid uAPI call.
>
> A valid uAPI call should check if the parameters passed though it are
> valid. If they are, it should handle. Otherwise, it should return
> -EINVAL, without tainting the Kernel or printing warning messages.
>
> The approach took by patches 09/17 and 10/17 doesn't do that.
> Instead, they just unconditionally breaks the media uAPI.
>
> What should be done, instead, is to drop patch 10/17, and work on
> a way for the code inside vb2_create_framevec() to ensure that, if
> USERPTR is used, the memory pages will be properly locked while the
> driver is using, returning -EINVAL only if there's no way to proceed,
> without tainting the Kernel.
>
> >
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
>
> Not sure how an userspace buffer could be mapped to be using it,
> specially since the buffer may not even be imported/exported
> from the DRM subsystem, but it could simply be allocated
> via glibc calloc()/malloc().
>
> > - add dma-buf export support to fbdev and v4l
>
> V4L has support for it already.
>
> > - roll this out everywhere we still need it.
>
> That's where things are hard. This is not like DRM, where the APIs
> are called via some open source libraries that are also managed
> by DRM upstream developers.
>
> In the case of the media subsystem, while we added a libv4l sometime
> ago, not all userspace apps use it, as a large part of them used
> to exist before the addition of the libraries. Also, we're currently
> trying to deprecate libv4l, at least for embedded systems, in favor
> of libcamera.
>
> On media, there are lots of closed source apps that uses the media API
> directly. Even talking about open source ones, there are lots of
> different apps, including not only media-specific apps, but also
> generic ones, like web browsers, which don't use the libraries we
> wrote.
>
> An userspace API breakage would take *huge* efforts and will take
> lots of time to have it merged everywhere. It will cause lots of
> troubles everywhere.
>
> > Realistically this just isn't going to happen.
>
> Why not? Any app developer could already use DMA-BUF if required,
> as the upstream support was added several years ago.
>
> > And anything else just
> > reimplements half of dma-buf,
>
> It is just the opposite: those uAPIs came years before dma-buf.
> In other words, it was dma-buf that re-implemented it ;-)
>
> Now, I agree with you that dma-buf is a way cooler than the past
> alternatives.
>
> -
>
> It sounds to me that you're considering on only one use case of
> USERPTR: to pass a buffer created from DRM. As far as I'm aware,
> only embedded userspace applications actually do that.
>
> Yet, there are a number of other applications that do something like
> the userptr_capture() function on this code:
>
>         https://git.linuxtv.org/v4l-utils.git/tree/contrib/test/v4l2grab.c
>
> E. g. using glibc alloc functions like calloc() to allocate memory,
> passing the user-allocated data to the Kernel via something like this:
>
>         struct v4l2_requestbuffers req;
>         struct v4l2_buffer buf;
>         int n_buffers = 2;
>
>         req.count  = 2;
>         req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
>         req.memory = V4L2_MEMORY_USERPTR;
>         if (ioctl(fd, VIDIOC_REQBUFS, &req))
>                 return errno;
>
>         for (i = 0; i < req.count; ++i) {
>                 buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
>                 buf.memory = V4L2_MEMORY_USERPTR;
>                 buf.index = i;
>                 buf.m.userptr = (unsigned long)buffers[i].start;
>                 buf.length = buffers[i].length;
>                 if (ioctl(fd, VIDIOC_QBUF, &buf))
>                         return errno;
>         }
>         if (ioctl(fd, VIDIOC_STREAMON, &req.type))
>                 return errno;
>
>         /* some capture loop */
>
>         ioctl(fd, VIDIOC_STREAMOFF, &req.type);
>
> I can't possibly see *any* security issues with the above code.
>
> As I said before, VIDIOC_REQBUFS should be checking if the userspace
> buffers are OK and ensure that their refcounts will be incremented,
> in order to avoid mm to move the pages used there, freeing the
> refconts when VIDIOC_STREAMOFF - or close(fd) - is called.
>
> > which is kinda pointless (you need
> > minimally refcounting and some way to get at a promise of a permanent
> > sg list for dma. Plus probably the vmap for kernel cpu access.
>
> Yeah, refcounting needs to happen.
>
> Thanks,
> Mauro



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-10 10:53               ` Daniel Vetter
@ 2020-10-10 11:39                 ` Mauro Carvalho Chehab
  2020-10-10 11:56                   ` Daniel Vetter
  0 siblings, 1 reply; 69+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-10 11:39 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jason Gunthorpe, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Em Sat, 10 Oct 2020 12:53:49 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> Hi Mauro,
> 
> You might want to read the patches more carefully, because what you're
> demanding is what my patches do. Short summary:
> 
> - if STRICT_FOLLOW_PFN is set:
> a) normal memory is handled as-is (i.e. your example works) through
> the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
> approach you're demanding
> b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
> is the only use-case for this) it is correctly rejected with -EINVAL
> 
> - if you do have blobby userspace which requires the zero-copy using
> userptr to work, and doesn't have any of the fallbacks implemented
> that you describe, this would be a regression. That's why
> STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
> in this usage, Marek also confirmed that the removal of the vma copy
> code a few years ago essentially broke even the weak assumptions that
> made the code work 10+ years ago when it was merged.
> 
> so tdlr; Everything you described will keep working even with the new
> flag set, and everything you demand must be implemented _is_
> implemented in this patch series.
> 
> Also please keep in mind that we are _not_ talking about the general
> userptr support that was merge ~20 years ago. This patch series here
> is _only_ about the zerocpy userptr support merged with 50ac952d2263
> ("[media] videobuf2-dma-sg: Support io userptr operations on io
> memory") in 2013.

Ok, now it is making more sense. Please update the comments for
patch 10/17 to describe the above.

We need some time to test this though, in order to check if no
regressions were added (except the ones due to changeset 50ac952d2263).

> 
> Why this hack was merged in 2013 when we merged dma-buf almost 2 years
> before that I have no idea about. Imo that patch simply should never
> have landed, and instead dma-buf support prioritized.

If I recall correctly, we didn't have any DMABUF support
at the media subsystem, back on 2013.

It took some time for the DMA-BUF to arrive at media, as this
was not a top priority. Also, there aren't many developers that
understand the memory model well enough to implement DMA-BUF support
and touch the VB2 code, which is quite complex, as it supports
lots of different ways for I/O, plus works with vmalloc, DMA
contig and DMA scatter/gather. 

Changes there should carefully be tested against different
drivers, in order to avoid regressions on it.

> Cheers, Daniel

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-10 11:39                 ` Mauro Carvalho Chehab
@ 2020-10-10 11:56                   ` Daniel Vetter
  0 siblings, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-10 11:56 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jason Gunthorpe, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

On Sat, Oct 10, 2020 at 1:39 PM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Sat, 10 Oct 2020 12:53:49 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
>
> > Hi Mauro,
> >
> > You might want to read the patches more carefully, because what you're
> > demanding is what my patches do. Short summary:
> >
> > - if STRICT_FOLLOW_PFN is set:
> > a) normal memory is handled as-is (i.e. your example works) through
> > the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
> > approach you're demanding
> > b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
> > is the only use-case for this) it is correctly rejected with -EINVAL
> >
> > - if you do have blobby userspace which requires the zero-copy using
> > userptr to work, and doesn't have any of the fallbacks implemented
> > that you describe, this would be a regression. That's why
> > STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
> > in this usage, Marek also confirmed that the removal of the vma copy
> > code a few years ago essentially broke even the weak assumptions that
> > made the code work 10+ years ago when it was merged.
> >
> > so tdlr; Everything you described will keep working even with the new
> > flag set, and everything you demand must be implemented _is_
> > implemented in this patch series.
> >
> > Also please keep in mind that we are _not_ talking about the general
> > userptr support that was merge ~20 years ago. This patch series here
> > is _only_ about the zerocpy userptr support merged with 50ac952d2263
> > ("[media] videobuf2-dma-sg: Support io userptr operations on io
> > memory") in 2013.
>
> Ok, now it is making more sense. Please update the comments for
> patch 10/17 to describe the above.

Will do.

> We need some time to test this though, in order to check if no
> regressions were added (except the ones due to changeset 50ac952d2263).

Yeah testing of the previous patches to switch to FOLL_LONGTERM would
be really good. I also need that for habanalabs and ideally exynos
too. All the userptr for normal memory should keep working, and with
FOLL_LONGTERM it should actually work better, since with that it
should now correctly interact with pagecache and fs code, not just
with anon memory from malloc.

Thanks, Daniel

> > Why this hack was merged in 2013 when we merged dma-buf almost 2 years
> > before that I have no idea about. Imo that patch simply should never
> > have landed, and instead dma-buf support prioritized.
>
> If I recall correctly, we didn't have any DMABUF support
> at the media subsystem, back on 2013.
>
> It took some time for the DMA-BUF to arrive at media, as this
> was not a top priority. Also, there aren't many developers that
> understand the memory model well enough to implement DMA-BUF support
> and touch the VB2 code, which is quite complex, as it supports
> lots of different ways for I/O, plus works with vmalloc, DMA
> contig and DMA scatter/gather.
>
> Changes there should carefully be tested against different
> drivers, in order to avoid regressions on it.
>
> > Cheers, Daniel
>
> Thanks,
> Mauro



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 17:52           ` Daniel Vetter
  2020-10-09 18:01             ` Jason Gunthorpe
  2020-10-10  9:21             ` Mauro Carvalho Chehab
@ 2020-10-10 17:22             ` Tomasz Figa
  2020-10-10 21:35               ` Laurent Pinchart
  2020-10-10 21:11             ` Laurent Pinchart
  3 siblings, 1 reply; 69+ messages in thread
From: Tomasz Figa @ 2020-10-10 17:22 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jason Gunthorpe, Mauro Carvalho Chehab, DRI Development, LKML,
	KVM list, Linux MM, Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Hi Daniel,

On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.
> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.
> >
> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.
>
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.
>
> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
>
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l

I assume you mean V4L2 and not the obsolete V4L that is emulated in
the userspace by libv4l. If so, every video device that uses videobuf2
gets DMA-buf export for free and there is nothing needed to enable it.

We probably still have a few legacy drivers using videobuf (non-2),
but IMHO those should be safe to put behind some disabled-by-default
Kconfig symbol or even completely drop, as the legacy framework has
been deprecated for many years already.

> - roll this out everywhere we still need it.
>
> Realistically this just isn't going to happen. And anything else just
> reimplements half of dma-buf, which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.
>
> > > Please let address the issue on this way, instead of broken an
> > > userspace API that it is there since 1991.
> >
> > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > breakage caused by a security fix for something that was a 15 years
> > old bug.
>
> Yeah we have a bunch of these on the drm side too. Some of them are
> really just "you have to upgrade userspace", and there's no real fix
> for the security nightmare without that.

I think we need to phase out such userspace indeed. The Kconfig symbol
allows enabling the unsafe functionality for anyone who still needs
it, so I think it's not entirely a breakage.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 12:37       ` Mauro Carvalho Chehab
  2020-10-09 12:39         ` Mauro Carvalho Chehab
  2020-10-09 12:48         ` Jason Gunthorpe
@ 2020-10-10 17:30         ` Tomasz Figa
  2 siblings, 0 replies; 69+ messages in thread
From: Tomasz Figa @ 2020-10-10 17:30 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jason Gunthorpe, Daniel Vetter, DRI Development, LKML, kvm,
	Linux MM,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	linux-samsung-soc, Linux Media Mailing List, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Hi Mauro,

On Fri, Oct 9, 2020 at 2:37 PM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Fri, 9 Oct 2020 09:21:11 -0300
> Jason Gunthorpe <jgg@ziepe.ca> escreveu:
>
> > On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> > > Hi,
> > >
> > > Em Fri,  9 Oct 2020 09:59:26 +0200
> > > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> > >
> > > > Way back it was a reasonable assumptions that iomem mappings never
> > > > change the pfn range they point at. But this has changed:
> > > >
> > > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > > ptes with unmap_mapping_range when buffers get moved
> > > >
> > > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > > cma regions. This means if we miss the unmap the pfn might contain
> > > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > >
> > > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > >
> > > > Accessing pfns obtained from ptes without holding all the locks is
> > > > therefore no longer a good idea.
> > > >
> > > > Unfortunately there's some users where this is not fixable (like v4l
> > > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > > iommu). For now annotate these as unsafe and splat appropriately.
> > > >
> > > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > > roll out to all appropriate places.
> > >
> > > NACK, as this breaks an existing userspace API on media.
> >
> > It doesn't break it. You get a big warning the thing is broken and it
> > keeps working.
> >
> > We can't leave such a huge security hole open - it impacts other
> > subsystems, distros need to be able to run in a secure mode.
>
> Well, if distros disable it, then apps will break.
>

Do we have any information on userspace that actually needs this functionality?

Note that we're _not_ talking here about the complete USERPTR
functionality, but rather just the very corner case of carveout memory
not backed by struct pages.

Given that the current in-tree ways of reserving carveout memory, such
as shared-dma-pool, actually give memory backed by struct pages, do we
even have a source of such legacy memory in the kernel today?

I think that given that this is a very niche functionality, we could
have it disabled by default for security reasons and if someone
_really_ (i.e. there is no replacement) needs it, they probably need
to use a custom kernel build anyway for their exotic hardware setup
(with PFN-backed carveout memory), so they can enable it.

> > > While I agree that using the userptr on media is something that
> > > new drivers may not support, as DMABUF is a better way of
> > > handling it, changing this for existing ones is a big no,
> > > as it may break usersapace.
> >
> > media community needs to work to fix this, not pretend it is OK to
> > keep going as-is.
>
> > Dealing with security issues is the one case where an uABI break might
> > be acceptable.
> >
> > If you want to NAK it then you need to come up with the work to do
> > something here correctly that will support the old drivers without the
> > kernel taint.
> >
> > Unfortunately making things uncomfortable for the subsystem is the big
> > hammer the core kernel needs to use to actually get this security work
> > done by those responsible.
>
>
> I'm not pretending that this is ok. Just pointing that the approach
> taken is NOT OK.
>
> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.
>
> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.
>
> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> flag that it would be denying the core mm code to set __GFP_MOVABLE.
>
> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

Note that USERPTR as a whole generally has been considered deprecated
in V4L2 for many years and people have been actively discouraged to
use it. And, still, we're just talking here about the very rare corner
case, not the whole USERPTR API.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
  2020-10-09  7:59 ` [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers Daniel Vetter
@ 2020-10-10 20:26   ` Oded Gabbay
  2020-10-10 21:32     ` Daniel Vetter
  2020-10-16  7:45   ` John Hubbard
  1 sibling, 1 reply; 69+ messages in thread
From: Oded Gabbay @ 2020-10-10 20:26 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	linux-samsung-soc, Linux Media Mailing List, linux-s390,
	Daniel Vetter, Jason Gunthorpe, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams, Omer Shpigelman,
	Ofir Bitton, Tomer Tayar, Moti Haimovski, Greg Kroah-Hartman,
	Pawel Piskorski

On Fri, Oct 9, 2020 at 10:59 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> All we need are a pages array, pin_user_pages_fast can give us that
> directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
>
Thanks for the patch Daniel.

> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Oded Gabbay <oded.gabbay@gmail.com>
> Cc: Omer Shpigelman <oshpigelman@habana.ai>
> Cc: Ofir Bitton <obitton@habana.ai>
> Cc: Tomer Tayar <ttayar@habana.ai>
> Cc: Moti Haimovski <mhaimovski@habana.ai>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Pawel Piskorski <ppiskorski@habana.ai>
> --
> v2: Use unpin_user_pages_dirty_lock (John)
> ---
>  drivers/misc/habanalabs/Kconfig             |  1 -
>  drivers/misc/habanalabs/common/habanalabs.h |  3 +-
>  drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
>  3 files changed, 20 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
> index 8eb5d38c618e..2f04187f7167 100644
> --- a/drivers/misc/habanalabs/Kconfig
> +++ b/drivers/misc/habanalabs/Kconfig
> @@ -6,7 +6,6 @@
>  config HABANA_AI
>         tristate "HabanaAI accelerators (habanalabs)"
>         depends on PCI && HAS_IOMEM
> -       select FRAME_VECTOR
>         select DMA_SHARED_BUFFER
>         select GENERIC_ALLOCATOR
>         select HWMON
> diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
> index edbd627b29d2..c1b3ad613b15 100644
> --- a/drivers/misc/habanalabs/common/habanalabs.h
> +++ b/drivers/misc/habanalabs/common/habanalabs.h
> @@ -881,7 +881,8 @@ struct hl_ctx_mgr {
>  struct hl_userptr {
>         enum vm_type_t          vm_type; /* must be first */
>         struct list_head        job_node;
> -       struct frame_vector     *vec;
> +       struct page             **pages;
> +       unsigned int            npages;
Can you please update the kerneldoc comment section of this structure
according to your changes ?

>         struct sg_table         *sgt;
>         enum dma_data_direction dir;
>         struct list_head        debugfs_list;
> diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> index 5ff4688683fd..327b64479f97 100644
> --- a/drivers/misc/habanalabs/common/memory.c
> +++ b/drivers/misc/habanalabs/common/memory.c
> @@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
>                 return -EFAULT;
>         }
>
> -       userptr->vec = frame_vector_create(npages);
> -       if (!userptr->vec) {
> +       userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
> +                                       GFP_KERNEL);
> +       if (!userptr->pages) {
>                 dev_err(hdev->dev, "Failed to create frame vector\n");
>                 return -ENOMEM;
>         }
>
> -       rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> -                               userptr->vec);
> +       rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> +                                userptr->pages);
>
>         if (rc != npages) {
>                 dev_err(hdev->dev,
>                         "Failed to map host memory, user ptr probably wrong\n");
>                 if (rc < 0)
> -                       goto destroy_framevec;
> +                       goto destroy_pages;
> +               npages = rc;
>                 rc = -EFAULT;
> -               goto put_framevec;
> -       }
> -
> -       if (frame_vector_to_pages(userptr->vec) < 0) {
> -               dev_err(hdev->dev,
> -                       "Failed to translate frame vector to pages\n");
> -               rc = -EFAULT;
> -               goto put_framevec;
> +               goto put_pages;
>         }
> +       userptr->npages = npages;
>
>         rc = sg_alloc_table_from_pages(userptr->sgt,
> -                                       frame_vector_pages(userptr->vec),
> -                                       npages, offset, size, GFP_ATOMIC);
> +                                      userptr->pages,
> +                                      npages, offset, size, GFP_ATOMIC);
I think that because the call to kvmalloc_array() is done with
GFP_KERNEL, there is no point in using GFP_ATOMIC here.
And actually, this path only needs to avoid yielding when using a
special debug mode.
So I suggest putting here GFP_KERNEL.

In the meanwhile, I'll run this patch (coupled with the next patch) in
our C/I to make sure there are no regressions.
Thanks,
Oded

>         if (rc < 0) {
>                 dev_err(hdev->dev, "failed to create SG table from pages\n");
> -               goto put_framevec;
> +               goto put_pages;
>         }
>
>         return 0;
>
> -put_framevec:
> -       put_vaddr_frames(userptr->vec);
> -destroy_framevec:
> -       frame_vector_destroy(userptr->vec);
> +put_pages:
> +       unpin_user_pages(userptr->pages, npages);
> +destroy_pages:
> +       kvfree(userptr->pages);
>         return rc;
>  }
>
> @@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
>   */
>  void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
>  {
> -       struct page **pages;
> -
>         hl_debugfs_remove_userptr(hdev, userptr);
>
>         if (userptr->dma_mapped)
> @@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
>                                                         userptr->sgt->nents,
>                                                         userptr->dir);
>
> -       pages = frame_vector_pages(userptr->vec);
> -       if (!IS_ERR(pages)) {
> -               int i;
> -
> -               for (i = 0; i < frame_vector_count(userptr->vec); i++)
> -                       set_page_dirty_lock(pages[i]);
> -       }
> -       put_vaddr_frames(userptr->vec);
> -       frame_vector_destroy(userptr->vec);
> +       unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
> +       kvfree(userptr->pages);
>
>         list_del(&userptr->job_node);
>
> --
> 2.28.0
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 17:52           ` Daniel Vetter
                               ` (2 preceding siblings ...)
  2020-10-10 17:22             ` Tomasz Figa
@ 2020-10-10 21:11             ` Laurent Pinchart
  3 siblings, 0 replies; 69+ messages in thread
From: Laurent Pinchart @ 2020-10-10 21:11 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jason Gunthorpe, Mauro Carvalho Chehab, DRI Development, LKML,
	KVM list, Linux MM, Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Hi Daniel,

On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote:
> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.
> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.
> >
> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.
> 
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.

On the V4L2 side, I think we should disable USERPTR for any new driver,
period. That's what I've been recommended when reviewing patches for
several years already. It's a deprecated API, it should be phased out,
which starts by not allowing any new use case.

> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l
> - roll this out everywhere we still need it.
> 
> Realistically this just isn't going to happen. And anything else just
> reimplements half of dma-buf, which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.
> 
> > > Please let address the issue on this way, instead of broken an
> > > userspace API that it is there since 1991.
> >
> > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > breakage caused by a security fix for something that was a 15 years
> > old bug.
> 
> Yeah we have a bunch of these on the drm side too. Some of them are
> really just "you have to upgrade userspace", and there's no real fix
> for the security nightmare without that.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
  2020-10-10 20:26   ` Oded Gabbay
@ 2020-10-10 21:32     ` Daniel Vetter
  2020-10-10 21:41       ` Daniel Vetter
  0 siblings, 1 reply; 69+ messages in thread
From: Daniel Vetter @ 2020-10-10 21:32 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: DRI Development, LKML, KVM list, linux-mm,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	linux-samsung-soc, Linux Media Mailing List, linux-s390,
	Daniel Vetter, Jason Gunthorpe, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams, Omer Shpigelman,
	Ofir Bitton, Tomer Tayar, Moti Haimovski, Greg Kroah-Hartman,
	Pawel Piskorski

On Sat, Oct 10, 2020 at 10:27 PM Oded Gabbay <oded.gabbay@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 10:59 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> > All we need are a pages array, pin_user_pages_fast can give us that
> > directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
> >
> Thanks for the patch Daniel.
>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Oded Gabbay <oded.gabbay@gmail.com>
> > Cc: Omer Shpigelman <oshpigelman@habana.ai>
> > Cc: Ofir Bitton <obitton@habana.ai>
> > Cc: Tomer Tayar <ttayar@habana.ai>
> > Cc: Moti Haimovski <mhaimovski@habana.ai>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Pawel Piskorski <ppiskorski@habana.ai>
> > --
> > v2: Use unpin_user_pages_dirty_lock (John)
> > ---
> >  drivers/misc/habanalabs/Kconfig             |  1 -
> >  drivers/misc/habanalabs/common/habanalabs.h |  3 +-
> >  drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
> >  3 files changed, 20 insertions(+), 33 deletions(-)
> >
> > diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
> > index 8eb5d38c618e..2f04187f7167 100644
> > --- a/drivers/misc/habanalabs/Kconfig
> > +++ b/drivers/misc/habanalabs/Kconfig
> > @@ -6,7 +6,6 @@
> >  config HABANA_AI
> >         tristate "HabanaAI accelerators (habanalabs)"
> >         depends on PCI && HAS_IOMEM
> > -       select FRAME_VECTOR
> >         select DMA_SHARED_BUFFER
> >         select GENERIC_ALLOCATOR
> >         select HWMON
> > diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
> > index edbd627b29d2..c1b3ad613b15 100644
> > --- a/drivers/misc/habanalabs/common/habanalabs.h
> > +++ b/drivers/misc/habanalabs/common/habanalabs.h
> > @@ -881,7 +881,8 @@ struct hl_ctx_mgr {
> >  struct hl_userptr {
> >         enum vm_type_t          vm_type; /* must be first */
> >         struct list_head        job_node;
> > -       struct frame_vector     *vec;
> > +       struct page             **pages;
> > +       unsigned int            npages;
> Can you please update the kerneldoc comment section of this structure
> according to your changes ?

Apologies I missed the nice kerneldoc. I'll fix that in the next round.


> >         struct sg_table         *sgt;
> >         enum dma_data_direction dir;
> >         struct list_head        debugfs_list;
> > diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> > index 5ff4688683fd..327b64479f97 100644
> > --- a/drivers/misc/habanalabs/common/memory.c
> > +++ b/drivers/misc/habanalabs/common/memory.c
> > @@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
> >                 return -EFAULT;
> >         }
> >
> > -       userptr->vec = frame_vector_create(npages);
> > -       if (!userptr->vec) {
> > +       userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
> > +                                       GFP_KERNEL);
> > +       if (!userptr->pages) {
> >                 dev_err(hdev->dev, "Failed to create frame vector\n");
> >                 return -ENOMEM;
> >         }
> >
> > -       rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> > -                               userptr->vec);
> > +       rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> > +                                userptr->pages);
> >
> >         if (rc != npages) {
> >                 dev_err(hdev->dev,
> >                         "Failed to map host memory, user ptr probably wrong\n");
> >                 if (rc < 0)
> > -                       goto destroy_framevec;
> > +                       goto destroy_pages;
> > +               npages = rc;
> >                 rc = -EFAULT;
> > -               goto put_framevec;
> > -       }
> > -
> > -       if (frame_vector_to_pages(userptr->vec) < 0) {
> > -               dev_err(hdev->dev,
> > -                       "Failed to translate frame vector to pages\n");
> > -               rc = -EFAULT;
> > -               goto put_framevec;
> > +               goto put_pages;
> >         }
> > +       userptr->npages = npages;
> >
> >         rc = sg_alloc_table_from_pages(userptr->sgt,
> > -                                       frame_vector_pages(userptr->vec),
> > -                                       npages, offset, size, GFP_ATOMIC);
> > +                                      userptr->pages,
> > +                                      npages, offset, size, GFP_ATOMIC);
> I think that because the call to kvmalloc_array() is done with
> GFP_KERNEL, there is no point in using GFP_ATOMIC here.
> And actually, this path only needs to avoid yielding when using a
> special debug mode.
> So I suggest putting here GFP_KERNEL.

Huh, I didn't even notice the GFP_ATOMIC here. This looks indeed
strange and GFP_KERNEL should be perfectly fine in a function that
also calls pin_user_pages (since that one can allocate and do worse
stuff like userspace pagefaults).

But since that GFP_ATOMIC is there already I'll do that in a separate patch.

> In the meanwhile, I'll run this patch (coupled with the next patch) in
> our C/I to make sure there are no regressions.

Excellent. I'll wait with v3 until that's done, just in case you hit a
snag I need to fix.

Cheers, Daniel

> Thanks,
> Oded
>
> >         if (rc < 0) {
> >                 dev_err(hdev->dev, "failed to create SG table from pages\n");
> > -               goto put_framevec;
> > +               goto put_pages;
> >         }
> >
> >         return 0;
> >
> > -put_framevec:
> > -       put_vaddr_frames(userptr->vec);
> > -destroy_framevec:
> > -       frame_vector_destroy(userptr->vec);
> > +put_pages:
> > +       unpin_user_pages(userptr->pages, npages);
> > +destroy_pages:
> > +       kvfree(userptr->pages);
> >         return rc;
> >  }
> >
> > @@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
> >   */
> >  void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> >  {
> > -       struct page **pages;
> > -
> >         hl_debugfs_remove_userptr(hdev, userptr);
> >
> >         if (userptr->dma_mapped)
> > @@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> >                                                         userptr->sgt->nents,
> >                                                         userptr->dir);
> >
> > -       pages = frame_vector_pages(userptr->vec);
> > -       if (!IS_ERR(pages)) {
> > -               int i;
> > -
> > -               for (i = 0; i < frame_vector_count(userptr->vec); i++)
> > -                       set_page_dirty_lock(pages[i]);
> > -       }
> > -       put_vaddr_frames(userptr->vec);
> > -       frame_vector_destroy(userptr->vec);
> > +       unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
> > +       kvfree(userptr->pages);
> >
> >         list_del(&userptr->job_node);
> >
> > --
> > 2.28.0
> >



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-10 17:22             ` Tomasz Figa
@ 2020-10-10 21:35               ` Laurent Pinchart
  2020-10-10 21:50                 ` Daniel Vetter
  0 siblings, 1 reply; 69+ messages in thread
From: Laurent Pinchart @ 2020-10-10 21:35 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Daniel Vetter, linux-s390, linux-samsung-soc, Jan Kara,
	Kees Cook, KVM list, Linux MM, Mauro Carvalho Chehab,
	John Hubbard, LKML, DRI Development, Jason Gunthorpe,
	Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

Hi Tomasz,

On Sat, Oct 10, 2020 at 07:22:48PM +0200, Tomasz Figa wrote:
> On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> > >
> > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> > >
> > > No, it is unconditionally unsafe. The CMA movable mappings are
> > > specific VMAs that will have bad issues here, but there are other
> > > types too.
> > >
> > > The only way to do something at a VMA level is to have a list of OK
> > > VMAs, eg because they were creatd via a special mmap helper from the
> > > media subsystem.
> > >
> > > > Well, no drivers inside the media subsystem uses such flag, although
> > > > they may rely on some infrastructure that could be using it behind
> > > > the bars.
> > >
> > > It doesn't matter, nothing prevents the user from calling media APIs
> > > on mmaps it gets from other subsystems.
> >
> > I think a good first step would be to disable userptr of non struct
> > page backed storage going forward for any new hw support. Even on
> > existing drivers. dma-buf sharing has been around for long enough now
> > that this shouldn't be a problem. Unfortunately right now this doesn't
> > seem to exist, so the entire problem keeps getting perpetuated.
> >
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> > - add dma-buf export support to fbdev and v4l
> 
> I assume you mean V4L2 and not the obsolete V4L that is emulated in
> the userspace by libv4l. If so, every video device that uses videobuf2
> gets DMA-buf export for free and there is nothing needed to enable it.
> 
> We probably still have a few legacy drivers using videobuf (non-2),
> but IMHO those should be safe to put behind some disabled-by-default
> Kconfig symbol or even completely drop, as the legacy framework has
> been deprecated for many years already.

There's 8 drivers left, and they support a very large number of devices.
I expect unhappy users distros stop shipping them. On the other hand,
videobuf has been deprecated for a loooooooong time, so there has been
plenty of time to convert the remaining drivers to videobuf2. If nobody
can do it, then we'll have to drop support for these devices given the
security issues.

We have moved media drivers to staging in the past when there wasn't
enough maintenance effort, we could do so here too.

> > - roll this out everywhere we still need it.
> >
> > Realistically this just isn't going to happen. And anything else just
> > reimplements half of dma-buf, which is kinda pointless (you need
> > minimally refcounting and some way to get at a promise of a permanent
> > sg list for dma. Plus probably the vmap for kernel cpu access.
> >
> > > > Please let address the issue on this way, instead of broken an
> > > > userspace API that it is there since 1991.
> > >
> > > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > > breakage caused by a security fix for something that was a 15 years
> > > old bug.
> >
> > Yeah we have a bunch of these on the drm side too. Some of them are
> > really just "you have to upgrade userspace", and there's no real fix
> > for the security nightmare without that.
> 
> I think we need to phase out such userspace indeed. The Kconfig symbol
> allows enabling the unsafe functionality for anyone who still needs
> it, so I think it's not entirely a breakage.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
  2020-10-10 21:32     ` Daniel Vetter
@ 2020-10-10 21:41       ` Daniel Vetter
  2020-10-10 21:47         ` Oded Gabbay
  0 siblings, 1 reply; 69+ messages in thread
From: Daniel Vetter @ 2020-10-10 21:41 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: DRI Development, LKML, KVM list, linux-mm,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	linux-samsung-soc, Linux Media Mailing List, linux-s390,
	Daniel Vetter, Jason Gunthorpe, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams, Omer Shpigelman,
	Ofir Bitton, Tomer Tayar, Moti Haimovski, Greg Kroah-Hartman,
	Pawel Piskorski

On Sat, Oct 10, 2020 at 11:32 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Sat, Oct 10, 2020 at 10:27 PM Oded Gabbay <oded.gabbay@gmail.com> wrote:
> >
> > On Fri, Oct 9, 2020 at 10:59 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > >
> > > All we need are a pages array, pin_user_pages_fast can give us that
> > > directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
> > >
> > Thanks for the patch Daniel.
> >
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-arm-kernel@lists.infradead.org
> > > Cc: linux-samsung-soc@vger.kernel.org
> > > Cc: linux-media@vger.kernel.org
> > > Cc: Oded Gabbay <oded.gabbay@gmail.com>
> > > Cc: Omer Shpigelman <oshpigelman@habana.ai>
> > > Cc: Ofir Bitton <obitton@habana.ai>
> > > Cc: Tomer Tayar <ttayar@habana.ai>
> > > Cc: Moti Haimovski <mhaimovski@habana.ai>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: Pawel Piskorski <ppiskorski@habana.ai>
> > > --
> > > v2: Use unpin_user_pages_dirty_lock (John)
> > > ---
> > >  drivers/misc/habanalabs/Kconfig             |  1 -
> > >  drivers/misc/habanalabs/common/habanalabs.h |  3 +-
> > >  drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
> > >  3 files changed, 20 insertions(+), 33 deletions(-)
> > >
> > > diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
> > > index 8eb5d38c618e..2f04187f7167 100644
> > > --- a/drivers/misc/habanalabs/Kconfig
> > > +++ b/drivers/misc/habanalabs/Kconfig
> > > @@ -6,7 +6,6 @@
> > >  config HABANA_AI
> > >         tristate "HabanaAI accelerators (habanalabs)"
> > >         depends on PCI && HAS_IOMEM
> > > -       select FRAME_VECTOR
> > >         select DMA_SHARED_BUFFER
> > >         select GENERIC_ALLOCATOR
> > >         select HWMON
> > > diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
> > > index edbd627b29d2..c1b3ad613b15 100644
> > > --- a/drivers/misc/habanalabs/common/habanalabs.h
> > > +++ b/drivers/misc/habanalabs/common/habanalabs.h
> > > @@ -881,7 +881,8 @@ struct hl_ctx_mgr {
> > >  struct hl_userptr {
> > >         enum vm_type_t          vm_type; /* must be first */
> > >         struct list_head        job_node;
> > > -       struct frame_vector     *vec;
> > > +       struct page             **pages;
> > > +       unsigned int            npages;
> > Can you please update the kerneldoc comment section of this structure
> > according to your changes ?
>
> Apologies I missed the nice kerneldoc. I'll fix that in the next round.
>
>
> > >         struct sg_table         *sgt;
> > >         enum dma_data_direction dir;
> > >         struct list_head        debugfs_list;
> > > diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> > > index 5ff4688683fd..327b64479f97 100644
> > > --- a/drivers/misc/habanalabs/common/memory.c
> > > +++ b/drivers/misc/habanalabs/common/memory.c
> > > @@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
> > >                 return -EFAULT;
> > >         }
> > >
> > > -       userptr->vec = frame_vector_create(npages);
> > > -       if (!userptr->vec) {
> > > +       userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
> > > +                                       GFP_KERNEL);
> > > +       if (!userptr->pages) {
> > >                 dev_err(hdev->dev, "Failed to create frame vector\n");
> > >                 return -ENOMEM;
> > >         }
> > >
> > > -       rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> > > -                               userptr->vec);
> > > +       rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> > > +                                userptr->pages);
> > >
> > >         if (rc != npages) {
> > >                 dev_err(hdev->dev,
> > >                         "Failed to map host memory, user ptr probably wrong\n");
> > >                 if (rc < 0)
> > > -                       goto destroy_framevec;
> > > +                       goto destroy_pages;
> > > +               npages = rc;
> > >                 rc = -EFAULT;
> > > -               goto put_framevec;
> > > -       }
> > > -
> > > -       if (frame_vector_to_pages(userptr->vec) < 0) {
> > > -               dev_err(hdev->dev,
> > > -                       "Failed to translate frame vector to pages\n");
> > > -               rc = -EFAULT;
> > > -               goto put_framevec;
> > > +               goto put_pages;
> > >         }
> > > +       userptr->npages = npages;
> > >
> > >         rc = sg_alloc_table_from_pages(userptr->sgt,
> > > -                                       frame_vector_pages(userptr->vec),
> > > -                                       npages, offset, size, GFP_ATOMIC);
> > > +                                      userptr->pages,
> > > +                                      npages, offset, size, GFP_ATOMIC);
> > I think that because the call to kvmalloc_array() is done with
> > GFP_KERNEL, there is no point in using GFP_ATOMIC here.
> > And actually, this path only needs to avoid yielding when using a
> > special debug mode.
> > So I suggest putting here GFP_KERNEL.
>
> Huh, I didn't even notice the GFP_ATOMIC here. This looks indeed
> strange and GFP_KERNEL should be perfectly fine in a function that
> also calls pin_user_pages (since that one can allocate and do worse
> stuff like userspace pagefaults).
>
> But since that GFP_ATOMIC is there already I'll do that in a separate patch.

Ok I read up on your usage of GFP_ATOMIC in habanalabs, and I'm not
going to touch this. But I'm pretty sure it's broken.

You seem to have some requirement of not allocating memory with
blocking (see hl_cb_alloc()), and that seems to be way you allocate
tons of structures with GFP_ATOMIC. There's 2 pretty tough problems
with that:
- GFP_ATOMIC can fail, even when the system hasn't run out of memory
yet. You _must_ have a fallback back to handle allocation failures for
these. Quick survey shows you a ton of GFP_ATOMIC callsites, and very
little fallback code - I've found none, but I didn't check the failure
handlers all going up the possible callchains.
- pin_user_pages can allocate memory, so you're breaking your own "no
sleeping in these paths" rules.

This isn't going to get fixed with a quick oneliner patch, depending
what's needed you're looking at a driver rearchitecture here :-/ Hence
I'm not going to touch this in the next patch, but leave it all as-is.

Cheers, Daniel

>
> > In the meanwhile, I'll run this patch (coupled with the next patch) in
> > our C/I to make sure there are no regressions.
>
> Excellent. I'll wait with v3 until that's done, just in case you hit a
> snag I need to fix.
>
> Cheers, Daniel
>
> > Thanks,
> > Oded
> >
> > >         if (rc < 0) {
> > >                 dev_err(hdev->dev, "failed to create SG table from pages\n");
> > > -               goto put_framevec;
> > > +               goto put_pages;
> > >         }
> > >
> > >         return 0;
> > >
> > > -put_framevec:
> > > -       put_vaddr_frames(userptr->vec);
> > > -destroy_framevec:
> > > -       frame_vector_destroy(userptr->vec);
> > > +put_pages:
> > > +       unpin_user_pages(userptr->pages, npages);
> > > +destroy_pages:
> > > +       kvfree(userptr->pages);
> > >         return rc;
> > >  }
> > >
> > > @@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
> > >   */
> > >  void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> > >  {
> > > -       struct page **pages;
> > > -
> > >         hl_debugfs_remove_userptr(hdev, userptr);
> > >
> > >         if (userptr->dma_mapped)
> > > @@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> > >                                                         userptr->sgt->nents,
> > >                                                         userptr->dir);
> > >
> > > -       pages = frame_vector_pages(userptr->vec);
> > > -       if (!IS_ERR(pages)) {
> > > -               int i;
> > > -
> > > -               for (i = 0; i < frame_vector_count(userptr->vec); i++)
> > > -                       set_page_dirty_lock(pages[i]);
> > > -       }
> > > -       put_vaddr_frames(userptr->vec);
> > > -       frame_vector_destroy(userptr->vec);
> > > +       unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
> > > +       kvfree(userptr->pages);
> > >
> > >         list_del(&userptr->job_node);
> > >
> > > --
> > > 2.28.0
> > >
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
  2020-10-10 21:41       ` Daniel Vetter
@ 2020-10-10 21:47         ` Oded Gabbay
  0 siblings, 0 replies; 69+ messages in thread
From: Oded Gabbay @ 2020-10-10 21:47 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, KVM list, linux-mm,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	linux-samsung-soc, Linux Media Mailing List, linux-s390,
	Daniel Vetter, Jason Gunthorpe, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams, Omer Shpigelman,
	Ofir Bitton, Tomer Tayar, Moti Haimovski, Greg Kroah-Hartman,
	Pawel Piskorski

On Sun, Oct 11, 2020 at 12:41 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Sat, Oct 10, 2020 at 11:32 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> > On Sat, Oct 10, 2020 at 10:27 PM Oded Gabbay <oded.gabbay@gmail.com> wrote:
> > >
> > > On Fri, Oct 9, 2020 at 10:59 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > >
> > > > All we need are a pages array, pin_user_pages_fast can give us that
> > > > directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
> > > >
> > > Thanks for the patch Daniel.
> > >
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > > Cc: Jan Kara <jack@suse.cz>
> > > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > > Cc: linux-mm@kvack.org
> > > > Cc: linux-arm-kernel@lists.infradead.org
> > > > Cc: linux-samsung-soc@vger.kernel.org
> > > > Cc: linux-media@vger.kernel.org
> > > > Cc: Oded Gabbay <oded.gabbay@gmail.com>
> > > > Cc: Omer Shpigelman <oshpigelman@habana.ai>
> > > > Cc: Ofir Bitton <obitton@habana.ai>
> > > > Cc: Tomer Tayar <ttayar@habana.ai>
> > > > Cc: Moti Haimovski <mhaimovski@habana.ai>
> > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > Cc: Pawel Piskorski <ppiskorski@habana.ai>
> > > > --
> > > > v2: Use unpin_user_pages_dirty_lock (John)
> > > > ---
> > > >  drivers/misc/habanalabs/Kconfig             |  1 -
> > > >  drivers/misc/habanalabs/common/habanalabs.h |  3 +-
> > > >  drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
> > > >  3 files changed, 20 insertions(+), 33 deletions(-)
> > > >
> > > > diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
> > > > index 8eb5d38c618e..2f04187f7167 100644
> > > > --- a/drivers/misc/habanalabs/Kconfig
> > > > +++ b/drivers/misc/habanalabs/Kconfig
> > > > @@ -6,7 +6,6 @@
> > > >  config HABANA_AI
> > > >         tristate "HabanaAI accelerators (habanalabs)"
> > > >         depends on PCI && HAS_IOMEM
> > > > -       select FRAME_VECTOR
> > > >         select DMA_SHARED_BUFFER
> > > >         select GENERIC_ALLOCATOR
> > > >         select HWMON
> > > > diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
> > > > index edbd627b29d2..c1b3ad613b15 100644
> > > > --- a/drivers/misc/habanalabs/common/habanalabs.h
> > > > +++ b/drivers/misc/habanalabs/common/habanalabs.h
> > > > @@ -881,7 +881,8 @@ struct hl_ctx_mgr {
> > > >  struct hl_userptr {
> > > >         enum vm_type_t          vm_type; /* must be first */
> > > >         struct list_head        job_node;
> > > > -       struct frame_vector     *vec;
> > > > +       struct page             **pages;
> > > > +       unsigned int            npages;
> > > Can you please update the kerneldoc comment section of this structure
> > > according to your changes ?
> >
> > Apologies I missed the nice kerneldoc. I'll fix that in the next round.
> >
> >
> > > >         struct sg_table         *sgt;
> > > >         enum dma_data_direction dir;
> > > >         struct list_head        debugfs_list;
> > > > diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> > > > index 5ff4688683fd..327b64479f97 100644
> > > > --- a/drivers/misc/habanalabs/common/memory.c
> > > > +++ b/drivers/misc/habanalabs/common/memory.c
> > > > @@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
> > > >                 return -EFAULT;
> > > >         }
> > > >
> > > > -       userptr->vec = frame_vector_create(npages);
> > > > -       if (!userptr->vec) {
> > > > +       userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
> > > > +                                       GFP_KERNEL);
> > > > +       if (!userptr->pages) {
> > > >                 dev_err(hdev->dev, "Failed to create frame vector\n");
> > > >                 return -ENOMEM;
> > > >         }
> > > >
> > > > -       rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> > > > -                               userptr->vec);
> > > > +       rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> > > > +                                userptr->pages);
> > > >
> > > >         if (rc != npages) {
> > > >                 dev_err(hdev->dev,
> > > >                         "Failed to map host memory, user ptr probably wrong\n");
> > > >                 if (rc < 0)
> > > > -                       goto destroy_framevec;
> > > > +                       goto destroy_pages;
> > > > +               npages = rc;
> > > >                 rc = -EFAULT;
> > > > -               goto put_framevec;
> > > > -       }
> > > > -
> > > > -       if (frame_vector_to_pages(userptr->vec) < 0) {
> > > > -               dev_err(hdev->dev,
> > > > -                       "Failed to translate frame vector to pages\n");
> > > > -               rc = -EFAULT;
> > > > -               goto put_framevec;
> > > > +               goto put_pages;
> > > >         }
> > > > +       userptr->npages = npages;
> > > >
> > > >         rc = sg_alloc_table_from_pages(userptr->sgt,
> > > > -                                       frame_vector_pages(userptr->vec),
> > > > -                                       npages, offset, size, GFP_ATOMIC);
> > > > +                                      userptr->pages,
> > > > +                                      npages, offset, size, GFP_ATOMIC);
> > > I think that because the call to kvmalloc_array() is done with
> > > GFP_KERNEL, there is no point in using GFP_ATOMIC here.
> > > And actually, this path only needs to avoid yielding when using a
> > > special debug mode.
> > > So I suggest putting here GFP_KERNEL.
> >
> > Huh, I didn't even notice the GFP_ATOMIC here. This looks indeed
> > strange and GFP_KERNEL should be perfectly fine in a function that
> > also calls pin_user_pages (since that one can allocate and do worse
> > stuff like userspace pagefaults).
> >
> > But since that GFP_ATOMIC is there already I'll do that in a separate patch.
>
> Ok I read up on your usage of GFP_ATOMIC in habanalabs, and I'm not
> going to touch this. But I'm pretty sure it's broken.
>
> You seem to have some requirement of not allocating memory with
> blocking (see hl_cb_alloc()), and that seems to be way you allocate
> tons of structures with GFP_ATOMIC. There's 2 pretty tough problems
> with that:
> - GFP_ATOMIC can fail, even when the system hasn't run out of memory
> yet. You _must_ have a fallback back to handle allocation failures for
> these. Quick survey shows you a ton of GFP_ATOMIC callsites, and very
> little fallback code - I've found none, but I didn't check the failure
> handlers all going up the possible callchains.
> - pin_user_pages can allocate memory, so you're breaking your own "no
> sleeping in these paths" rules.
>
> This isn't going to get fixed with a quick oneliner patch, depending
> what's needed you're looking at a driver rearchitecture here :-/ Hence
> I'm not going to touch this in the next patch, but leave it all as-is.
>
Most of those requirements come from code that is only relevant in
initial bringup and in our first ASIC (GOYA) for the first few months
we had it.
I'm going to remove all that code from the upstream driver as it's not
needed there.

Then, I'll go and look at all other uses of GFP_ATOMIC to see how they
can be improved/removed, maybe with pre-allocated stuff.
Thanks for pointing this out.
Oded

> Cheers, Daniel
>
> >
> > > In the meanwhile, I'll run this patch (coupled with the next patch) in
> > > our C/I to make sure there are no regressions.
> >
> > Excellent. I'll wait with v3 until that's done, just in case you hit a
> > snag I need to fix.
> >
> > Cheers, Daniel
> >
> > > Thanks,
> > > Oded
> > >
> > > >         if (rc < 0) {
> > > >                 dev_err(hdev->dev, "failed to create SG table from pages\n");
> > > > -               goto put_framevec;
> > > > +               goto put_pages;
> > > >         }
> > > >
> > > >         return 0;
> > > >
> > > > -put_framevec:
> > > > -       put_vaddr_frames(userptr->vec);
> > > > -destroy_framevec:
> > > > -       frame_vector_destroy(userptr->vec);
> > > > +put_pages:
> > > > +       unpin_user_pages(userptr->pages, npages);
> > > > +destroy_pages:
> > > > +       kvfree(userptr->pages);
> > > >         return rc;
> > > >  }
> > > >
> > > > @@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
> > > >   */
> > > >  void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> > > >  {
> > > > -       struct page **pages;
> > > > -
> > > >         hl_debugfs_remove_userptr(hdev, userptr);
> > > >
> > > >         if (userptr->dma_mapped)
> > > > @@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> > > >                                                         userptr->sgt->nents,
> > > >                                                         userptr->dir);
> > > >
> > > > -       pages = frame_vector_pages(userptr->vec);
> > > > -       if (!IS_ERR(pages)) {
> > > > -               int i;
> > > > -
> > > > -               for (i = 0; i < frame_vector_count(userptr->vec); i++)
> > > > -                       set_page_dirty_lock(pages[i]);
> > > > -       }
> > > > -       put_vaddr_frames(userptr->vec);
> > > > -       frame_vector_destroy(userptr->vec);
> > > > +       unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
> > > > +       kvfree(userptr->pages);
> > > >
> > > >         list_del(&userptr->job_node);
> > > >
> > > > --
> > > > 2.28.0
> > > >
> >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-10 21:35               ` Laurent Pinchart
@ 2020-10-10 21:50                 ` Daniel Vetter
  2020-10-11  6:27                   ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 69+ messages in thread
From: Daniel Vetter @ 2020-10-10 21:50 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Tomasz Figa, linux-s390, linux-samsung-soc, Jan Kara, Kees Cook,
	KVM list, Linux MM, Mauro Carvalho Chehab, John Hubbard, LKML,
	DRI Development, Jason Gunthorpe, Jérôme Glisse,
	Daniel Vetter, Dan Williams, Linus Torvalds, Andrew Morton,
	Linux ARM, open list:DMA BUFFER SHARING FRAMEWORK

On Sat, Oct 10, 2020 at 11:36 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Tomasz,
>
> On Sat, Oct 10, 2020 at 07:22:48PM +0200, Tomasz Figa wrote:
> > On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> > > >
> > > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> > > >
> > > > No, it is unconditionally unsafe. The CMA movable mappings are
> > > > specific VMAs that will have bad issues here, but there are other
> > > > types too.
> > > >
> > > > The only way to do something at a VMA level is to have a list of OK
> > > > VMAs, eg because they were creatd via a special mmap helper from the
> > > > media subsystem.
> > > >
> > > > > Well, no drivers inside the media subsystem uses such flag, although
> > > > > they may rely on some infrastructure that could be using it behind
> > > > > the bars.
> > > >
> > > > It doesn't matter, nothing prevents the user from calling media APIs
> > > > on mmaps it gets from other subsystems.
> > >
> > > I think a good first step would be to disable userptr of non struct
> > > page backed storage going forward for any new hw support. Even on
> > > existing drivers. dma-buf sharing has been around for long enough now
> > > that this shouldn't be a problem. Unfortunately right now this doesn't
> > > seem to exist, so the entire problem keeps getting perpetuated.
> > >
> > > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > > >
> > > > We can't tell from the VMA these kinds of details..
> > > >
> > > > It has to go the other direction, evey mmap that might be used as a
> > > > userptr here has to be found and the VMA specially created to allow
> > > > its use. At least that is a kernel only change, but will need people
> > > > with the HW to do this work.
> > >
> > > I think the only reasonable way to keep this working is:
> > > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> > > - add dma-buf export support to fbdev and v4l
> >
> > I assume you mean V4L2 and not the obsolete V4L that is emulated in
> > the userspace by libv4l. If so, every video device that uses videobuf2
> > gets DMA-buf export for free and there is nothing needed to enable it.

Yeah. And I missed that v4l2 added dma-buf export too.

> > We probably still have a few legacy drivers using videobuf (non-2),
> > but IMHO those should be safe to put behind some disabled-by-default
> > Kconfig symbol or even completely drop, as the legacy framework has
> > been deprecated for many years already.
>
> There's 8 drivers left, and they support a very large number of devices.
> I expect unhappy users distros stop shipping them. On the other hand,
> videobuf has been deprecated for a loooooooong time, so there has been
> plenty of time to convert the remaining drivers to videobuf2. If nobody
> can do it, then we'll have to drop support for these devices given the
> security issues.

Again, the issue here is _only_ with follow_pfn. For videobuf1 this
means videbuf-dma-contig.c userptr support is broken. Unlike videobuf2
it means it's broken for all usage (not just zero-copy userptr),
because videbuf-dma-contig.c lacks the pin_user_pages path. But that
would be easy to add if this poses a  problem I think - we just need
to carry over the pin_user_pages_fast logic from videbuf2, no driver
changes required. But of course I don't think we should do that before
someone reports the regression, since videobuf1 userptr is doubly
deprecated :-)

Everything else keeps working with videobuf1 with my patch series. So
depending upon which videobuf1 implementations these 8 drivers use,
you might not even have any real breakage there.

> We have moved media drivers to staging in the past when there wasn't
> enough maintenance effort, we could do so here too.

I'm not breaking the world with this, it's really very minimal
use-case. At least as far as I'm understanding the entire media
subsystem here.
-Daniel

> > > - roll this out everywhere we still need it.
> > >
> > > Realistically this just isn't going to happen. And anything else just
> > > reimplements half of dma-buf, which is kinda pointless (you need
> > > minimally refcounting and some way to get at a promise of a permanent
> > > sg list for dma. Plus probably the vmap for kernel cpu access.
> > >
> > > > > Please let address the issue on this way, instead of broken an
> > > > > userspace API that it is there since 1991.
> > > >
> > > > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > > > breakage caused by a security fix for something that was a 15 years
> > > > old bug.
> > >
> > > Yeah we have a bunch of these on the drm side too. Some of them are
> > > really just "you have to upgrade userspace", and there's no real fix
> > > for the security nightmare without that.
> >
> > I think we need to phase out such userspace indeed. The Kconfig symbol
> > allows enabling the unsafe functionality for anyone who still needs
> > it, so I think it's not entirely a breakage.
>
> --
> Regards,
>
> Laurent Pinchart



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-10 21:50                 ` Daniel Vetter
@ 2020-10-11  6:27                   ` Mauro Carvalho Chehab
  2020-10-11  6:36                     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 69+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-11  6:27 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Laurent Pinchart, Tomasz Figa, linux-s390, linux-samsung-soc,
	Jan Kara, Kees Cook, KVM list, Linux MM, John Hubbard, LKML,
	DRI Development, Jason Gunthorpe, Jérôme Glisse,
	Daniel Vetter, Dan Williams, Linus Torvalds, Andrew Morton,
	Linux ARM, open list:DMA BUFFER SHARING FRAMEWORK, Hans Verkuil,
	Lad, Prabhakar

Em Sat, 10 Oct 2020 23:50:27 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> On Sat, Oct 10, 2020 at 11:36 PM Laurent Pinchart
> <laurent.pinchart@ideasonboard.com> wrote:
> >

> > > We probably still have a few legacy drivers using videobuf (non-2),
> > > but IMHO those should be safe to put behind some disabled-by-default
> > > Kconfig symbol or even completely drop, as the legacy framework has
> > > been deprecated for many years already.  
> >
> > There's 8 drivers left, and they support a very large number of devices.
> > I expect unhappy users distros stop shipping them. On the other hand,
> > videobuf has been deprecated for a loooooooong time, so there has been
> > plenty of time to convert the remaining drivers to videobuf2. If nobody
> > can do it, then we'll have to drop support for these devices given the
> > security issues.  
> 
> Again, the issue here is _only_ with follow_pfn. For videobuf1 this
> means videbuf-dma-contig.c userptr support is broken. Unlike videobuf2
> it means it's broken for all usage (not just zero-copy userptr),
> because videbuf-dma-contig.c lacks the pin_user_pages path.

Well, follow_pfn() is used only by videbuf-dma-contig.c. If this is 
the only part of VB1 that will have userptr broken, then there's
just one driver that might be affected: davinci.

Yet, taking a deeper look:

	$ git grep include drivers/media/platform/davinci/|grep -i videobuf
	drivers/media/platform/davinci/vpif_capture.h:#include <media/videobuf2-dma-contig.h>
	drivers/media/platform/davinci/vpif_display.h:#include <media/videobuf2-dma-contig.h>

It sounds to me that it was already converted to VB2, but some VB1
symbols were not converted at its Kconfig.

It sounds to me that there are other drivers with some VB1 left overs
at Kconfig, as those are the only ones using VB1 those days:

	$ for i in $(git grep media/videobuf drivers |grep -v videobuf2 |grep -v v4l2-core|cut -d: -f1); do dirname $i; done|sort|uniq
	drivers/media/pci/bt8xx
	drivers/media/pci/cx18
	drivers/media/platform
	drivers/media/usb/tm6000
	drivers/media/usb/zr364xx
	drivers/staging/media/atomisp/pci

> But that
> would be easy to add if this poses a  problem I think - we just need
> to carry over the pin_user_pages_fast logic from videbuf2, no driver
> changes required. But of course I don't think we should do that before
> someone reports the regression, since videobuf1 userptr is doubly
> deprecated :-)

I think otherwise. Keeping a broken component at the Kernel is 
a bad idea. 

Yet, from my quick search above, it sounds to me that it is time for 
us to retire the VB1 DMA contig support as a hole, as there's no client 
for it anymore.

I'll work on some patches cleaning up the VB1 left overs at
Kconfig and removing videbuf-dma-contig.c for good, if there's
no hidden dependency on it.


Thanks,
Mauro

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-11  6:27                   ` Mauro Carvalho Chehab
@ 2020-10-11  6:36                     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 69+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-11  6:36 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Laurent Pinchart, Tomasz Figa, linux-s390, linux-samsung-soc,
	Jan Kara, Kees Cook, KVM list, Linux MM, John Hubbard, LKML,
	DRI Development, Jason Gunthorpe, Jérôme Glisse,
	Daniel Vetter, Dan Williams, Linus Torvalds, Andrew Morton,
	Linux ARM, open list:DMA BUFFER SHARING FRAMEWORK, Hans Verkuil,
	Lad, Prabhakar

Em Sun, 11 Oct 2020 08:27:41 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:

> Em Sat, 10 Oct 2020 23:50:27 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> 
> > On Sat, Oct 10, 2020 at 11:36 PM Laurent Pinchart
> > <laurent.pinchart@ideasonboard.com> wrote:
> > >
> 
> > > > We probably still have a few legacy drivers using videobuf (non-2),
> > > > but IMHO those should be safe to put behind some disabled-by-default
> > > > Kconfig symbol or even completely drop, as the legacy framework has
> > > > been deprecated for many years already.  
> > >
> > > There's 8 drivers left, and they support a very large number of devices.
> > > I expect unhappy users distros stop shipping them. On the other hand,
> > > videobuf has been deprecated for a loooooooong time, so there has been
> > > plenty of time to convert the remaining drivers to videobuf2. If nobody
> > > can do it, then we'll have to drop support for these devices given the
> > > security issues.  
> > 
> > Again, the issue here is _only_ with follow_pfn. For videobuf1 this
> > means videbuf-dma-contig.c userptr support is broken. Unlike videobuf2
> > it means it's broken for all usage (not just zero-copy userptr),
> > because videbuf-dma-contig.c lacks the pin_user_pages path.
> 
> Well, follow_pfn() is used only by videbuf-dma-contig.c. If this is 
> the only part of VB1 that will have userptr broken, then there's
> just one driver that might be affected: davinci.
> 
> Yet, taking a deeper look:
> 
> 	$ git grep include drivers/media/platform/davinci/|grep -i videobuf
> 	drivers/media/platform/davinci/vpif_capture.h:#include <media/videobuf2-dma-contig.h>
> 	drivers/media/platform/davinci/vpif_display.h:#include <media/videobuf2-dma-contig.h>
> 
> It sounds to me that it was already converted to VB2, but some VB1
> symbols were not converted at its Kconfig.
> 
> It sounds to me that there are other drivers with some VB1 left overs
> at Kconfig, as those are the only ones using VB1 those days:
> 
> 	$ for i in $(git grep media/videobuf drivers |grep -v videobuf2 |grep -v v4l2-core|cut -d: -f1); do dirname $i; done|sort|uniq
> 	drivers/media/pci/bt8xx
> 	drivers/media/pci/cx18
> 	drivers/media/platform
> 	drivers/media/usb/tm6000
> 	drivers/media/usb/zr364xx
> 	drivers/staging/media/atomisp/pci

This is incomplete. There are two drivers that include videobuf
indirectly:

	include/media/davinci/vpfe_capture.h
	include/media/drv-intf/saa7146_vv.h

I double-checked that DaVinci still uses VB1. There are 
actually two clients for videbuf-dma-contig.c: davinci and fsl-viu.

Those two will be affected, if we don't add pin_user_pages_fast()
support to VB1 or convert them to VB2.

> 
> > But that
> > would be easy to add if this poses a  problem I think - we just need
> > to carry over the pin_user_pages_fast logic from videbuf2, no driver
> > changes required. But of course I don't think we should do that before
> > someone reports the regression, since videobuf1 userptr is doubly
> > deprecated :-)
> 
> I think otherwise. Keeping a broken component at the Kernel is 
> a bad idea. 
> 
> Yet, from my quick search above, it sounds to me that it is time for 
> us to retire the VB1 DMA contig support as a hole, as there's no client 
> for it anymore.
> 
> I'll work on some patches cleaning up the VB1 left overs at
> Kconfig and removing videbuf-dma-contig.c for good, if there's
> no hidden dependency on it.
> 
> 
> Thanks,
> Mauro



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 12:48         ` Jason Gunthorpe
  2020-10-09 17:52           ` Daniel Vetter
@ 2020-10-12 10:46           ` Marek Szyprowski
  2020-10-12 13:49             ` Daniel Vetter
  1 sibling, 1 reply; 69+ messages in thread
From: Marek Szyprowski @ 2020-10-12 10:46 UTC (permalink / raw)
  To: Jason Gunthorpe, Mauro Carvalho Chehab
  Cc: Daniel Vetter, DRI Development, LKML, kvm, linux-mm,
	linux-arm-kernel, linux-samsung-soc, linux-media, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Hi Jason,

On 09.10.2020 14:48, Jason Gunthorpe wrote:
> On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
>
>> I'm not a mm/ expert, but, from what I understood from Daniel's patch
>> description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> No, it is unconditionally unsafe. The CMA movable mappings are
> specific VMAs that will have bad issues here, but there are other
> types too.

I'm trying to follow this thread, but I really wonder what do you mean 
by CMA movable mappings? If a buffer has been allocated from CMA and 
used for DMA, it won't be moved in the memory. It will stay at the same 
physical memory address all the time until freed by the owner. It just a 
matter of proper usage count tracking to delay freeing if it is still 
used somewhere.

Best regards

-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-12 10:46           ` Marek Szyprowski
@ 2020-10-12 13:49             ` Daniel Vetter
  0 siblings, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-12 13:49 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Jason Gunthorpe, Mauro Carvalho Chehab, linux-s390,
	linux-samsung-soc, Jan Kara, Kees Cook, KVM list, LKML,
	DRI Development, Linux MM, Jérôme Glisse, John Hubbard,
	Daniel Vetter, Dan Williams, Linus Torvalds, Andrew Morton,
	Linux ARM, open list:DMA BUFFER SHARING FRAMEWORK

On Mon, Oct 12, 2020 at 12:47 PM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
>
> Hi Jason,
>
> On 09.10.2020 14:48, Jason Gunthorpe wrote:
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >
> >> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> >> description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.
>
> I'm trying to follow this thread, but I really wonder what do you mean
> by CMA movable mappings? If a buffer has been allocated from CMA and
> used for DMA, it won't be moved in the memory. It will stay at the same
> physical memory address all the time until freed by the owner. It just a
> matter of proper usage count tracking to delay freeing if it is still
> used somewhere.

 Yup. The problem is that this usage count tracking doesn't exist. And
drivers could at least in theory treat CMA like vram and swap buffers
in&out of it, so just refcounting the userspace vma isn't enough. In
practice, right now, it might be enough for CMA drivers though (but
there's more that's possible here).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 08/17] s390/pci: Remove races against pte updates
  2020-10-09  7:59 ` [PATCH v2 08/17] s390/pci: Remove races against pte updates Daniel Vetter
@ 2020-10-12 14:03   ` Niklas Schnelle
  2020-10-12 14:19     ` Daniel Vetter
  0 siblings, 1 reply; 69+ messages in thread
From: Niklas Schnelle @ 2020-10-12 14:03 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Gerald Schaefer, Daniel Vetter, Jason Gunthorpe,
	Dan Williams, Kees Cook, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara

Hi Daniel,

freshly back from my vacation I've just taken a look at your patch.
First thanks for this fix and the detailed commit description.
Definitely makes sense to fix this and you can add my

Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>

Content wise it all looks sane and clear and since Gerald did the testing,
I would have applied it to our tree already, but I got some trivial
checkpatch violations that probably apply to the whole series.
I've commented them inline below.
If you confirm there I can do the fixups when applying or you can resend.

On 10/9/20 9:59 AM, Daniel Vetter wrote:
> Way back it was a reasonable assumptions that iomem mappings never
> change the pfn range they point at. But this has changed:
> 
> - gpu drivers dynamically manage their memory nowadays, invalidating
> ptes with unmap_mapping_range when buffers get moved
> 
> - contiguous dma allocations have moved from dedicated carvetouts to
> cma regions. This means if we miss the unmap the pfn might contain
> pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> 
> - even /dev/mem now invalidates mappings when the kernel requests that
> iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87

The above commit mention should use the format
'commit 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region")'
otherwise this results in a checkpatch ERROR.

> ("/dev/mem: Revoke mappings when a driver claims the region")
> 
> Accessing pfns obtained from ptes without holding all the locks is
> therefore no longer a good idea. Fix this.
> 
> Since zpci_memcpy_from|toio seems to not do anything nefarious with
> locks we just need to open code get_pfn and follow_pfn and make sure
> we drop the locks only after we've done. The write function also needs

just a typo but just saw it "we're" instead of "we've"

> the copy_from_user move, since we can't take userspace faults while
> holding the mmap sem.
> 
> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> 
No empty line after the Revied-by tag.

> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

Your Signed-off-by mail address does not match the one you're sending from,
this yields a checkpatch warning when using git am with your mail.
This is probably just a silly misconfiguration but since Signed-offs
are signatures should I change this to 
"Daniel Vetter <daniel.vetter@ffwll.ch>" which is the one you're
sending from and also in the MAINTAINERS file?


> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>

The above Cc: line for Dan Williams is a duplicate

> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Niklas Schnelle <schnelle@linux.ibm.com>
> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Cc: linux-s390@vger.kernel.org
> --
> v2: Move VM_IO | VM_PFNMAP checks around so they keep returning EINVAL
> like before (Gerard)

I think the above should go before the CC/Signed-off/Reviewev block.

> ---
>  arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
>  1 file changed, 57 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
> index 401cf670a243..1a6adbc68ee8 100644
> --- a/arch/s390/pci/pci_mmio.c
> +++ b/arch/s390/pci/pci_mmio.c
> @@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
>  	return rc;
>  }
>  
> -static long get_pfn(unsigned long user_addr, unsigned long access,
> -		    unsigned long *pfn)
> -{
> -	struct vm_area_struct *vma;
> -	long ret;
> -
> -	mmap_read_lock(current->mm);
> -	ret = -EINVAL;
> -	vma = find_vma(current->mm, user_addr);
> -	if (!vma)
> -		goto out;
> -	ret = -EACCES;
> -	if (!(vma->vm_flags & access))
> -		goto out;
> -	ret = follow_pfn(vma, user_addr, pfn);
> -out:
> -	mmap_read_unlock(current->mm);
> -	return ret;
> -}
> -
>  SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
>  		const void __user *, user_buffer, size_t, length)
>  {
>  	u8 local_buf[64];
>  	void __iomem *io_addr;
>  	void *buf;
> -	unsigned long pfn;
> +	struct vm_area_struct *vma;
> +	pte_t *ptep;
> +	spinlock_t *ptl;

With checkpatch.pl --strict the above yields a complained
"CHECK: spinlock_t definition without comment" but I think
that's really okay since your commit description is very clear.
Same oin line 277.

... snip ...

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 08/17] s390/pci: Remove races against pte updates
  2020-10-12 14:03   ` Niklas Schnelle
@ 2020-10-12 14:19     ` Daniel Vetter
  2020-10-12 14:39       ` Niklas Schnelle
  2020-10-21  7:55       ` Niklas Schnelle
  0 siblings, 2 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-12 14:19 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Daniel Vetter, DRI Development, LKML, kvm, linux-mm,
	linux-arm-kernel, linux-samsung-soc, linux-media, linux-s390,
	Gerald Schaefer, Daniel Vetter, Jason Gunthorpe, Dan Williams,
	Kees Cook, Andrew Morton, John Hubbard, Jérôme Glisse,
	Jan Kara

On Mon, Oct 12, 2020 at 04:03:28PM +0200, Niklas Schnelle wrote:
> Hi Daniel,
> 
> freshly back from my vacation I've just taken a look at your patch.
> First thanks for this fix and the detailed commit description.
> Definitely makes sense to fix this and you can add my
> 
> Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
> 
> Content wise it all looks sane and clear and since Gerald did the testing,
> I would have applied it to our tree already, but I got some trivial
> checkpatch violations that probably apply to the whole series.
> I've commented them inline below.
> If you confirm there I can do the fixups when applying or you can resend.
> 
> On 10/9/20 9:59 AM, Daniel Vetter wrote:
> > Way back it was a reasonable assumptions that iomem mappings never
> > change the pfn range they point at. But this has changed:
> > 
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> > ptes with unmap_mapping_range when buffers get moved
> > 
> > - contiguous dma allocations have moved from dedicated carvetouts to
> > cma regions. This means if we miss the unmap the pfn might contain
> > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > 
> > - even /dev/mem now invalidates mappings when the kernel requests that
> > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> 
> The above commit mention should use the format
> 'commit 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the region")'
> otherwise this results in a checkpatch ERROR.
> 
> > ("/dev/mem: Revoke mappings when a driver claims the region")
> > 
> > Accessing pfns obtained from ptes without holding all the locks is
> > therefore no longer a good idea. Fix this.
> > 
> > Since zpci_memcpy_from|toio seems to not do anything nefarious with
> > locks we just need to open code get_pfn and follow_pfn and make sure
> > we drop the locks only after we've done. The write function also needs
> 
> just a typo but just saw it "we're" instead of "we've"
> 
> > the copy_from_user move, since we can't take userspace faults while
> > holding the mmap sem.
> > 
> > Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> > 
> No empty line after the Revied-by tag.
> 
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> 
> Your Signed-off-by mail address does not match the one you're sending from,
> this yields a checkpatch warning when using git am with your mail.
> This is probably just a silly misconfiguration but since Signed-offs
> are signatures should I change this to 
> "Daniel Vetter <daniel.vetter@ffwll.ch>" which is the one you're
> sending from and also in the MAINTAINERS file?
> 
> 
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> 
> The above Cc: line for Dan Williams is a duplicate
> 
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Niklas Schnelle <schnelle@linux.ibm.com>
> > Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> > Cc: linux-s390@vger.kernel.org
> > --
> > v2: Move VM_IO | VM_PFNMAP checks around so they keep returning EINVAL
> > like before (Gerard)
> 
> I think the above should go before the CC/Signed-off/Reviewev block.

This is a per-subsystem bikeshed :-) drivers/gpu definitely wants it
above, but most core subsystems want it below. I'll move it.

> > ---
> >  arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
> >  1 file changed, 57 insertions(+), 41 deletions(-)
> > 
> > diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
> > index 401cf670a243..1a6adbc68ee8 100644
> > --- a/arch/s390/pci/pci_mmio.c
> > +++ b/arch/s390/pci/pci_mmio.c
> > @@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
> >  	return rc;
> >  }
> >  
> > -static long get_pfn(unsigned long user_addr, unsigned long access,
> > -		    unsigned long *pfn)
> > -{
> > -	struct vm_area_struct *vma;
> > -	long ret;
> > -
> > -	mmap_read_lock(current->mm);
> > -	ret = -EINVAL;
> > -	vma = find_vma(current->mm, user_addr);
> > -	if (!vma)
> > -		goto out;
> > -	ret = -EACCES;
> > -	if (!(vma->vm_flags & access))
> > -		goto out;
> > -	ret = follow_pfn(vma, user_addr, pfn);
> > -out:
> > -	mmap_read_unlock(current->mm);
> > -	return ret;
> > -}
> > -
> >  SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> >  		const void __user *, user_buffer, size_t, length)
> >  {
> >  	u8 local_buf[64];
> >  	void __iomem *io_addr;
> >  	void *buf;
> > -	unsigned long pfn;
> > +	struct vm_area_struct *vma;
> > +	pte_t *ptep;
> > +	spinlock_t *ptl;
> 
> With checkpatch.pl --strict the above yields a complained
> "CHECK: spinlock_t definition without comment" but I think
> that's really okay since your commit description is very clear.
> Same oin line 277.

I think this is a falls positive, checkpatch doesn't realize that
SYSCALL_DEFINE3 is a function, not a structure. And in a structure I'd
have added the kerneldoc or comment.

I'll fix up all the nits you've found for the next round. Thanks for
taking a look.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 08/17] s390/pci: Remove races against pte updates
  2020-10-12 14:19     ` Daniel Vetter
@ 2020-10-12 14:39       ` Niklas Schnelle
  2020-10-21  7:55       ` Niklas Schnelle
  1 sibling, 0 replies; 69+ messages in thread
From: Niklas Schnelle @ 2020-10-12 14:39 UTC (permalink / raw)
  To: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Gerald Schaefer,
	Daniel Vetter, Jason Gunthorpe, Dan Williams, Kees Cook,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara

... snip ...
>>> Cc: linux-media@vger.kernel.org
>>> Cc: Niklas Schnelle <schnelle@linux.ibm.com>
>>> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
>>> Cc: linux-s390@vger.kernel.org
>>> --
>>> v2: Move VM_IO | VM_PFNMAP checks around so they keep returning EINVAL
>>> like before (Gerard)
>>
>> I think the above should go before the CC/Signed-off/Reviewev block.
> 
> This is a per-subsystem bikeshed :-) drivers/gpu definitely wants it
> above, but most core subsystems want it below. I'll move it.

Today I learned, thanks! That said I think most of the time I've
actually not seen version change information in the commit message itself
only in the cover letters. I really don't care just looked odd to me.

> 
>>> ---
>>>  arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
>>>  1 file changed, 57 insertions(+), 41 deletions(-)
>>>
>>> diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
>>> index 401cf670a243..1a6adbc68ee8 100644
>>> --- a/arch/s390/pci/pci_mmio.c
>>> +++ b/arch/s390/pci/pci_mmio.c
>>> @@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
>>>  	return rc;
>>>  }
>>>  
>>> -static long get_pfn(unsigned long user_addr, unsigned long access,
>>> -		    unsigned long *pfn)
>>> -{
>>> -	struct vm_area_struct *vma;
>>> -	long ret;
>>> -
>>> -	mmap_read_lock(current->mm);
>>> -	ret = -EINVAL;
>>> -	vma = find_vma(current->mm, user_addr);
>>> -	if (!vma)
>>> -		goto out;
>>> -	ret = -EACCES;
>>> -	if (!(vma->vm_flags & access))
>>> -		goto out;
>>> -	ret = follow_pfn(vma, user_addr, pfn);
>>> -out:
>>> -	mmap_read_unlock(current->mm);
>>> -	return ret;
>>> -}
>>> -
>>>  SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
>>>  		const void __user *, user_buffer, size_t, length)
>>>  {
>>>  	u8 local_buf[64];
>>>  	void __iomem *io_addr;
>>>  	void *buf;
>>> -	unsigned long pfn;
>>> +	struct vm_area_struct *vma;
>>> +	pte_t *ptep;
>>> +	spinlock_t *ptl;
>>
>> With checkpatch.pl --strict the above yields a complained
>> "CHECK: spinlock_t definition without comment" but I think
>> that's really okay since your commit description is very clear.
>> Same oin line 277.
> 
> I think this is a falls positive, checkpatch doesn't realize that
> SYSCALL_DEFINE3 is a function, not a structure. And in a structure I'd
> have added the kerneldoc or comment.

Interesting, your theory sounds convincing, I too thought this
was a bit too pedantic.

> 
> I'll fix up all the nits you've found for the next round. Thanks for
> taking a look.

You're welcome hope I didn't sound pedantic. I think you've a lot
more experience actually and this can indeed turn into bikeshedding
but since I was answering anyway and most of this was checkpatch…

> -Daniel
> 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-09 18:28         ` Dan Williams
@ 2020-10-15  0:09           ` Jason Gunthorpe
  2020-10-15  7:52             ` Daniel Vetter
  2020-10-15 15:29             ` Daniel Vetter
  0 siblings, 2 replies; 69+ messages in thread
From: Jason Gunthorpe @ 2020-10-15  0:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: Daniel Vetter, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Arnd Bergmann,
	Greg Kroah-Hartman, David Hildenbrand, Rafael J. Wysocki

On Fri, Oct 09, 2020 at 11:28:54AM -0700, Dan Williams wrote:
> On Fri, Oct 9, 2020 at 7:32 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 04:24:45PM +0200, Daniel Vetter wrote:
> > > On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> > > >
> > > > > +struct address_space *iomem_get_mapping(void)
> > > > > +{
> > > > > +     return iomem_inode->i_mapping;
> > > >
> > > > This should pair an acquire with the release below
> > > >
> > > > > +     /*
> > > > > +      * Publish /dev/mem initialized.
> > > > > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > > > > +      */
> > > > > +     smp_store_release(&iomem_inode, inode);
> > > >
> > > > However, this seems abnormal, initcalls rarely do this kind of stuff
> > > > with global data..
> > > >
> > > > The kernel crashes if this fs_initcall is raced with
> > > > iomem_get_mapping() due to the unconditional dereference, so I think
> > > > it can be safely switched to a simple assignment.
> > >
> > > Ah yes I checked this all, but forgot to correctly annotate the
> > > iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
> > > Add missing memory barriers for devmem_inode").
> >
> > Oh yikes, so revoke_iomem can run concurrently during early boot,
> > tricky.
> 
> It runs early because request_mem_region() can run before fs_initcall.
> Rather than add an unnecessary lock just arrange for the revoke to be
> skipped before the inode is initialized. The expectation is that any
> early resource reservations will block future userspace mapping
> attempts.

Actually, on this point a simple WRITE_ONCE/READ_ONCE pairing is OK,
Paul once explained that the pointer chase on the READ_ONCE side is
required to be like an acquire - this is why rcu_dereference is just
READ_ONCE

Jason

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-15  0:09           ` Jason Gunthorpe
@ 2020-10-15  7:52             ` Daniel Vetter
  2020-10-15  7:55               ` Daniel Vetter
  2020-10-15 15:29             ` Daniel Vetter
  1 sibling, 1 reply; 69+ messages in thread
From: Daniel Vetter @ 2020-10-15  7:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dan Williams, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Arnd Bergmann,
	Greg Kroah-Hartman, David Hildenbrand, Rafael J. Wysocki

On Thu, Oct 15, 2020 at 2:09 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 11:28:54AM -0700, Dan Williams wrote:
> > On Fri, Oct 9, 2020 at 7:32 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 04:24:45PM +0200, Daniel Vetter wrote:
> > > > On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > >
> > > > > On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> > > > >
> > > > > > +struct address_space *iomem_get_mapping(void)
> > > > > > +{
> > > > > > +     return iomem_inode->i_mapping;
> > > > >
> > > > > This should pair an acquire with the release below
> > > > >
> > > > > > +     /*
> > > > > > +      * Publish /dev/mem initialized.
> > > > > > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > > > > > +      */
> > > > > > +     smp_store_release(&iomem_inode, inode);
> > > > >
> > > > > However, this seems abnormal, initcalls rarely do this kind of stuff
> > > > > with global data..
> > > > >
> > > > > The kernel crashes if this fs_initcall is raced with
> > > > > iomem_get_mapping() due to the unconditional dereference, so I think
> > > > > it can be safely switched to a simple assignment.
> > > >
> > > > Ah yes I checked this all, but forgot to correctly annotate the
> > > > iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
> > > > Add missing memory barriers for devmem_inode").
> > >
> > > Oh yikes, so revoke_iomem can run concurrently during early boot,
> > > tricky.
> >
> > It runs early because request_mem_region() can run before fs_initcall.
> > Rather than add an unnecessary lock just arrange for the revoke to be
> > skipped before the inode is initialized. The expectation is that any
> > early resource reservations will block future userspace mapping
> > attempts.
>
> Actually, on this point a simple WRITE_ONCE/READ_ONCE pairing is OK,
> Paul once explained that the pointer chase on the READ_ONCE side is
> required to be like an acquire - this is why rcu_dereference is just
> READ_ONCE

Indeed this changed with the sm_read_barrier_depends() removal a year
ago. Before that READ_ONCE and rcu_dereference where not actually the
same. I guess I'll throw a patch on top to switch that over too.
-Daniel




--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-15  7:52             ` Daniel Vetter
@ 2020-10-15  7:55               ` Daniel Vetter
  0 siblings, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-15  7:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dan Williams, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Arnd Bergmann,
	Greg Kroah-Hartman, David Hildenbrand, Rafael J. Wysocki

On Thu, Oct 15, 2020 at 9:52 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Thu, Oct 15, 2020 at 2:09 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 11:28:54AM -0700, Dan Williams wrote:
> > > On Fri, Oct 9, 2020 at 7:32 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > >
> > > > On Fri, Oct 09, 2020 at 04:24:45PM +0200, Daniel Vetter wrote:
> > > > > On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > > >
> > > > > > On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> > > > > >
> > > > > > > +struct address_space *iomem_get_mapping(void)
> > > > > > > +{
> > > > > > > +     return iomem_inode->i_mapping;
> > > > > >
> > > > > > This should pair an acquire with the release below
> > > > > >
> > > > > > > +     /*
> > > > > > > +      * Publish /dev/mem initialized.
> > > > > > > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > > > > > > +      */
> > > > > > > +     smp_store_release(&iomem_inode, inode);
> > > > > >
> > > > > > However, this seems abnormal, initcalls rarely do this kind of stuff
> > > > > > with global data..
> > > > > >
> > > > > > The kernel crashes if this fs_initcall is raced with
> > > > > > iomem_get_mapping() due to the unconditional dereference, so I think
> > > > > > it can be safely switched to a simple assignment.
> > > > >
> > > > > Ah yes I checked this all, but forgot to correctly annotate the
> > > > > iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
> > > > > Add missing memory barriers for devmem_inode").
> > > >
> > > > Oh yikes, so revoke_iomem can run concurrently during early boot,
> > > > tricky.
> > >
> > > It runs early because request_mem_region() can run before fs_initcall.
> > > Rather than add an unnecessary lock just arrange for the revoke to be
> > > skipped before the inode is initialized. The expectation is that any
> > > early resource reservations will block future userspace mapping
> > > attempts.
> >
> > Actually, on this point a simple WRITE_ONCE/READ_ONCE pairing is OK,
> > Paul once explained that the pointer chase on the READ_ONCE side is
> > required to be like an acquire - this is why rcu_dereference is just
> > READ_ONCE
>
> Indeed this changed with the sm_read_barrier_depends() removal a year
> ago. Before that READ_ONCE and rcu_dereference where not actually the
> same. I guess I'll throw a patch on top to switch that over too.

Actually 2019 landed just the clean-up, the read change landed in 2017 already:

commit 76ebbe78f7390aee075a7f3768af197ded1bdfbb
Author: Will Deacon <will@kernel.org>
Date:   Tue Oct 24 11:22:47 2017 +0100

   locking/barriers: Add implicit smp_read_barrier_depends() to READ_ONCE()

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-15  0:09           ` Jason Gunthorpe
  2020-10-15  7:52             ` Daniel Vetter
@ 2020-10-15 15:29             ` Daniel Vetter
  1 sibling, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-15 15:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dan Williams, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Arnd Bergmann,
	Greg Kroah-Hartman, David Hildenbrand, Rafael J. Wysocki

On Thu, Oct 15, 2020 at 2:09 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 11:28:54AM -0700, Dan Williams wrote:
> > On Fri, Oct 9, 2020 at 7:32 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 04:24:45PM +0200, Daniel Vetter wrote:
> > > > On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > > >
> > > > > On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> > > > >
> > > > > > +struct address_space *iomem_get_mapping(void)
> > > > > > +{
> > > > > > +     return iomem_inode->i_mapping;
> > > > >
> > > > > This should pair an acquire with the release below
> > > > >
> > > > > > +     /*
> > > > > > +      * Publish /dev/mem initialized.
> > > > > > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > > > > > +      */
> > > > > > +     smp_store_release(&iomem_inode, inode);
> > > > >
> > > > > However, this seems abnormal, initcalls rarely do this kind of stuff
> > > > > with global data..
> > > > >
> > > > > The kernel crashes if this fs_initcall is raced with
> > > > > iomem_get_mapping() due to the unconditional dereference, so I think
> > > > > it can be safely switched to a simple assignment.
> > > >
> > > > Ah yes I checked this all, but forgot to correctly annotate the
> > > > iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
> > > > Add missing memory barriers for devmem_inode").
> > >
> > > Oh yikes, so revoke_iomem can run concurrently during early boot,
> > > tricky.
> >
> > It runs early because request_mem_region() can run before fs_initcall.
> > Rather than add an unnecessary lock just arrange for the revoke to be
> > skipped before the inode is initialized. The expectation is that any
> > early resource reservations will block future userspace mapping
> > attempts.
>
> Actually, on this point a simple WRITE_ONCE/READ_ONCE pairing is OK,
> Paul once explained that the pointer chase on the READ_ONCE side is
> required to be like an acquire - this is why rcu_dereference is just
> READ_ONCE

Hm so WRITE_ONCE doesn't have any barriers, and we'd need that for
updating the pointer. That would leave things rather inconsistent, so
I think I'll just leave it as-is for symmetry reasons. None of this
code matters for performance anyway, so micro-optimizing barriers
seems a bit silly.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 01/17] drm/exynos: Stop using frame_vector helpers
  2020-10-09  7:59 ` [PATCH v2 01/17] drm/exynos: Stop using frame_vector helpers Daniel Vetter
@ 2020-10-16  7:42   ` John Hubbard
  0 siblings, 0 replies; 69+ messages in thread
From: John Hubbard @ 2020-10-16  7:42 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Inki Dae,
	Joonyoung Shim, Seung-Woo Kim, Kyungmin Park, Kukjin Kim,
	Krzysztof Kozlowski, Andrew Morton, Jérôme Glisse,
	Jan Kara, Dan Williams

On 10/9/20 12:59 AM, Daniel Vetter wrote:
> All we need are a pages array, pin_user_pages_fast can give us that
> directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Inki Dae <inki.dae@samsung.com>
> Cc: Joonyoung Shim <jy0922.shim@samsung.com>
> Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Kukjin Kim <kgene@kernel.org>
> Cc: Krzysztof Kozlowski <krzk@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> --
> v2: Use unpin_user_pages_dirty_lock (John)
> ---
>   drivers/gpu/drm/exynos/Kconfig          |  1 -
>   drivers/gpu/drm/exynos/exynos_drm_g2d.c | 47 +++++++++++--------------
>   2 files changed, 20 insertions(+), 28 deletions(-)
> 

Looks good.

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA

> diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
> index 6417f374b923..43257ef3c09d 100644
> --- a/drivers/gpu/drm/exynos/Kconfig
> +++ b/drivers/gpu/drm/exynos/Kconfig
> @@ -88,7 +88,6 @@ comment "Sub-drivers"
>   config DRM_EXYNOS_G2D
>   	bool "G2D"
>   	depends on VIDEO_SAMSUNG_S5P_G2D=n || COMPILE_TEST
> -	select FRAME_VECTOR
>   	help
>   	  Choose this option if you want to use Exynos G2D for DRM.
>   
> diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> index 967a5cdc120e..ecede41af9b9 100644
> --- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> +++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
> @@ -205,7 +205,8 @@ struct g2d_cmdlist_userptr {
>   	dma_addr_t		dma_addr;
>   	unsigned long		userptr;
>   	unsigned long		size;
> -	struct frame_vector	*vec;
> +	struct page		**pages;
> +	unsigned int		npages;
>   	struct sg_table		*sgt;
>   	atomic_t		refcount;
>   	bool			in_pool;
> @@ -378,7 +379,6 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
>   					bool force)
>   {
>   	struct g2d_cmdlist_userptr *g2d_userptr = obj;
> -	struct page **pages;
>   
>   	if (!obj)
>   		return;
> @@ -398,15 +398,9 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
>   	dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
>   			  DMA_BIDIRECTIONAL, 0);
>   
> -	pages = frame_vector_pages(g2d_userptr->vec);
> -	if (!IS_ERR(pages)) {
> -		int i;
> -
> -		for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
> -			set_page_dirty_lock(pages[i]);
> -	}
> -	put_vaddr_frames(g2d_userptr->vec);
> -	frame_vector_destroy(g2d_userptr->vec);
> +	unpin_user_pages_dirty_lock(g2d_userptr->pages, g2d_userptr->npages,
> +				    true);
> +	kvfree(g2d_userptr->pages);
>   
>   	if (!g2d_userptr->out_of_list)
>   		list_del_init(&g2d_userptr->list);
> @@ -474,35 +468,34 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
>   	offset = userptr & ~PAGE_MASK;
>   	end = PAGE_ALIGN(userptr + size);
>   	npages = (end - start) >> PAGE_SHIFT;
> -	g2d_userptr->vec = frame_vector_create(npages);
> -	if (!g2d_userptr->vec) {
> +	g2d_userptr->pages = kvmalloc_array(npages, sizeof(*g2d_userptr->pages),
> +					    GFP_KERNEL);
> +	if (!g2d_userptr->pages) {
>   		ret = -ENOMEM;
>   		goto err_free;
>   	}
>   
> -	ret = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> -		g2d_userptr->vec);
> +	ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> +				  g2d_userptr->pages);
>   	if (ret != npages) {
>   		DRM_DEV_ERROR(g2d->dev,
>   			      "failed to get user pages from userptr.\n");
>   		if (ret < 0)
> -			goto err_destroy_framevec;
> -		ret = -EFAULT;
> -		goto err_put_framevec;
> -	}
> -	if (frame_vector_to_pages(g2d_userptr->vec) < 0) {
> +			goto err_destroy_pages;
> +		npages = ret;
>   		ret = -EFAULT;
> -		goto err_put_framevec;
> +		goto err_unpin_pages;
>   	}
> +	g2d_userptr->npages = npages;
>   
>   	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
>   	if (!sgt) {
>   		ret = -ENOMEM;
> -		goto err_put_framevec;
> +		goto err_unpin_pages;
>   	}
>   
>   	ret = sg_alloc_table_from_pages(sgt,
> -					frame_vector_pages(g2d_userptr->vec),
> +					g2d_userptr->pages,
>   					npages, offset, size, GFP_KERNEL);
>   	if (ret < 0) {
>   		DRM_DEV_ERROR(g2d->dev, "failed to get sgt from pages.\n");
> @@ -538,11 +531,11 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
>   err_free_sgt:
>   	kfree(sgt);
>   
> -err_put_framevec:
> -	put_vaddr_frames(g2d_userptr->vec);
> +err_unpin_pages:
> +	unpin_user_pages(g2d_userptr->pages, npages);
>   
> -err_destroy_framevec:
> -	frame_vector_destroy(g2d_userptr->vec);
> +err_destroy_pages:
> +	kvfree(g2d_userptr->pages);
>   
>   err_free:
>   	kfree(g2d_userptr);
> 


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
  2020-10-09  7:59 ` [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers Daniel Vetter
  2020-10-10 20:26   ` Oded Gabbay
@ 2020-10-16  7:45   ` John Hubbard
  1 sibling, 0 replies; 69+ messages in thread
From: John Hubbard @ 2020-10-16  7:45 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Andrew Morton,
	Jérôme Glisse, Jan Kara, Dan Williams, Oded Gabbay,
	Omer Shpigelman, Ofir Bitton, Tomer Tayar, Moti Haimovski,
	Greg Kroah-Hartman, Pawel Piskorski

On 10/9/20 12:59 AM, Daniel Vetter wrote:
> All we need are a pages array, pin_user_pages_fast can give us that
> directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Oded Gabbay <oded.gabbay@gmail.com>
> Cc: Omer Shpigelman <oshpigelman@habana.ai>
> Cc: Ofir Bitton <obitton@habana.ai>
> Cc: Tomer Tayar <ttayar@habana.ai>
> Cc: Moti Haimovski <mhaimovski@habana.ai>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Pawel Piskorski <ppiskorski@habana.ai>
> --
> v2: Use unpin_user_pages_dirty_lock (John)
> ---
>   drivers/misc/habanalabs/Kconfig             |  1 -
>   drivers/misc/habanalabs/common/habanalabs.h |  3 +-
>   drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
>   3 files changed, 20 insertions(+), 33 deletions(-)

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA

> 
> diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
> index 8eb5d38c618e..2f04187f7167 100644
> --- a/drivers/misc/habanalabs/Kconfig
> +++ b/drivers/misc/habanalabs/Kconfig
> @@ -6,7 +6,6 @@
>   config HABANA_AI
>   	tristate "HabanaAI accelerators (habanalabs)"
>   	depends on PCI && HAS_IOMEM
> -	select FRAME_VECTOR
>   	select DMA_SHARED_BUFFER
>   	select GENERIC_ALLOCATOR
>   	select HWMON
> diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
> index edbd627b29d2..c1b3ad613b15 100644
> --- a/drivers/misc/habanalabs/common/habanalabs.h
> +++ b/drivers/misc/habanalabs/common/habanalabs.h
> @@ -881,7 +881,8 @@ struct hl_ctx_mgr {
>   struct hl_userptr {
>   	enum vm_type_t		vm_type; /* must be first */
>   	struct list_head	job_node;
> -	struct frame_vector	*vec;
> +	struct page		**pages;
> +	unsigned int		npages;
>   	struct sg_table		*sgt;
>   	enum dma_data_direction dir;
>   	struct list_head	debugfs_list;
> diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> index 5ff4688683fd..327b64479f97 100644
> --- a/drivers/misc/habanalabs/common/memory.c
> +++ b/drivers/misc/habanalabs/common/memory.c
> @@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
>   		return -EFAULT;
>   	}
>   
> -	userptr->vec = frame_vector_create(npages);
> -	if (!userptr->vec) {
> +	userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
> +					GFP_KERNEL);
> +	if (!userptr->pages) {
>   		dev_err(hdev->dev, "Failed to create frame vector\n");
>   		return -ENOMEM;
>   	}
>   
> -	rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> -				userptr->vec);
> +	rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> +				 userptr->pages);
>   
>   	if (rc != npages) {
>   		dev_err(hdev->dev,
>   			"Failed to map host memory, user ptr probably wrong\n");
>   		if (rc < 0)
> -			goto destroy_framevec;
> +			goto destroy_pages;
> +		npages = rc;
>   		rc = -EFAULT;
> -		goto put_framevec;
> -	}
> -
> -	if (frame_vector_to_pages(userptr->vec) < 0) {
> -		dev_err(hdev->dev,
> -			"Failed to translate frame vector to pages\n");
> -		rc = -EFAULT;
> -		goto put_framevec;
> +		goto put_pages;
>   	}
> +	userptr->npages = npages;
>   
>   	rc = sg_alloc_table_from_pages(userptr->sgt,
> -					frame_vector_pages(userptr->vec),
> -					npages, offset, size, GFP_ATOMIC);
> +				       userptr->pages,
> +				       npages, offset, size, GFP_ATOMIC);
>   	if (rc < 0) {
>   		dev_err(hdev->dev, "failed to create SG table from pages\n");
> -		goto put_framevec;
> +		goto put_pages;
>   	}
>   
>   	return 0;
>   
> -put_framevec:
> -	put_vaddr_frames(userptr->vec);
> -destroy_framevec:
> -	frame_vector_destroy(userptr->vec);
> +put_pages:
> +	unpin_user_pages(userptr->pages, npages);
> +destroy_pages:
> +	kvfree(userptr->pages);
>   	return rc;
>   }
>   
> @@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
>    */
>   void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
>   {
> -	struct page **pages;
> -
>   	hl_debugfs_remove_userptr(hdev, userptr);
>   
>   	if (userptr->dma_mapped)
> @@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
>   							userptr->sgt->nents,
>   							userptr->dir);
>   
> -	pages = frame_vector_pages(userptr->vec);
> -	if (!IS_ERR(pages)) {
> -		int i;
> -
> -		for (i = 0; i < frame_vector_count(userptr->vec); i++)
> -			set_page_dirty_lock(pages[i]);
> -	}
> -	put_vaddr_frames(userptr->vec);
> -	frame_vector_destroy(userptr->vec);
> +	unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
> +	kvfree(userptr->pages);
>   
>   	list_del(&userptr->job_node);
>   
> 


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 05/17] mm/frame-vector: Use FOLL_LONGTERM
  2020-10-09  7:59 ` [PATCH v2 05/17] mm/frame-vector: Use FOLL_LONGTERM Daniel Vetter
@ 2020-10-16  7:54   ` John Hubbard
  2020-10-16  8:03     ` Daniel Vetter
  0 siblings, 1 reply; 69+ messages in thread
From: John Hubbard @ 2020-10-16  7:54 UTC (permalink / raw)
  To: Daniel Vetter, DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Tomasz Figa,
	Mauro Carvalho Chehab, Andrew Morton, Jérôme Glisse,
	Jan Kara, Dan Williams

On 10/9/20 12:59 AM, Daniel Vetter wrote:
...
> @@ -48,40 +47,25 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
>   
>   	start = untagged_addr(start);
>   
> -	mmap_read_lock(mm);
> -	locked = 1;
> -	vma = find_vma_intersection(mm, start, start + 1);
> -	if (!vma) {
> -		ret = -EFAULT;
> -		goto out;
> -	}
> -
> -	/*
> -	 * While get_vaddr_frames() could be used for transient (kernel
> -	 * controlled lifetime) pinning of memory pages all current
> -	 * users establish long term (userspace controlled lifetime)
> -	 * page pinning. Treat get_vaddr_frames() like
> -	 * get_user_pages_longterm() and disallow it for filesystem-dax
> -	 * mappings.
> -	 */
> -	if (vma_is_fsdax(vma)) {
> -		ret = -EOPNOTSUPP;
> -		goto out;
> -	}
> -
> -	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
> +	ret = pin_user_pages_fast(start, nr_frames,
> +				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> +				  (struct page **)(vec->ptrs));
> +	if (ret > 0) {

None of the callers that we have today will accept anything less than
ret == nr_frames. And the whole partially pinned region idea turns out
to be just not useful for almost everyone, from what I recall of the gup/pup
call sites. So I wonder if we should just have get_vaddr_frames do the
cleanup here and return -EFAULT, if ret != nr_frames ?

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 05/17] mm/frame-vector: Use FOLL_LONGTERM
  2020-10-16  7:54   ` John Hubbard
@ 2020-10-16  8:03     ` Daniel Vetter
  0 siblings, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-16  8:03 UTC (permalink / raw)
  To: John Hubbard
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Tomasz Figa,
	Mauro Carvalho Chehab, Andrew Morton, Jérôme Glisse,
	Jan Kara, Dan Williams

On Fri, Oct 16, 2020 at 9:54 AM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 10/9/20 12:59 AM, Daniel Vetter wrote:
> ...
> > @@ -48,40 +47,25 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> >
> >       start = untagged_addr(start);
> >
> > -     mmap_read_lock(mm);
> > -     locked = 1;
> > -     vma = find_vma_intersection(mm, start, start + 1);
> > -     if (!vma) {
> > -             ret = -EFAULT;
> > -             goto out;
> > -     }
> > -
> > -     /*
> > -      * While get_vaddr_frames() could be used for transient (kernel
> > -      * controlled lifetime) pinning of memory pages all current
> > -      * users establish long term (userspace controlled lifetime)
> > -      * page pinning. Treat get_vaddr_frames() like
> > -      * get_user_pages_longterm() and disallow it for filesystem-dax
> > -      * mappings.
> > -      */
> > -     if (vma_is_fsdax(vma)) {
> > -             ret = -EOPNOTSUPP;
> > -             goto out;
> > -     }
> > -
> > -     if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
> > +     ret = pin_user_pages_fast(start, nr_frames,
> > +                               FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> > +                               (struct page **)(vec->ptrs));
> > +     if (ret > 0) {
>
> None of the callers that we have today will accept anything less than
> ret == nr_frames. And the whole partially pinned region idea turns out
> to be just not useful for almost everyone, from what I recall of the gup/pup
> call sites. So I wonder if we should just have get_vaddr_frames do the
> cleanup here and return -EFAULT, if ret != nr_frames ?

Yeah I noticed that the calling convention here is a bit funny. But I
with these frame-vector helpers now being part of drivers/media it's
up to media folks if they want to clean that up, or leave it as is.

If this would be in drm I'd say we'll have the loud warning and
tainting due to CONFIG_STRICT_FOLLOW_PFN=n for 2-3 years. Then
assuming no big complaints showed up, rip it all out and just directly
call pup in each place that wants it (like I've done for habanalabs
and exynos).
-Daniel


--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 08/17] s390/pci: Remove races against pte updates
  2020-10-12 14:19     ` Daniel Vetter
  2020-10-12 14:39       ` Niklas Schnelle
@ 2020-10-21  7:55       ` Niklas Schnelle
  2020-10-22  7:39         ` Daniel Vetter
  1 sibling, 1 reply; 69+ messages in thread
From: Niklas Schnelle @ 2020-10-21  7:55 UTC (permalink / raw)
  To: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Gerald Schaefer,
	Daniel Vetter, Jason Gunthorpe, Dan Williams, Kees Cook,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara

Hi Daniel,

friendly ping. I haven't seen a new version of this patch series,
as I said I think your change for s390/pci is generally useful so
I'm curious, are you planning on sending a new version soon?
If you want you can also just sent this patch with the last few
nitpicks (primarily the mail address) fixed and I'll happily apply.

Best regards,
Niklas Schnelle

On 10/12/20 4:19 PM, Daniel Vetter wrote:
> On Mon, Oct 12, 2020 at 04:03:28PM +0200, Niklas Schnelle wrote:
... snip ....
>>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>> Cc: Kees Cook <keescook@chromium.org>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: John Hubbard <jhubbard@nvidia.com>
>>> Cc: Jérôme Glisse <jglisse@redhat.com>
>>> Cc: Jan Kara <jack@suse.cz>
>>> Cc: Dan Williams <dan.j.williams@intel.com>
>>
>> The above Cc: line for Dan Williams is a duplicate
>>
>>> Cc: linux-mm@kvack.org
>>> Cc: linux-arm-kernel@lists.infradead.org
>>> Cc: linux-samsung-soc@vger.kernel.org
>>> Cc: linux-media@vger.kernel.org
>>> Cc: Niklas Schnelle <schnelle@linux.ibm.com>
>>> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
>>> Cc: linux-s390@vger.kernel.org
>>> --
>>> v2: Move VM_IO | VM_PFNMAP checks around so they keep returning EINVAL
>>> like before (Gerard)
>>
>> I think the above should go before the CC/Signed-off/Reviewev block.
> 
> This is a per-subsystem bikeshed :-) drivers/gpu definitely wants it
> above, but most core subsystems want it below. I'll move it.
> 
>>> ---
>>>  arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
>>>  1 file changed, 57 insertions(+), 41 deletions(-)
>>>
>>> diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
>>> index 401cf670a243..1a6adbc68ee8 100644
>>> --- a/arch/s390/pci/pci_mmio.c
>>> +++ b/arch/s390/pci/pci_mmio.c
>>> @@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
>>>  	return rc;
>>>  }
>>>  
>>> -static long get_pfn(unsigned long user_addr, unsigned long access,
>>> -		    unsigned long *pfn)
>>> -{
>>> -	struct vm_area_struct *vma;
>>> -	long ret;
>>> -
>>> -	mmap_read_lock(current->mm);
>>> -	ret = -EINVAL;
>>> -	vma = find_vma(current->mm, user_addr);
>>> -	if (!vma)
>>> -		goto out;
>>> -	ret = -EACCES;
>>> -	if (!(vma->vm_flags & access))
>>> -		goto out;
>>> -	ret = follow_pfn(vma, user_addr, pfn);
>>> -out:
>>> -	mmap_read_unlock(current->mm);
>>> -	return ret;
>>> -}
>>> -
>>>  SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
>>>  		const void __user *, user_buffer, size_t, length)
>>>  {
>>>  	u8 local_buf[64];
>>>  	void __iomem *io_addr;
>>>  	void *buf;
>>> -	unsigned long pfn;
>>> +	struct vm_area_struct *vma;
>>> +	pte_t *ptep;
>>> +	spinlock_t *ptl;
>>
>> With checkpatch.pl --strict the above yields a complained
>> "CHECK: spinlock_t definition without comment" but I think
>> that's really okay since your commit description is very clear.
>> Same oin line 277.
> 
> I think this is a falls positive, checkpatch doesn't realize that
> SYSCALL_DEFINE3 is a function, not a structure. And in a structure I'd
> have added the kerneldoc or comment.
> 
> I'll fix up all the nits you've found for the next round. Thanks for
> taking a look.
> -Daniel
> 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 08/17] s390/pci: Remove races against pte updates
  2020-10-21  7:55       ` Niklas Schnelle
@ 2020-10-22  7:39         ` Daniel Vetter
  0 siblings, 0 replies; 69+ messages in thread
From: Daniel Vetter @ 2020-10-22  7:39 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Gerald Schaefer,
	Daniel Vetter, Jason Gunthorpe, Dan Williams, Kees Cook,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara

On Wed, Oct 21, 2020 at 09:55:57AM +0200, Niklas Schnelle wrote:
> Hi Daniel,
> 
> friendly ping. I haven't seen a new version of this patch series,
> as I said I think your change for s390/pci is generally useful so
> I'm curious, are you planning on sending a new version soon?
> If you want you can also just sent this patch with the last few
> nitpicks (primarily the mail address) fixed and I'll happily apply.

(I think this was stuck somewhere in moderation, only showed up just now)

I was waiting for the testing result for the habana driver from Oded, but
I guess Oded was waiting for v3. Hence the delay.

Cheers, Daniel

> 
> Best regards,
> Niklas Schnelle
> 
> On 10/12/20 4:19 PM, Daniel Vetter wrote:
> > On Mon, Oct 12, 2020 at 04:03:28PM +0200, Niklas Schnelle wrote:
> ... snip ....
> >>> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> >>> Cc: Dan Williams <dan.j.williams@intel.com>
> >>> Cc: Kees Cook <keescook@chromium.org>
> >>> Cc: Andrew Morton <akpm@linux-foundation.org>
> >>> Cc: John Hubbard <jhubbard@nvidia.com>
> >>> Cc: Jérôme Glisse <jglisse@redhat.com>
> >>> Cc: Jan Kara <jack@suse.cz>
> >>> Cc: Dan Williams <dan.j.williams@intel.com>
> >>
> >> The above Cc: line for Dan Williams is a duplicate
> >>
> >>> Cc: linux-mm@kvack.org
> >>> Cc: linux-arm-kernel@lists.infradead.org
> >>> Cc: linux-samsung-soc@vger.kernel.org
> >>> Cc: linux-media@vger.kernel.org
> >>> Cc: Niklas Schnelle <schnelle@linux.ibm.com>
> >>> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> >>> Cc: linux-s390@vger.kernel.org
> >>> --
> >>> v2: Move VM_IO | VM_PFNMAP checks around so they keep returning EINVAL
> >>> like before (Gerard)
> >>
> >> I think the above should go before the CC/Signed-off/Reviewev block.
> > 
> > This is a per-subsystem bikeshed :-) drivers/gpu definitely wants it
> > above, but most core subsystems want it below. I'll move it.
> > 
> >>> ---
> >>>  arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
> >>>  1 file changed, 57 insertions(+), 41 deletions(-)
> >>>
> >>> diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
> >>> index 401cf670a243..1a6adbc68ee8 100644
> >>> --- a/arch/s390/pci/pci_mmio.c
> >>> +++ b/arch/s390/pci/pci_mmio.c
> >>> @@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
> >>>  	return rc;
> >>>  }
> >>>  
> >>> -static long get_pfn(unsigned long user_addr, unsigned long access,
> >>> -		    unsigned long *pfn)
> >>> -{
> >>> -	struct vm_area_struct *vma;
> >>> -	long ret;
> >>> -
> >>> -	mmap_read_lock(current->mm);
> >>> -	ret = -EINVAL;
> >>> -	vma = find_vma(current->mm, user_addr);
> >>> -	if (!vma)
> >>> -		goto out;
> >>> -	ret = -EACCES;
> >>> -	if (!(vma->vm_flags & access))
> >>> -		goto out;
> >>> -	ret = follow_pfn(vma, user_addr, pfn);
> >>> -out:
> >>> -	mmap_read_unlock(current->mm);
> >>> -	return ret;
> >>> -}
> >>> -
> >>>  SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> >>>  		const void __user *, user_buffer, size_t, length)
> >>>  {
> >>>  	u8 local_buf[64];
> >>>  	void __iomem *io_addr;
> >>>  	void *buf;
> >>> -	unsigned long pfn;
> >>> +	struct vm_area_struct *vma;
> >>> +	pte_t *ptep;
> >>> +	spinlock_t *ptl;
> >>
> >> With checkpatch.pl --strict the above yields a complained
> >> "CHECK: spinlock_t definition without comment" but I think
> >> that's really okay since your commit description is very clear.
> >> Same oin line 277.
> > 
> > I think this is a falls positive, checkpatch doesn't realize that
> > SYSCALL_DEFINE3 is a function, not a structure. And in a structure I'd
> > have added the kerneldoc or comment.
> > 
> > I'll fix up all the nits you've found for the next round. Thanks for
> > taking a look.
> > -Daniel
> > 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, back to index

Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-09  7:59 [PATCH v2 00/17] follow_pfn and other iomap races Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 01/17] drm/exynos: Stop using frame_vector helpers Daniel Vetter
2020-10-16  7:42   ` John Hubbard
2020-10-09  7:59 ` [PATCH v2 02/17] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers Daniel Vetter
2020-10-10 20:26   ` Oded Gabbay
2020-10-10 21:32     ` Daniel Vetter
2020-10-10 21:41       ` Daniel Vetter
2020-10-10 21:47         ` Oded Gabbay
2020-10-16  7:45   ` John Hubbard
2020-10-09  7:59 ` [PATCH v2 04/17] misc/habana: Use FOLL_LONGTERM for userptr Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 05/17] mm/frame-vector: Use FOLL_LONGTERM Daniel Vetter
2020-10-16  7:54   ` John Hubbard
2020-10-16  8:03     ` Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem Daniel Vetter
2020-10-09 10:14   ` Mauro Carvalho Chehab
2020-10-09 16:57     ` Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 07/17] mm: Close race in generic_access_phys Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 08/17] s390/pci: Remove races against pte updates Daniel Vetter
2020-10-12 14:03   ` Niklas Schnelle
2020-10-12 14:19     ` Daniel Vetter
2020-10-12 14:39       ` Niklas Schnelle
2020-10-21  7:55       ` Niklas Schnelle
2020-10-22  7:39         ` Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 09/17] mm: Add unsafe_follow_pfn Daniel Vetter
2020-10-09 10:34   ` Mauro Carvalho Chehab
2020-10-09 12:21     ` Jason Gunthorpe
2020-10-09 12:37       ` Mauro Carvalho Chehab
2020-10-09 12:39         ` Mauro Carvalho Chehab
2020-10-09 12:48         ` Jason Gunthorpe
2020-10-09 17:52           ` Daniel Vetter
2020-10-09 18:01             ` Jason Gunthorpe
2020-10-09 19:31               ` Daniel Vetter
2020-10-10  9:21             ` Mauro Carvalho Chehab
2020-10-10 10:53               ` Daniel Vetter
2020-10-10 11:39                 ` Mauro Carvalho Chehab
2020-10-10 11:56                   ` Daniel Vetter
2020-10-10 17:22             ` Tomasz Figa
2020-10-10 21:35               ` Laurent Pinchart
2020-10-10 21:50                 ` Daniel Vetter
2020-10-11  6:27                   ` Mauro Carvalho Chehab
2020-10-11  6:36                     ` Mauro Carvalho Chehab
2020-10-10 21:11             ` Laurent Pinchart
2020-10-12 10:46           ` Marek Szyprowski
2020-10-12 13:49             ` Daniel Vetter
2020-10-10 17:30         ` Tomasz Figa
2020-10-09  7:59 ` [PATCH v2 10/17] media/videbuf1|2: Mark follow_pfn usage as unsafe Daniel Vetter
2020-10-10  9:24   ` Mauro Carvalho Chehab
2020-10-09  7:59 ` [PATCH v2 11/17] vfio/type1: Mark follow_pfn " Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 12/17] PCI: Obey iomem restrictions for procfs mmap Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 13/17] /dev/mem: Only set filp->f_mapping Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 14/17] resource: Move devmem revoke code to resource framework Daniel Vetter
2020-10-09 10:59   ` Greg Kroah-Hartman
2020-10-09 12:31   ` Jason Gunthorpe
2020-10-09 14:24     ` Daniel Vetter
2020-10-09 14:32       ` Jason Gunthorpe
2020-10-09 18:28         ` Dan Williams
2020-10-15  0:09           ` Jason Gunthorpe
2020-10-15  7:52             ` Daniel Vetter
2020-10-15  7:55               ` Daniel Vetter
2020-10-15 15:29             ` Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 15/17] sysfs: Support zapping of binary attr mmaps Daniel Vetter
2020-10-09 10:58   ` Greg Kroah-Hartman
2020-10-09  7:59 ` [PATCH v2 16/17] PCI: Revoke mappings like devmem Daniel Vetter
2020-10-09  7:59 ` [PATCH v2 17/17] drm/i915: Properly request PCI BARs Daniel Vetter
2020-10-09  9:47   ` Ville Syrjälä
2020-10-09 10:01     ` Daniel Vetter
2020-10-09 10:41       ` Ville Syrjälä
2020-10-09 14:18         ` Daniel Vetter

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git