All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/17] follow_pfn and other iomap races
@ 2020-10-09  7:59 ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter

Hi all,

Round two of my patch series to clamp down a bunch of races and gaps
around follow_pfn and other access to iomem mmaps. Previous version:

v1: https://lore.kernel.org/dri-devel/20201007164426.1812530-1-daniel.vetter@ffwll.ch/

And the discussion that sparked this journey:

https://lore.kernel.org/dri-devel/20201007164426.1812530-1-daniel.vetter@ffwll.ch/

Changes in v2:
- tons of small polish&fixes all over, thanks to all the reviewers who
  spotted issues
- I managed to test at least the generic_access_phys and pci mmap revoke
  stuff with a few gdb sessions using our i915 debug tools (hence now also
  the drm/i915 patch to properly request all the pci bar regions)
- reworked approach for the pci mmap revoke: Infrastructure moved into
  kernel/resource.c, address_space mapping is now set up at open time for
  everyone (which required some sysfs changes). Does indeed look a lot
  cleaner and a lot less invasive than I feared at first.

The big thing I can't test are all the frame_vector changes in habanalbas,
exynos and media. Gerald has given the s390 patch a spin already.

Review, testing, feedback all very much welcome.

Cheers, Daniel

Daniel Vetter (17):
  drm/exynos: Stop using frame_vector helpers
  drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
  misc/habana: Stop using frame_vector helpers
  misc/habana: Use FOLL_LONGTERM for userptr
  mm/frame-vector: Use FOLL_LONGTERM
  media: videobuf2: Move frame_vector into media subsystem
  mm: Close race in generic_access_phys
  s390/pci: Remove races against pte updates
  mm: Add unsafe_follow_pfn
  media/videbuf1|2: Mark follow_pfn usage as unsafe
  vfio/type1: Mark follow_pfn as unsafe
  PCI: Obey iomem restrictions for procfs mmap
  /dev/mem: Only set filp->f_mapping
  resource: Move devmem revoke code to resource framework
  sysfs: Support zapping of binary attr mmaps
  PCI: Revoke mappings like devmem
  drm/i915: Properly request PCI BARs

 arch/s390/pci/pci_mmio.c                      | 98 +++++++++++--------
 drivers/char/mem.c                            | 86 +---------------
 drivers/gpu/drm/exynos/Kconfig                |  1 -
 drivers/gpu/drm/exynos/exynos_drm_g2d.c       | 48 ++++-----
 drivers/gpu/drm/i915/intel_uncore.c           | 25 ++++-
 drivers/media/common/videobuf2/Kconfig        |  1 -
 drivers/media/common/videobuf2/Makefile       |  1 +
 .../media/common/videobuf2}/frame_vector.c    | 54 ++++------
 drivers/media/platform/omap/Kconfig           |  1 -
 drivers/media/v4l2-core/videobuf-dma-contig.c |  2 +-
 drivers/misc/habanalabs/Kconfig               |  1 -
 drivers/misc/habanalabs/common/habanalabs.h   |  3 +-
 drivers/misc/habanalabs/common/memory.c       | 50 ++++------
 drivers/pci/pci-sysfs.c                       |  4 +
 drivers/pci/proc.c                            |  6 ++
 drivers/vfio/vfio_iommu_type1.c               |  4 +-
 fs/sysfs/file.c                               | 11 +++
 include/linux/ioport.h                        |  6 +-
 include/linux/mm.h                            | 47 +--------
 include/linux/sysfs.h                         |  2 +
 include/media/videobuf2-core.h                | 42 ++++++++
 kernel/resource.c                             | 95 +++++++++++++++++-
 mm/Kconfig                                    |  3 -
 mm/Makefile                                   |  1 -
 mm/memory.c                                   | 76 +++++++++++++-
 mm/nommu.c                                    | 17 ++++
 security/Kconfig                              | 13 +++
 27 files changed, 412 insertions(+), 286 deletions(-)
 rename {mm => drivers/media/common/videobuf2}/frame_vector.c (85%)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 00/17] follow_pfn and other iomap races
@ 2020-10-09  7:59 ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, kvm, Daniel Vetter, linux-mm,
	linux-arm-kernel, linux-media

Hi all,

Round two of my patch series to clamp down a bunch of races and gaps
around follow_pfn and other access to iomem mmaps. Previous version:

v1: https://lore.kernel.org/dri-devel/20201007164426.1812530-1-daniel.vetter@ffwll.ch/

And the discussion that sparked this journey:

https://lore.kernel.org/dri-devel/20201007164426.1812530-1-daniel.vetter@ffwll.ch/

Changes in v2:
- tons of small polish&fixes all over, thanks to all the reviewers who
  spotted issues
- I managed to test at least the generic_access_phys and pci mmap revoke
  stuff with a few gdb sessions using our i915 debug tools (hence now also
  the drm/i915 patch to properly request all the pci bar regions)
- reworked approach for the pci mmap revoke: Infrastructure moved into
  kernel/resource.c, address_space mapping is now set up at open time for
  everyone (which required some sysfs changes). Does indeed look a lot
  cleaner and a lot less invasive than I feared at first.

The big thing I can't test are all the frame_vector changes in habanalbas,
exynos and media. Gerald has given the s390 patch a spin already.

Review, testing, feedback all very much welcome.

Cheers, Daniel

Daniel Vetter (17):
  drm/exynos: Stop using frame_vector helpers
  drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
  misc/habana: Stop using frame_vector helpers
  misc/habana: Use FOLL_LONGTERM for userptr
  mm/frame-vector: Use FOLL_LONGTERM
  media: videobuf2: Move frame_vector into media subsystem
  mm: Close race in generic_access_phys
  s390/pci: Remove races against pte updates
  mm: Add unsafe_follow_pfn
  media/videbuf1|2: Mark follow_pfn usage as unsafe
  vfio/type1: Mark follow_pfn as unsafe
  PCI: Obey iomem restrictions for procfs mmap
  /dev/mem: Only set filp->f_mapping
  resource: Move devmem revoke code to resource framework
  sysfs: Support zapping of binary attr mmaps
  PCI: Revoke mappings like devmem
  drm/i915: Properly request PCI BARs

 arch/s390/pci/pci_mmio.c                      | 98 +++++++++++--------
 drivers/char/mem.c                            | 86 +---------------
 drivers/gpu/drm/exynos/Kconfig                |  1 -
 drivers/gpu/drm/exynos/exynos_drm_g2d.c       | 48 ++++-----
 drivers/gpu/drm/i915/intel_uncore.c           | 25 ++++-
 drivers/media/common/videobuf2/Kconfig        |  1 -
 drivers/media/common/videobuf2/Makefile       |  1 +
 .../media/common/videobuf2}/frame_vector.c    | 54 ++++------
 drivers/media/platform/omap/Kconfig           |  1 -
 drivers/media/v4l2-core/videobuf-dma-contig.c |  2 +-
 drivers/misc/habanalabs/Kconfig               |  1 -
 drivers/misc/habanalabs/common/habanalabs.h   |  3 +-
 drivers/misc/habanalabs/common/memory.c       | 50 ++++------
 drivers/pci/pci-sysfs.c                       |  4 +
 drivers/pci/proc.c                            |  6 ++
 drivers/vfio/vfio_iommu_type1.c               |  4 +-
 fs/sysfs/file.c                               | 11 +++
 include/linux/ioport.h                        |  6 +-
 include/linux/mm.h                            | 47 +--------
 include/linux/sysfs.h                         |  2 +
 include/media/videobuf2-core.h                | 42 ++++++++
 kernel/resource.c                             | 95 +++++++++++++++++-
 mm/Kconfig                                    |  3 -
 mm/Makefile                                   |  1 -
 mm/memory.c                                   | 76 +++++++++++++-
 mm/nommu.c                                    | 17 ++++
 security/Kconfig                              | 13 +++
 27 files changed, 412 insertions(+), 286 deletions(-)
 rename {mm => drivers/media/common/videobuf2}/frame_vector.c (85%)

-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 00/17] follow_pfn and other iomap races
@ 2020-10-09  7:59 ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, kvm, Daniel Vetter, linux-mm,
	linux-arm-kernel, linux-media

Hi all,

Round two of my patch series to clamp down a bunch of races and gaps
around follow_pfn and other access to iomem mmaps. Previous version:

v1: https://lore.kernel.org/dri-devel/20201007164426.1812530-1-daniel.vetter@ffwll.ch/

And the discussion that sparked this journey:

https://lore.kernel.org/dri-devel/20201007164426.1812530-1-daniel.vetter@ffwll.ch/

Changes in v2:
- tons of small polish&fixes all over, thanks to all the reviewers who
  spotted issues
- I managed to test at least the generic_access_phys and pci mmap revoke
  stuff with a few gdb sessions using our i915 debug tools (hence now also
  the drm/i915 patch to properly request all the pci bar regions)
- reworked approach for the pci mmap revoke: Infrastructure moved into
  kernel/resource.c, address_space mapping is now set up at open time for
  everyone (which required some sysfs changes). Does indeed look a lot
  cleaner and a lot less invasive than I feared at first.

The big thing I can't test are all the frame_vector changes in habanalbas,
exynos and media. Gerald has given the s390 patch a spin already.

Review, testing, feedback all very much welcome.

Cheers, Daniel

Daniel Vetter (17):
  drm/exynos: Stop using frame_vector helpers
  drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
  misc/habana: Stop using frame_vector helpers
  misc/habana: Use FOLL_LONGTERM for userptr
  mm/frame-vector: Use FOLL_LONGTERM
  media: videobuf2: Move frame_vector into media subsystem
  mm: Close race in generic_access_phys
  s390/pci: Remove races against pte updates
  mm: Add unsafe_follow_pfn
  media/videbuf1|2: Mark follow_pfn usage as unsafe
  vfio/type1: Mark follow_pfn as unsafe
  PCI: Obey iomem restrictions for procfs mmap
  /dev/mem: Only set filp->f_mapping
  resource: Move devmem revoke code to resource framework
  sysfs: Support zapping of binary attr mmaps
  PCI: Revoke mappings like devmem
  drm/i915: Properly request PCI BARs

 arch/s390/pci/pci_mmio.c                      | 98 +++++++++++--------
 drivers/char/mem.c                            | 86 +---------------
 drivers/gpu/drm/exynos/Kconfig                |  1 -
 drivers/gpu/drm/exynos/exynos_drm_g2d.c       | 48 ++++-----
 drivers/gpu/drm/i915/intel_uncore.c           | 25 ++++-
 drivers/media/common/videobuf2/Kconfig        |  1 -
 drivers/media/common/videobuf2/Makefile       |  1 +
 .../media/common/videobuf2}/frame_vector.c    | 54 ++++------
 drivers/media/platform/omap/Kconfig           |  1 -
 drivers/media/v4l2-core/videobuf-dma-contig.c |  2 +-
 drivers/misc/habanalabs/Kconfig               |  1 -
 drivers/misc/habanalabs/common/habanalabs.h   |  3 +-
 drivers/misc/habanalabs/common/memory.c       | 50 ++++------
 drivers/pci/pci-sysfs.c                       |  4 +
 drivers/pci/proc.c                            |  6 ++
 drivers/vfio/vfio_iommu_type1.c               |  4 +-
 fs/sysfs/file.c                               | 11 +++
 include/linux/ioport.h                        |  6 +-
 include/linux/mm.h                            | 47 +--------
 include/linux/sysfs.h                         |  2 +
 include/media/videobuf2-core.h                | 42 ++++++++
 kernel/resource.c                             | 95 +++++++++++++++++-
 mm/Kconfig                                    |  3 -
 mm/Makefile                                   |  1 -
 mm/memory.c                                   | 76 +++++++++++++-
 mm/nommu.c                                    | 17 ++++
 security/Kconfig                              | 13 +++
 27 files changed, 412 insertions(+), 286 deletions(-)
 rename {mm => drivers/media/common/videobuf2}/frame_vector.c (85%)

-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 01/17] drm/exynos: Stop using frame_vector helpers
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Inki Dae, Joonyoung Shim, Seung-Woo Kim, Kyungmin Park,
	Kukjin Kim, Krzysztof Kozlowski, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams

All we need are a pages array, pin_user_pages_fast can give us that
directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
--
v2: Use unpin_user_pages_dirty_lock (John)
---
 drivers/gpu/drm/exynos/Kconfig          |  1 -
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 47 +++++++++++--------------
 2 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
index 6417f374b923..43257ef3c09d 100644
--- a/drivers/gpu/drm/exynos/Kconfig
+++ b/drivers/gpu/drm/exynos/Kconfig
@@ -88,7 +88,6 @@ comment "Sub-drivers"
 config DRM_EXYNOS_G2D
 	bool "G2D"
 	depends on VIDEO_SAMSUNG_S5P_G2D=n || COMPILE_TEST
-	select FRAME_VECTOR
 	help
 	  Choose this option if you want to use Exynos G2D for DRM.
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index 967a5cdc120e..ecede41af9b9 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -205,7 +205,8 @@ struct g2d_cmdlist_userptr {
 	dma_addr_t		dma_addr;
 	unsigned long		userptr;
 	unsigned long		size;
-	struct frame_vector	*vec;
+	struct page		**pages;
+	unsigned int		npages;
 	struct sg_table		*sgt;
 	atomic_t		refcount;
 	bool			in_pool;
@@ -378,7 +379,6 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
 					bool force)
 {
 	struct g2d_cmdlist_userptr *g2d_userptr = obj;
-	struct page **pages;
 
 	if (!obj)
 		return;
@@ -398,15 +398,9 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
 	dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
 			  DMA_BIDIRECTIONAL, 0);
 
-	pages = frame_vector_pages(g2d_userptr->vec);
-	if (!IS_ERR(pages)) {
-		int i;
-
-		for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
-			set_page_dirty_lock(pages[i]);
-	}
-	put_vaddr_frames(g2d_userptr->vec);
-	frame_vector_destroy(g2d_userptr->vec);
+	unpin_user_pages_dirty_lock(g2d_userptr->pages, g2d_userptr->npages,
+				    true);
+	kvfree(g2d_userptr->pages);
 
 	if (!g2d_userptr->out_of_list)
 		list_del_init(&g2d_userptr->list);
@@ -474,35 +468,34 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 	offset = userptr & ~PAGE_MASK;
 	end = PAGE_ALIGN(userptr + size);
 	npages = (end - start) >> PAGE_SHIFT;
-	g2d_userptr->vec = frame_vector_create(npages);
-	if (!g2d_userptr->vec) {
+	g2d_userptr->pages = kvmalloc_array(npages, sizeof(*g2d_userptr->pages),
+					    GFP_KERNEL);
+	if (!g2d_userptr->pages) {
 		ret = -ENOMEM;
 		goto err_free;
 	}
 
-	ret = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
-		g2d_userptr->vec);
+	ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+				  g2d_userptr->pages);
 	if (ret != npages) {
 		DRM_DEV_ERROR(g2d->dev,
 			      "failed to get user pages from userptr.\n");
 		if (ret < 0)
-			goto err_destroy_framevec;
-		ret = -EFAULT;
-		goto err_put_framevec;
-	}
-	if (frame_vector_to_pages(g2d_userptr->vec) < 0) {
+			goto err_destroy_pages;
+		npages = ret;
 		ret = -EFAULT;
-		goto err_put_framevec;
+		goto err_unpin_pages;
 	}
+	g2d_userptr->npages = npages;
 
 	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
 	if (!sgt) {
 		ret = -ENOMEM;
-		goto err_put_framevec;
+		goto err_unpin_pages;
 	}
 
 	ret = sg_alloc_table_from_pages(sgt,
-					frame_vector_pages(g2d_userptr->vec),
+					g2d_userptr->pages,
 					npages, offset, size, GFP_KERNEL);
 	if (ret < 0) {
 		DRM_DEV_ERROR(g2d->dev, "failed to get sgt from pages.\n");
@@ -538,11 +531,11 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 err_free_sgt:
 	kfree(sgt);
 
-err_put_framevec:
-	put_vaddr_frames(g2d_userptr->vec);
+err_unpin_pages:
+	unpin_user_pages(g2d_userptr->pages, npages);
 
-err_destroy_framevec:
-	frame_vector_destroy(g2d_userptr->vec);
+err_destroy_pages:
+	kvfree(g2d_userptr->pages);
 
 err_free:
 	kfree(g2d_userptr);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 01/17] drm/exynos: Stop using frame_vector helpers
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Joonyoung Shim, kvm,
	Jason Gunthorpe, Daniel Vetter, Seung-Woo Kim,
	Jérôme Glisse, Krzysztof Kozlowski, Inki Dae, linux-mm,
	Kyungmin Park, Kukjin Kim, John Hubbard, Daniel Vetter,
	Andrew Morton, Dan Williams, linux-arm-kernel, linux-media

All we need are a pages array, pin_user_pages_fast can give us that
directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
--
v2: Use unpin_user_pages_dirty_lock (John)
---
 drivers/gpu/drm/exynos/Kconfig          |  1 -
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 47 +++++++++++--------------
 2 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
index 6417f374b923..43257ef3c09d 100644
--- a/drivers/gpu/drm/exynos/Kconfig
+++ b/drivers/gpu/drm/exynos/Kconfig
@@ -88,7 +88,6 @@ comment "Sub-drivers"
 config DRM_EXYNOS_G2D
 	bool "G2D"
 	depends on VIDEO_SAMSUNG_S5P_G2D=n || COMPILE_TEST
-	select FRAME_VECTOR
 	help
 	  Choose this option if you want to use Exynos G2D for DRM.
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index 967a5cdc120e..ecede41af9b9 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -205,7 +205,8 @@ struct g2d_cmdlist_userptr {
 	dma_addr_t		dma_addr;
 	unsigned long		userptr;
 	unsigned long		size;
-	struct frame_vector	*vec;
+	struct page		**pages;
+	unsigned int		npages;
 	struct sg_table		*sgt;
 	atomic_t		refcount;
 	bool			in_pool;
@@ -378,7 +379,6 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
 					bool force)
 {
 	struct g2d_cmdlist_userptr *g2d_userptr = obj;
-	struct page **pages;
 
 	if (!obj)
 		return;
@@ -398,15 +398,9 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
 	dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
 			  DMA_BIDIRECTIONAL, 0);
 
-	pages = frame_vector_pages(g2d_userptr->vec);
-	if (!IS_ERR(pages)) {
-		int i;
-
-		for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
-			set_page_dirty_lock(pages[i]);
-	}
-	put_vaddr_frames(g2d_userptr->vec);
-	frame_vector_destroy(g2d_userptr->vec);
+	unpin_user_pages_dirty_lock(g2d_userptr->pages, g2d_userptr->npages,
+				    true);
+	kvfree(g2d_userptr->pages);
 
 	if (!g2d_userptr->out_of_list)
 		list_del_init(&g2d_userptr->list);
@@ -474,35 +468,34 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 	offset = userptr & ~PAGE_MASK;
 	end = PAGE_ALIGN(userptr + size);
 	npages = (end - start) >> PAGE_SHIFT;
-	g2d_userptr->vec = frame_vector_create(npages);
-	if (!g2d_userptr->vec) {
+	g2d_userptr->pages = kvmalloc_array(npages, sizeof(*g2d_userptr->pages),
+					    GFP_KERNEL);
+	if (!g2d_userptr->pages) {
 		ret = -ENOMEM;
 		goto err_free;
 	}
 
-	ret = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
-		g2d_userptr->vec);
+	ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+				  g2d_userptr->pages);
 	if (ret != npages) {
 		DRM_DEV_ERROR(g2d->dev,
 			      "failed to get user pages from userptr.\n");
 		if (ret < 0)
-			goto err_destroy_framevec;
-		ret = -EFAULT;
-		goto err_put_framevec;
-	}
-	if (frame_vector_to_pages(g2d_userptr->vec) < 0) {
+			goto err_destroy_pages;
+		npages = ret;
 		ret = -EFAULT;
-		goto err_put_framevec;
+		goto err_unpin_pages;
 	}
+	g2d_userptr->npages = npages;
 
 	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
 	if (!sgt) {
 		ret = -ENOMEM;
-		goto err_put_framevec;
+		goto err_unpin_pages;
 	}
 
 	ret = sg_alloc_table_from_pages(sgt,
-					frame_vector_pages(g2d_userptr->vec),
+					g2d_userptr->pages,
 					npages, offset, size, GFP_KERNEL);
 	if (ret < 0) {
 		DRM_DEV_ERROR(g2d->dev, "failed to get sgt from pages.\n");
@@ -538,11 +531,11 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 err_free_sgt:
 	kfree(sgt);
 
-err_put_framevec:
-	put_vaddr_frames(g2d_userptr->vec);
+err_unpin_pages:
+	unpin_user_pages(g2d_userptr->pages, npages);
 
-err_destroy_framevec:
-	frame_vector_destroy(g2d_userptr->vec);
+err_destroy_pages:
+	kvfree(g2d_userptr->pages);
 
 err_free:
 	kfree(g2d_userptr);
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 01/17] drm/exynos: Stop using frame_vector helpers
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Joonyoung Shim, kvm,
	Jason Gunthorpe, Daniel Vetter, Seung-Woo Kim,
	Jérôme Glisse, Krzysztof Kozlowski, linux-mm,
	Kyungmin Park, Kukjin Kim, John Hubbard, Daniel Vetter,
	Andrew Morton, Dan Williams, linux-arm-kernel, linux-media

All we need are a pages array, pin_user_pages_fast can give us that
directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
--
v2: Use unpin_user_pages_dirty_lock (John)
---
 drivers/gpu/drm/exynos/Kconfig          |  1 -
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 47 +++++++++++--------------
 2 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
index 6417f374b923..43257ef3c09d 100644
--- a/drivers/gpu/drm/exynos/Kconfig
+++ b/drivers/gpu/drm/exynos/Kconfig
@@ -88,7 +88,6 @@ comment "Sub-drivers"
 config DRM_EXYNOS_G2D
 	bool "G2D"
 	depends on VIDEO_SAMSUNG_S5P_G2D=n || COMPILE_TEST
-	select FRAME_VECTOR
 	help
 	  Choose this option if you want to use Exynos G2D for DRM.
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index 967a5cdc120e..ecede41af9b9 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -205,7 +205,8 @@ struct g2d_cmdlist_userptr {
 	dma_addr_t		dma_addr;
 	unsigned long		userptr;
 	unsigned long		size;
-	struct frame_vector	*vec;
+	struct page		**pages;
+	unsigned int		npages;
 	struct sg_table		*sgt;
 	atomic_t		refcount;
 	bool			in_pool;
@@ -378,7 +379,6 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
 					bool force)
 {
 	struct g2d_cmdlist_userptr *g2d_userptr = obj;
-	struct page **pages;
 
 	if (!obj)
 		return;
@@ -398,15 +398,9 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
 	dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
 			  DMA_BIDIRECTIONAL, 0);
 
-	pages = frame_vector_pages(g2d_userptr->vec);
-	if (!IS_ERR(pages)) {
-		int i;
-
-		for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
-			set_page_dirty_lock(pages[i]);
-	}
-	put_vaddr_frames(g2d_userptr->vec);
-	frame_vector_destroy(g2d_userptr->vec);
+	unpin_user_pages_dirty_lock(g2d_userptr->pages, g2d_userptr->npages,
+				    true);
+	kvfree(g2d_userptr->pages);
 
 	if (!g2d_userptr->out_of_list)
 		list_del_init(&g2d_userptr->list);
@@ -474,35 +468,34 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 	offset = userptr & ~PAGE_MASK;
 	end = PAGE_ALIGN(userptr + size);
 	npages = (end - start) >> PAGE_SHIFT;
-	g2d_userptr->vec = frame_vector_create(npages);
-	if (!g2d_userptr->vec) {
+	g2d_userptr->pages = kvmalloc_array(npages, sizeof(*g2d_userptr->pages),
+					    GFP_KERNEL);
+	if (!g2d_userptr->pages) {
 		ret = -ENOMEM;
 		goto err_free;
 	}
 
-	ret = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
-		g2d_userptr->vec);
+	ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+				  g2d_userptr->pages);
 	if (ret != npages) {
 		DRM_DEV_ERROR(g2d->dev,
 			      "failed to get user pages from userptr.\n");
 		if (ret < 0)
-			goto err_destroy_framevec;
-		ret = -EFAULT;
-		goto err_put_framevec;
-	}
-	if (frame_vector_to_pages(g2d_userptr->vec) < 0) {
+			goto err_destroy_pages;
+		npages = ret;
 		ret = -EFAULT;
-		goto err_put_framevec;
+		goto err_unpin_pages;
 	}
+	g2d_userptr->npages = npages;
 
 	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
 	if (!sgt) {
 		ret = -ENOMEM;
-		goto err_put_framevec;
+		goto err_unpin_pages;
 	}
 
 	ret = sg_alloc_table_from_pages(sgt,
-					frame_vector_pages(g2d_userptr->vec),
+					g2d_userptr->pages,
 					npages, offset, size, GFP_KERNEL);
 	if (ret < 0) {
 		DRM_DEV_ERROR(g2d->dev, "failed to get sgt from pages.\n");
@@ -538,11 +531,11 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 err_free_sgt:
 	kfree(sgt);
 
-err_put_framevec:
-	put_vaddr_frames(g2d_userptr->vec);
+err_unpin_pages:
+	unpin_user_pages(g2d_userptr->pages, npages);
 
-err_destroy_framevec:
-	frame_vector_destroy(g2d_userptr->vec);
+err_destroy_pages:
+	kvfree(g2d_userptr->pages);
 
 err_free:
 	kfree(g2d_userptr);
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 02/17] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Inki Dae, Joonyoung Shim, Seung-Woo Kim, Kyungmin Park,
	Kukjin Kim, Krzysztof Kozlowski, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams

The exynos g2d interface is very unusual, but it looks like the
userptr objects are persistent. Hence they need FOLL_LONGTERM.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
---
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index ecede41af9b9..1e0c5a7f206e 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -475,7 +475,8 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 		goto err_free;
 	}
 
-	ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+	ret = pin_user_pages_fast(start, npages,
+				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
 				  g2d_userptr->pages);
 	if (ret != npages) {
 		DRM_DEV_ERROR(g2d->dev,
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 02/17] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Joonyoung Shim, kvm,
	Jason Gunthorpe, Daniel Vetter, Seung-Woo Kim,
	Jérôme Glisse, Krzysztof Kozlowski, Inki Dae, linux-mm,
	Kyungmin Park, Kukjin Kim, John Hubbard, Daniel Vetter,
	Andrew Morton, Dan Williams, linux-arm-kernel, linux-media

The exynos g2d interface is very unusual, but it looks like the
userptr objects are persistent. Hence they need FOLL_LONGTERM.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
---
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index ecede41af9b9..1e0c5a7f206e 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -475,7 +475,8 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 		goto err_free;
 	}
 
-	ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+	ret = pin_user_pages_fast(start, npages,
+				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
 				  g2d_userptr->pages);
 	if (ret != npages) {
 		DRM_DEV_ERROR(g2d->dev,
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 02/17] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Joonyoung Shim, kvm,
	Jason Gunthorpe, Daniel Vetter, Seung-Woo Kim,
	Jérôme Glisse, Krzysztof Kozlowski, linux-mm,
	Kyungmin Park, Kukjin Kim, John Hubbard, Daniel Vetter,
	Andrew Morton, Dan Williams, linux-arm-kernel, linux-media

The exynos g2d interface is very unusual, but it looks like the
userptr objects are persistent. Hence they need FOLL_LONGTERM.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
---
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index ecede41af9b9..1e0c5a7f206e 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -475,7 +475,8 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
 		goto err_free;
 	}
 
-	ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+	ret = pin_user_pages_fast(start, npages,
+				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
 				  g2d_userptr->pages);
 	if (ret != npages) {
 		DRM_DEV_ERROR(g2d->dev,
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Dan Williams, Oded Gabbay, Omer Shpigelman, Ofir Bitton,
	Tomer Tayar, Moti Haimovski, Greg Kroah-Hartman, Pawel Piskorski

All we need are a pages array, pin_user_pages_fast can give us that
directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Omer Shpigelman <oshpigelman@habana.ai>
Cc: Ofir Bitton <obitton@habana.ai>
Cc: Tomer Tayar <ttayar@habana.ai>
Cc: Moti Haimovski <mhaimovski@habana.ai>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Pawel Piskorski <ppiskorski@habana.ai>
--
v2: Use unpin_user_pages_dirty_lock (John)
---
 drivers/misc/habanalabs/Kconfig             |  1 -
 drivers/misc/habanalabs/common/habanalabs.h |  3 +-
 drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
 3 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
index 8eb5d38c618e..2f04187f7167 100644
--- a/drivers/misc/habanalabs/Kconfig
+++ b/drivers/misc/habanalabs/Kconfig
@@ -6,7 +6,6 @@
 config HABANA_AI
 	tristate "HabanaAI accelerators (habanalabs)"
 	depends on PCI && HAS_IOMEM
-	select FRAME_VECTOR
 	select DMA_SHARED_BUFFER
 	select GENERIC_ALLOCATOR
 	select HWMON
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index edbd627b29d2..c1b3ad613b15 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -881,7 +881,8 @@ struct hl_ctx_mgr {
 struct hl_userptr {
 	enum vm_type_t		vm_type; /* must be first */
 	struct list_head	job_node;
-	struct frame_vector	*vec;
+	struct page		**pages;
+	unsigned int		npages;
 	struct sg_table		*sgt;
 	enum dma_data_direction dir;
 	struct list_head	debugfs_list;
diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 5ff4688683fd..327b64479f97 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
 		return -EFAULT;
 	}
 
-	userptr->vec = frame_vector_create(npages);
-	if (!userptr->vec) {
+	userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
+					GFP_KERNEL);
+	if (!userptr->pages) {
 		dev_err(hdev->dev, "Failed to create frame vector\n");
 		return -ENOMEM;
 	}
 
-	rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
-				userptr->vec);
+	rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+				 userptr->pages);
 
 	if (rc != npages) {
 		dev_err(hdev->dev,
 			"Failed to map host memory, user ptr probably wrong\n");
 		if (rc < 0)
-			goto destroy_framevec;
+			goto destroy_pages;
+		npages = rc;
 		rc = -EFAULT;
-		goto put_framevec;
-	}
-
-	if (frame_vector_to_pages(userptr->vec) < 0) {
-		dev_err(hdev->dev,
-			"Failed to translate frame vector to pages\n");
-		rc = -EFAULT;
-		goto put_framevec;
+		goto put_pages;
 	}
+	userptr->npages = npages;
 
 	rc = sg_alloc_table_from_pages(userptr->sgt,
-					frame_vector_pages(userptr->vec),
-					npages, offset, size, GFP_ATOMIC);
+				       userptr->pages,
+				       npages, offset, size, GFP_ATOMIC);
 	if (rc < 0) {
 		dev_err(hdev->dev, "failed to create SG table from pages\n");
-		goto put_framevec;
+		goto put_pages;
 	}
 
 	return 0;
 
-put_framevec:
-	put_vaddr_frames(userptr->vec);
-destroy_framevec:
-	frame_vector_destroy(userptr->vec);
+put_pages:
+	unpin_user_pages(userptr->pages, npages);
+destroy_pages:
+	kvfree(userptr->pages);
 	return rc;
 }
 
@@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
  */
 void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
 {
-	struct page **pages;
-
 	hl_debugfs_remove_userptr(hdev, userptr);
 
 	if (userptr->dma_mapped)
@@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
 							userptr->sgt->nents,
 							userptr->dir);
 
-	pages = frame_vector_pages(userptr->vec);
-	if (!IS_ERR(pages)) {
-		int i;
-
-		for (i = 0; i < frame_vector_count(userptr->vec); i++)
-			set_page_dirty_lock(pages[i]);
-	}
-	put_vaddr_frames(userptr->vec);
-	frame_vector_destroy(userptr->vec);
+	unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
+	kvfree(userptr->pages);
 
 	list_del(&userptr->job_node);
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: Oded Gabbay, linux-s390, linux-samsung-soc, Jan Kara, kvm,
	Jason Gunthorpe, Pawel Piskorski, Daniel Vetter,
	Greg Kroah-Hartman, Ofir Bitton, linux-mm,
	Jérôme Glisse, Tomer Tayar, Omer Shpigelman,
	John Hubbard, Daniel Vetter, Andrew Morton, Moti Haimovski,
	Dan Williams, linux-arm-kernel, linux-media

All we need are a pages array, pin_user_pages_fast can give us that
directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Omer Shpigelman <oshpigelman@habana.ai>
Cc: Ofir Bitton <obitton@habana.ai>
Cc: Tomer Tayar <ttayar@habana.ai>
Cc: Moti Haimovski <mhaimovski@habana.ai>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Pawel Piskorski <ppiskorski@habana.ai>
--
v2: Use unpin_user_pages_dirty_lock (John)
---
 drivers/misc/habanalabs/Kconfig             |  1 -
 drivers/misc/habanalabs/common/habanalabs.h |  3 +-
 drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
 3 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
index 8eb5d38c618e..2f04187f7167 100644
--- a/drivers/misc/habanalabs/Kconfig
+++ b/drivers/misc/habanalabs/Kconfig
@@ -6,7 +6,6 @@
 config HABANA_AI
 	tristate "HabanaAI accelerators (habanalabs)"
 	depends on PCI && HAS_IOMEM
-	select FRAME_VECTOR
 	select DMA_SHARED_BUFFER
 	select GENERIC_ALLOCATOR
 	select HWMON
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index edbd627b29d2..c1b3ad613b15 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -881,7 +881,8 @@ struct hl_ctx_mgr {
 struct hl_userptr {
 	enum vm_type_t		vm_type; /* must be first */
 	struct list_head	job_node;
-	struct frame_vector	*vec;
+	struct page		**pages;
+	unsigned int		npages;
 	struct sg_table		*sgt;
 	enum dma_data_direction dir;
 	struct list_head	debugfs_list;
diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 5ff4688683fd..327b64479f97 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
 		return -EFAULT;
 	}
 
-	userptr->vec = frame_vector_create(npages);
-	if (!userptr->vec) {
+	userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
+					GFP_KERNEL);
+	if (!userptr->pages) {
 		dev_err(hdev->dev, "Failed to create frame vector\n");
 		return -ENOMEM;
 	}
 
-	rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
-				userptr->vec);
+	rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+				 userptr->pages);
 
 	if (rc != npages) {
 		dev_err(hdev->dev,
 			"Failed to map host memory, user ptr probably wrong\n");
 		if (rc < 0)
-			goto destroy_framevec;
+			goto destroy_pages;
+		npages = rc;
 		rc = -EFAULT;
-		goto put_framevec;
-	}
-
-	if (frame_vector_to_pages(userptr->vec) < 0) {
-		dev_err(hdev->dev,
-			"Failed to translate frame vector to pages\n");
-		rc = -EFAULT;
-		goto put_framevec;
+		goto put_pages;
 	}
+	userptr->npages = npages;
 
 	rc = sg_alloc_table_from_pages(userptr->sgt,
-					frame_vector_pages(userptr->vec),
-					npages, offset, size, GFP_ATOMIC);
+				       userptr->pages,
+				       npages, offset, size, GFP_ATOMIC);
 	if (rc < 0) {
 		dev_err(hdev->dev, "failed to create SG table from pages\n");
-		goto put_framevec;
+		goto put_pages;
 	}
 
 	return 0;
 
-put_framevec:
-	put_vaddr_frames(userptr->vec);
-destroy_framevec:
-	frame_vector_destroy(userptr->vec);
+put_pages:
+	unpin_user_pages(userptr->pages, npages);
+destroy_pages:
+	kvfree(userptr->pages);
 	return rc;
 }
 
@@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
  */
 void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
 {
-	struct page **pages;
-
 	hl_debugfs_remove_userptr(hdev, userptr);
 
 	if (userptr->dma_mapped)
@@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
 							userptr->sgt->nents,
 							userptr->dir);
 
-	pages = frame_vector_pages(userptr->vec);
-	if (!IS_ERR(pages)) {
-		int i;
-
-		for (i = 0; i < frame_vector_count(userptr->vec); i++)
-			set_page_dirty_lock(pages[i]);
-	}
-	put_vaddr_frames(userptr->vec);
-	frame_vector_destroy(userptr->vec);
+	unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
+	kvfree(userptr->pages);
 
 	list_del(&userptr->job_node);
 
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, kvm, Jason Gunthorpe,
	Pawel Piskorski, Daniel Vetter, Greg Kroah-Hartman, Ofir Bitton,
	linux-mm, Jérôme Glisse, Tomer Tayar, Omer Shpigelman,
	John Hubbard, Daniel Vetter, Andrew Morton, Moti Haimovski,
	Dan Williams, linux-arm-kernel, linux-media

All we need are a pages array, pin_user_pages_fast can give us that
directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Omer Shpigelman <oshpigelman@habana.ai>
Cc: Ofir Bitton <obitton@habana.ai>
Cc: Tomer Tayar <ttayar@habana.ai>
Cc: Moti Haimovski <mhaimovski@habana.ai>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Pawel Piskorski <ppiskorski@habana.ai>
--
v2: Use unpin_user_pages_dirty_lock (John)
---
 drivers/misc/habanalabs/Kconfig             |  1 -
 drivers/misc/habanalabs/common/habanalabs.h |  3 +-
 drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
 3 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
index 8eb5d38c618e..2f04187f7167 100644
--- a/drivers/misc/habanalabs/Kconfig
+++ b/drivers/misc/habanalabs/Kconfig
@@ -6,7 +6,6 @@
 config HABANA_AI
 	tristate "HabanaAI accelerators (habanalabs)"
 	depends on PCI && HAS_IOMEM
-	select FRAME_VECTOR
 	select DMA_SHARED_BUFFER
 	select GENERIC_ALLOCATOR
 	select HWMON
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index edbd627b29d2..c1b3ad613b15 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -881,7 +881,8 @@ struct hl_ctx_mgr {
 struct hl_userptr {
 	enum vm_type_t		vm_type; /* must be first */
 	struct list_head	job_node;
-	struct frame_vector	*vec;
+	struct page		**pages;
+	unsigned int		npages;
 	struct sg_table		*sgt;
 	enum dma_data_direction dir;
 	struct list_head	debugfs_list;
diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 5ff4688683fd..327b64479f97 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
 		return -EFAULT;
 	}
 
-	userptr->vec = frame_vector_create(npages);
-	if (!userptr->vec) {
+	userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
+					GFP_KERNEL);
+	if (!userptr->pages) {
 		dev_err(hdev->dev, "Failed to create frame vector\n");
 		return -ENOMEM;
 	}
 
-	rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
-				userptr->vec);
+	rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+				 userptr->pages);
 
 	if (rc != npages) {
 		dev_err(hdev->dev,
 			"Failed to map host memory, user ptr probably wrong\n");
 		if (rc < 0)
-			goto destroy_framevec;
+			goto destroy_pages;
+		npages = rc;
 		rc = -EFAULT;
-		goto put_framevec;
-	}
-
-	if (frame_vector_to_pages(userptr->vec) < 0) {
-		dev_err(hdev->dev,
-			"Failed to translate frame vector to pages\n");
-		rc = -EFAULT;
-		goto put_framevec;
+		goto put_pages;
 	}
+	userptr->npages = npages;
 
 	rc = sg_alloc_table_from_pages(userptr->sgt,
-					frame_vector_pages(userptr->vec),
-					npages, offset, size, GFP_ATOMIC);
+				       userptr->pages,
+				       npages, offset, size, GFP_ATOMIC);
 	if (rc < 0) {
 		dev_err(hdev->dev, "failed to create SG table from pages\n");
-		goto put_framevec;
+		goto put_pages;
 	}
 
 	return 0;
 
-put_framevec:
-	put_vaddr_frames(userptr->vec);
-destroy_framevec:
-	frame_vector_destroy(userptr->vec);
+put_pages:
+	unpin_user_pages(userptr->pages, npages);
+destroy_pages:
+	kvfree(userptr->pages);
 	return rc;
 }
 
@@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
  */
 void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
 {
-	struct page **pages;
-
 	hl_debugfs_remove_userptr(hdev, userptr);
 
 	if (userptr->dma_mapped)
@@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
 							userptr->sgt->nents,
 							userptr->dir);
 
-	pages = frame_vector_pages(userptr->vec);
-	if (!IS_ERR(pages)) {
-		int i;
-
-		for (i = 0; i < frame_vector_count(userptr->vec); i++)
-			set_page_dirty_lock(pages[i]);
-	}
-	put_vaddr_frames(userptr->vec);
-	frame_vector_destroy(userptr->vec);
+	unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
+	kvfree(userptr->pages);
 
 	list_del(&userptr->job_node);
 
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 04/17] misc/habana: Use FOLL_LONGTERM for userptr
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Dan Williams, Oded Gabbay, Omer Shpigelman, Ofir Bitton,
	Tomer Tayar, Moti Haimovski, Greg Kroah-Hartman, Pawel Piskorski

These are persistent, not just for the duration of a dma operation.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Omer Shpigelman <oshpigelman@habana.ai>
Cc: Ofir Bitton <obitton@habana.ai>
Cc: Tomer Tayar <ttayar@habana.ai>
Cc: Moti Haimovski <mhaimovski@habana.ai>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Pawel Piskorski <ppiskorski@habana.ai>
---
 drivers/misc/habanalabs/common/memory.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 327b64479f97..767d3644c033 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1288,7 +1288,8 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
 		return -ENOMEM;
 	}
 
-	rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+	rc = pin_user_pages_fast(start, npages,
+				 FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
 				 userptr->pages);
 
 	if (rc != npages) {
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 04/17] misc/habana: Use FOLL_LONGTERM for userptr
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: Oded Gabbay, linux-s390, linux-samsung-soc, Jan Kara, kvm,
	Jason Gunthorpe, Pawel Piskorski, Daniel Vetter,
	Greg Kroah-Hartman, Ofir Bitton, linux-mm,
	Jérôme Glisse, Tomer Tayar, Omer Shpigelman,
	John Hubbard, Daniel Vetter, Andrew Morton, Moti Haimovski,
	Dan Williams, linux-arm-kernel, linux-media

These are persistent, not just for the duration of a dma operation.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Omer Shpigelman <oshpigelman@habana.ai>
Cc: Ofir Bitton <obitton@habana.ai>
Cc: Tomer Tayar <ttayar@habana.ai>
Cc: Moti Haimovski <mhaimovski@habana.ai>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Pawel Piskorski <ppiskorski@habana.ai>
---
 drivers/misc/habanalabs/common/memory.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 327b64479f97..767d3644c033 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1288,7 +1288,8 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
 		return -ENOMEM;
 	}
 
-	rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+	rc = pin_user_pages_fast(start, npages,
+				 FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
 				 userptr->pages);
 
 	if (rc != npages) {
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 04/17] misc/habana: Use FOLL_LONGTERM for userptr
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, kvm, Jason Gunthorpe,
	Pawel Piskorski, Daniel Vetter, Greg Kroah-Hartman, Ofir Bitton,
	linux-mm, Jérôme Glisse, Tomer Tayar, Omer Shpigelman,
	John Hubbard, Daniel Vetter, Andrew Morton, Moti Haimovski,
	Dan Williams, linux-arm-kernel, linux-media

These are persistent, not just for the duration of a dma operation.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Oded Gabbay <oded.gabbay@gmail.com>
Cc: Omer Shpigelman <oshpigelman@habana.ai>
Cc: Ofir Bitton <obitton@habana.ai>
Cc: Tomer Tayar <ttayar@habana.ai>
Cc: Moti Haimovski <mhaimovski@habana.ai>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Pawel Piskorski <ppiskorski@habana.ai>
---
 drivers/misc/habanalabs/common/memory.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 327b64479f97..767d3644c033 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1288,7 +1288,8 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
 		return -ENOMEM;
 	}
 
-	rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+	rc = pin_user_pages_fast(start, npages,
+				 FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
 				 userptr->pages);
 
 	if (rc != npages) {
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 05/17] mm/frame-vector: Use FOLL_LONGTERM
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, Tomasz Figa,
	Mauro Carvalho Chehab, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams

This is used by media/videbuf2 for persistent dma mappings, not just
for a single dma operation and then freed again, so needs
FOLL_LONGTERM.

Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
locking issues. Rework the code to pull the pup path out from the
mmap_sem critical section as suggested by Jason.

By relying entirely on the vma checks in pin_user_pages and follow_pfn
(for vm_flags and vma_is_fsdax) we can also streamline the code a lot.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
--
v2: Streamline the code and further simplify the loop checks (Jason)
---
 mm/frame_vector.c | 50 ++++++++++++++---------------------------------
 1 file changed, 15 insertions(+), 35 deletions(-)

diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index 10f82d5643b6..d44779e56313 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 	struct vm_area_struct *vma;
 	int ret = 0;
 	int err;
-	int locked;
 
 	if (nr_frames == 0)
 		return 0;
@@ -48,40 +47,25 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 
 	start = untagged_addr(start);
 
-	mmap_read_lock(mm);
-	locked = 1;
-	vma = find_vma_intersection(mm, start, start + 1);
-	if (!vma) {
-		ret = -EFAULT;
-		goto out;
-	}
-
-	/*
-	 * While get_vaddr_frames() could be used for transient (kernel
-	 * controlled lifetime) pinning of memory pages all current
-	 * users establish long term (userspace controlled lifetime)
-	 * page pinning. Treat get_vaddr_frames() like
-	 * get_user_pages_longterm() and disallow it for filesystem-dax
-	 * mappings.
-	 */
-	if (vma_is_fsdax(vma)) {
-		ret = -EOPNOTSUPP;
-		goto out;
-	}
-
-	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
+	ret = pin_user_pages_fast(start, nr_frames,
+				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
+				  (struct page **)(vec->ptrs));
+	if (ret > 0) {
 		vec->got_ref = true;
 		vec->is_pfns = false;
-		ret = pin_user_pages_locked(start, nr_frames,
-			gup_flags, (struct page **)(vec->ptrs), &locked);
-		goto out;
+		goto out_unlocked;
 	}
 
+	mmap_read_lock(mm);
 	vec->got_ref = false;
 	vec->is_pfns = true;
 	do {
 		unsigned long *nums = frame_vector_pfns(vec);
 
+		vma = find_vma_intersection(mm, start, start + 1);
+		if (!vma)
+			break;
+
 		while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
 			err = follow_pfn(vma, start, &nums[ret]);
 			if (err) {
@@ -92,17 +76,13 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 			start += PAGE_SIZE;
 			ret++;
 		}
-		/*
-		 * We stop if we have enough pages or if VMA doesn't completely
-		 * cover the tail page.
-		 */
-		if (ret >= nr_frames || start < vma->vm_end)
+		/* Bail out if VMA doesn't completely cover the tail page. */
+		if (start < vma->vm_end)
 			break;
-		vma = find_vma_intersection(mm, start, start + 1);
-	} while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
+	} while (ret < nr_frames);
 out:
-	if (locked)
-		mmap_read_unlock(mm);
+	mmap_read_unlock(mm);
+out_unlocked:
 	if (!ret)
 		ret = -EFAULT;
 	if (ret > 0)
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 05/17] mm/frame-vector: Use FOLL_LONGTERM
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Pawel Osciak, kvm,
	Jason Gunthorpe, Daniel Vetter, Mauro Carvalho Chehab,
	Jérôme Glisse, Tomasz Figa, linux-mm, Kyungmin Park,
	John Hubbard, Daniel Vetter, Andrew Morton, Marek Szyprowski,
	Dan Williams, linux-arm-kernel, linux-media

This is used by media/videbuf2 for persistent dma mappings, not just
for a single dma operation and then freed again, so needs
FOLL_LONGTERM.

Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
locking issues. Rework the code to pull the pup path out from the
mmap_sem critical section as suggested by Jason.

By relying entirely on the vma checks in pin_user_pages and follow_pfn
(for vm_flags and vma_is_fsdax) we can also streamline the code a lot.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
--
v2: Streamline the code and further simplify the loop checks (Jason)
---
 mm/frame_vector.c | 50 ++++++++++++++---------------------------------
 1 file changed, 15 insertions(+), 35 deletions(-)

diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index 10f82d5643b6..d44779e56313 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 	struct vm_area_struct *vma;
 	int ret = 0;
 	int err;
-	int locked;
 
 	if (nr_frames == 0)
 		return 0;
@@ -48,40 +47,25 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 
 	start = untagged_addr(start);
 
-	mmap_read_lock(mm);
-	locked = 1;
-	vma = find_vma_intersection(mm, start, start + 1);
-	if (!vma) {
-		ret = -EFAULT;
-		goto out;
-	}
-
-	/*
-	 * While get_vaddr_frames() could be used for transient (kernel
-	 * controlled lifetime) pinning of memory pages all current
-	 * users establish long term (userspace controlled lifetime)
-	 * page pinning. Treat get_vaddr_frames() like
-	 * get_user_pages_longterm() and disallow it for filesystem-dax
-	 * mappings.
-	 */
-	if (vma_is_fsdax(vma)) {
-		ret = -EOPNOTSUPP;
-		goto out;
-	}
-
-	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
+	ret = pin_user_pages_fast(start, nr_frames,
+				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
+				  (struct page **)(vec->ptrs));
+	if (ret > 0) {
 		vec->got_ref = true;
 		vec->is_pfns = false;
-		ret = pin_user_pages_locked(start, nr_frames,
-			gup_flags, (struct page **)(vec->ptrs), &locked);
-		goto out;
+		goto out_unlocked;
 	}
 
+	mmap_read_lock(mm);
 	vec->got_ref = false;
 	vec->is_pfns = true;
 	do {
 		unsigned long *nums = frame_vector_pfns(vec);
 
+		vma = find_vma_intersection(mm, start, start + 1);
+		if (!vma)
+			break;
+
 		while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
 			err = follow_pfn(vma, start, &nums[ret]);
 			if (err) {
@@ -92,17 +76,13 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 			start += PAGE_SIZE;
 			ret++;
 		}
-		/*
-		 * We stop if we have enough pages or if VMA doesn't completely
-		 * cover the tail page.
-		 */
-		if (ret >= nr_frames || start < vma->vm_end)
+		/* Bail out if VMA doesn't completely cover the tail page. */
+		if (start < vma->vm_end)
 			break;
-		vma = find_vma_intersection(mm, start, start + 1);
-	} while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
+	} while (ret < nr_frames);
 out:
-	if (locked)
-		mmap_read_unlock(mm);
+	mmap_read_unlock(mm);
+out_unlocked:
 	if (!ret)
 		ret = -EFAULT;
 	if (ret > 0)
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 05/17] mm/frame-vector: Use FOLL_LONGTERM
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Pawel Osciak, kvm,
	Jason Gunthorpe, Daniel Vetter, Mauro Carvalho Chehab,
	Jérôme Glisse, Tomasz Figa, linux-mm, Kyungmin Park,
	John Hubbard, Daniel Vetter, Andrew Morton, Marek Szyprowski,
	Dan Williams, linux-arm-kernel, linux-media

This is used by media/videbuf2 for persistent dma mappings, not just
for a single dma operation and then freed again, so needs
FOLL_LONGTERM.

Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
locking issues. Rework the code to pull the pup path out from the
mmap_sem critical section as suggested by Jason.

By relying entirely on the vma checks in pin_user_pages and follow_pfn
(for vm_flags and vma_is_fsdax) we can also streamline the code a lot.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
--
v2: Streamline the code and further simplify the loop checks (Jason)
---
 mm/frame_vector.c | 50 ++++++++++++++---------------------------------
 1 file changed, 15 insertions(+), 35 deletions(-)

diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index 10f82d5643b6..d44779e56313 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -38,7 +38,6 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 	struct vm_area_struct *vma;
 	int ret = 0;
 	int err;
-	int locked;
 
 	if (nr_frames == 0)
 		return 0;
@@ -48,40 +47,25 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 
 	start = untagged_addr(start);
 
-	mmap_read_lock(mm);
-	locked = 1;
-	vma = find_vma_intersection(mm, start, start + 1);
-	if (!vma) {
-		ret = -EFAULT;
-		goto out;
-	}
-
-	/*
-	 * While get_vaddr_frames() could be used for transient (kernel
-	 * controlled lifetime) pinning of memory pages all current
-	 * users establish long term (userspace controlled lifetime)
-	 * page pinning. Treat get_vaddr_frames() like
-	 * get_user_pages_longterm() and disallow it for filesystem-dax
-	 * mappings.
-	 */
-	if (vma_is_fsdax(vma)) {
-		ret = -EOPNOTSUPP;
-		goto out;
-	}
-
-	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
+	ret = pin_user_pages_fast(start, nr_frames,
+				  FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
+				  (struct page **)(vec->ptrs));
+	if (ret > 0) {
 		vec->got_ref = true;
 		vec->is_pfns = false;
-		ret = pin_user_pages_locked(start, nr_frames,
-			gup_flags, (struct page **)(vec->ptrs), &locked);
-		goto out;
+		goto out_unlocked;
 	}
 
+	mmap_read_lock(mm);
 	vec->got_ref = false;
 	vec->is_pfns = true;
 	do {
 		unsigned long *nums = frame_vector_pfns(vec);
 
+		vma = find_vma_intersection(mm, start, start + 1);
+		if (!vma)
+			break;
+
 		while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
 			err = follow_pfn(vma, start, &nums[ret]);
 			if (err) {
@@ -92,17 +76,13 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 			start += PAGE_SIZE;
 			ret++;
 		}
-		/*
-		 * We stop if we have enough pages or if VMA doesn't completely
-		 * cover the tail page.
-		 */
-		if (ret >= nr_frames || start < vma->vm_end)
+		/* Bail out if VMA doesn't completely cover the tail page. */
+		if (start < vma->vm_end)
 			break;
-		vma = find_vma_intersection(mm, start, start + 1);
-	} while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
+	} while (ret < nr_frames);
 out:
-	if (locked)
-		mmap_read_unlock(mm);
+	mmap_read_unlock(mm);
+out_unlocked:
 	if (!ret)
 		ret = -EFAULT;
 	if (ret > 0)
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, John Hubbard, Daniel Vetter,
	Jason Gunthorpe, Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Tomasz Figa, Mauro Carvalho Chehab, Andrew Morton,
	Jérôme Glisse, Jan Kara, Dan Williams

It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
symbol from all over the tree (well just one place, somehow omap media
driver still had this in its Kconfig, despite not using it).

Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/media/common/videobuf2/Kconfig        |  1 -
 drivers/media/common/videobuf2/Makefile       |  1 +
 .../media/common/videobuf2}/frame_vector.c    |  2 +
 drivers/media/platform/omap/Kconfig           |  1 -
 include/linux/mm.h                            | 42 -------------------
 include/media/videobuf2-core.h                | 42 +++++++++++++++++++
 mm/Kconfig                                    |  3 --
 mm/Makefile                                   |  1 -
 8 files changed, 45 insertions(+), 48 deletions(-)
 rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)

diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
index edbc99ebba87..d2223a12c95f 100644
--- a/drivers/media/common/videobuf2/Kconfig
+++ b/drivers/media/common/videobuf2/Kconfig
@@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
 
 config VIDEOBUF2_MEMOPS
 	tristate
-	select FRAME_VECTOR
 
 config VIDEOBUF2_DMA_CONTIG
 	tristate
diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
index 77bebe8b202f..54306f8d096c 100644
--- a/drivers/media/common/videobuf2/Makefile
+++ b/drivers/media/common/videobuf2/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 videobuf2-common-objs := videobuf2-core.o
+videobuf2-common-objs += frame_vector.o
 
 ifeq ($(CONFIG_TRACEPOINTS),y)
   videobuf2-common-objs += vb2-trace.o
diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
similarity index 99%
rename from mm/frame_vector.c
rename to drivers/media/common/videobuf2/frame_vector.c
index d44779e56313..2b0b97761d15 100644
--- a/mm/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -8,6 +8,8 @@
 #include <linux/pagemap.h>
 #include <linux/sched.h>
 
+#include <media/videobuf2-core.h>
+
 /**
  * get_vaddr_frames() - map virtual addresses to pfns
  * @start:	starting user address
diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
index f73b5893220d..de16de46c0f4 100644
--- a/drivers/media/platform/omap/Kconfig
+++ b/drivers/media/platform/omap/Kconfig
@@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
 	depends on VIDEO_V4L2
 	select VIDEOBUF2_DMA_CONTIG
 	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
-	select FRAME_VECTOR
 	help
 	  V4L2 Display driver support for OMAP2/3 based boards.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 16b799a0522c..acd60fbf1a5a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
 int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
 			struct task_struct *task, bool bypass_rlim);
 
-/* Container for pinned pfns / pages */
-struct frame_vector {
-	unsigned int nr_allocated;	/* Number of frames we have space for */
-	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
-	bool got_ref;		/* Did we pin pages by getting page ref? */
-	bool is_pfns;		/* Does array contain pages or pfns? */
-	void *ptrs[];		/* Array of pinned pfns / pages. Use
-				 * pfns_vector_pages() or pfns_vector_pfns()
-				 * for access */
-};
-
-struct frame_vector *frame_vector_create(unsigned int nr_frames);
-void frame_vector_destroy(struct frame_vector *vec);
-int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
-		     unsigned int gup_flags, struct frame_vector *vec);
-void put_vaddr_frames(struct frame_vector *vec);
-int frame_vector_to_pages(struct frame_vector *vec);
-void frame_vector_to_pfns(struct frame_vector *vec);
-
-static inline unsigned int frame_vector_count(struct frame_vector *vec)
-{
-	return vec->nr_frames;
-}
-
-static inline struct page **frame_vector_pages(struct frame_vector *vec)
-{
-	if (vec->is_pfns) {
-		int err = frame_vector_to_pages(vec);
-
-		if (err)
-			return ERR_PTR(err);
-	}
-	return (struct page **)(vec->ptrs);
-}
-
-static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
-{
-	if (!vec->is_pfns)
-		frame_vector_to_pfns(vec);
-	return (unsigned long *)(vec->ptrs);
-}
-
 struct kvec;
 int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
 			struct page **pages);
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index bbb3f26fbde9..a2e75ca0334f 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
  */
 unsigned int vb2_request_buffer_cnt(struct media_request *req);
 
+/* Container for pinned pfns / pages in frame_vector.c */
+struct frame_vector {
+	unsigned int nr_allocated;	/* Number of frames we have space for */
+	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
+	bool got_ref;		/* Did we pin pages by getting page ref? */
+	bool is_pfns;		/* Does array contain pages or pfns? */
+	void *ptrs[];		/* Array of pinned pfns / pages. Use
+				 * pfns_vector_pages() or pfns_vector_pfns()
+				 * for access */
+};
+
+struct frame_vector *frame_vector_create(unsigned int nr_frames);
+void frame_vector_destroy(struct frame_vector *vec);
+int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
+		     unsigned int gup_flags, struct frame_vector *vec);
+void put_vaddr_frames(struct frame_vector *vec);
+int frame_vector_to_pages(struct frame_vector *vec);
+void frame_vector_to_pfns(struct frame_vector *vec);
+
+static inline unsigned int frame_vector_count(struct frame_vector *vec)
+{
+	return vec->nr_frames;
+}
+
+static inline struct page **frame_vector_pages(struct frame_vector *vec)
+{
+	if (vec->is_pfns) {
+		int err = frame_vector_to_pages(vec);
+
+		if (err)
+			return ERR_PTR(err);
+	}
+	return (struct page **)(vec->ptrs);
+}
+
+static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
+{
+	if (!vec->is_pfns)
+		frame_vector_to_pfns(vec);
+	return (unsigned long *)(vec->ptrs);
+}
+
 #endif /* _MEDIA_VIDEOBUF2_CORE_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 6c974888f86f..da6c943fe9f1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -815,9 +815,6 @@ config DEVICE_PRIVATE
 	  memory; i.e., memory that is only accessible from the device (or
 	  group of devices). You likely also want to select HMM_MIRROR.
 
-config FRAME_VECTOR
-	bool
-
 config ARCH_USES_HIGH_VMA_FLAGS
 	bool
 config ARCH_HAS_PKEYS
diff --git a/mm/Makefile b/mm/Makefile
index d5649f1c12c0..a025fd6c6afd 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
 obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
 obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
-obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
 obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
 obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Pawel Osciak, kvm,
	Jason Gunthorpe, Daniel Vetter, Mauro Carvalho Chehab,
	Jérôme Glisse, Tomasz Figa, linux-mm, Kyungmin Park,
	John Hubbard, Daniel Vetter, Andrew Morton, Marek Szyprowski,
	Dan Williams, linux-arm-kernel, linux-media

It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
symbol from all over the tree (well just one place, somehow omap media
driver still had this in its Kconfig, despite not using it).

Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/media/common/videobuf2/Kconfig        |  1 -
 drivers/media/common/videobuf2/Makefile       |  1 +
 .../media/common/videobuf2}/frame_vector.c    |  2 +
 drivers/media/platform/omap/Kconfig           |  1 -
 include/linux/mm.h                            | 42 -------------------
 include/media/videobuf2-core.h                | 42 +++++++++++++++++++
 mm/Kconfig                                    |  3 --
 mm/Makefile                                   |  1 -
 8 files changed, 45 insertions(+), 48 deletions(-)
 rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)

diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
index edbc99ebba87..d2223a12c95f 100644
--- a/drivers/media/common/videobuf2/Kconfig
+++ b/drivers/media/common/videobuf2/Kconfig
@@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
 
 config VIDEOBUF2_MEMOPS
 	tristate
-	select FRAME_VECTOR
 
 config VIDEOBUF2_DMA_CONTIG
 	tristate
diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
index 77bebe8b202f..54306f8d096c 100644
--- a/drivers/media/common/videobuf2/Makefile
+++ b/drivers/media/common/videobuf2/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 videobuf2-common-objs := videobuf2-core.o
+videobuf2-common-objs += frame_vector.o
 
 ifeq ($(CONFIG_TRACEPOINTS),y)
   videobuf2-common-objs += vb2-trace.o
diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
similarity index 99%
rename from mm/frame_vector.c
rename to drivers/media/common/videobuf2/frame_vector.c
index d44779e56313..2b0b97761d15 100644
--- a/mm/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -8,6 +8,8 @@
 #include <linux/pagemap.h>
 #include <linux/sched.h>
 
+#include <media/videobuf2-core.h>
+
 /**
  * get_vaddr_frames() - map virtual addresses to pfns
  * @start:	starting user address
diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
index f73b5893220d..de16de46c0f4 100644
--- a/drivers/media/platform/omap/Kconfig
+++ b/drivers/media/platform/omap/Kconfig
@@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
 	depends on VIDEO_V4L2
 	select VIDEOBUF2_DMA_CONTIG
 	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
-	select FRAME_VECTOR
 	help
 	  V4L2 Display driver support for OMAP2/3 based boards.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 16b799a0522c..acd60fbf1a5a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
 int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
 			struct task_struct *task, bool bypass_rlim);
 
-/* Container for pinned pfns / pages */
-struct frame_vector {
-	unsigned int nr_allocated;	/* Number of frames we have space for */
-	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
-	bool got_ref;		/* Did we pin pages by getting page ref? */
-	bool is_pfns;		/* Does array contain pages or pfns? */
-	void *ptrs[];		/* Array of pinned pfns / pages. Use
-				 * pfns_vector_pages() or pfns_vector_pfns()
-				 * for access */
-};
-
-struct frame_vector *frame_vector_create(unsigned int nr_frames);
-void frame_vector_destroy(struct frame_vector *vec);
-int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
-		     unsigned int gup_flags, struct frame_vector *vec);
-void put_vaddr_frames(struct frame_vector *vec);
-int frame_vector_to_pages(struct frame_vector *vec);
-void frame_vector_to_pfns(struct frame_vector *vec);
-
-static inline unsigned int frame_vector_count(struct frame_vector *vec)
-{
-	return vec->nr_frames;
-}
-
-static inline struct page **frame_vector_pages(struct frame_vector *vec)
-{
-	if (vec->is_pfns) {
-		int err = frame_vector_to_pages(vec);
-
-		if (err)
-			return ERR_PTR(err);
-	}
-	return (struct page **)(vec->ptrs);
-}
-
-static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
-{
-	if (!vec->is_pfns)
-		frame_vector_to_pfns(vec);
-	return (unsigned long *)(vec->ptrs);
-}
-
 struct kvec;
 int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
 			struct page **pages);
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index bbb3f26fbde9..a2e75ca0334f 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
  */
 unsigned int vb2_request_buffer_cnt(struct media_request *req);
 
+/* Container for pinned pfns / pages in frame_vector.c */
+struct frame_vector {
+	unsigned int nr_allocated;	/* Number of frames we have space for */
+	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
+	bool got_ref;		/* Did we pin pages by getting page ref? */
+	bool is_pfns;		/* Does array contain pages or pfns? */
+	void *ptrs[];		/* Array of pinned pfns / pages. Use
+				 * pfns_vector_pages() or pfns_vector_pfns()
+				 * for access */
+};
+
+struct frame_vector *frame_vector_create(unsigned int nr_frames);
+void frame_vector_destroy(struct frame_vector *vec);
+int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
+		     unsigned int gup_flags, struct frame_vector *vec);
+void put_vaddr_frames(struct frame_vector *vec);
+int frame_vector_to_pages(struct frame_vector *vec);
+void frame_vector_to_pfns(struct frame_vector *vec);
+
+static inline unsigned int frame_vector_count(struct frame_vector *vec)
+{
+	return vec->nr_frames;
+}
+
+static inline struct page **frame_vector_pages(struct frame_vector *vec)
+{
+	if (vec->is_pfns) {
+		int err = frame_vector_to_pages(vec);
+
+		if (err)
+			return ERR_PTR(err);
+	}
+	return (struct page **)(vec->ptrs);
+}
+
+static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
+{
+	if (!vec->is_pfns)
+		frame_vector_to_pfns(vec);
+	return (unsigned long *)(vec->ptrs);
+}
+
 #endif /* _MEDIA_VIDEOBUF2_CORE_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 6c974888f86f..da6c943fe9f1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -815,9 +815,6 @@ config DEVICE_PRIVATE
 	  memory; i.e., memory that is only accessible from the device (or
 	  group of devices). You likely also want to select HMM_MIRROR.
 
-config FRAME_VECTOR
-	bool
-
 config ARCH_USES_HIGH_VMA_FLAGS
 	bool
 config ARCH_HAS_PKEYS
diff --git a/mm/Makefile b/mm/Makefile
index d5649f1c12c0..a025fd6c6afd 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
 obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
 obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
-obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
 obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
 obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Pawel Osciak, kvm,
	Jason Gunthorpe, Daniel Vetter, Mauro Carvalho Chehab,
	Jérôme Glisse, Tomasz Figa, linux-mm, Kyungmin Park,
	John Hubbard, Daniel Vetter, Andrew Morton, Marek Szyprowski,
	Dan Williams, linux-arm-kernel, linux-media

It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
symbol from all over the tree (well just one place, somehow omap media
driver still had this in its Kconfig, despite not using it).

Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/media/common/videobuf2/Kconfig        |  1 -
 drivers/media/common/videobuf2/Makefile       |  1 +
 .../media/common/videobuf2}/frame_vector.c    |  2 +
 drivers/media/platform/omap/Kconfig           |  1 -
 include/linux/mm.h                            | 42 -------------------
 include/media/videobuf2-core.h                | 42 +++++++++++++++++++
 mm/Kconfig                                    |  3 --
 mm/Makefile                                   |  1 -
 8 files changed, 45 insertions(+), 48 deletions(-)
 rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)

diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
index edbc99ebba87..d2223a12c95f 100644
--- a/drivers/media/common/videobuf2/Kconfig
+++ b/drivers/media/common/videobuf2/Kconfig
@@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
 
 config VIDEOBUF2_MEMOPS
 	tristate
-	select FRAME_VECTOR
 
 config VIDEOBUF2_DMA_CONTIG
 	tristate
diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
index 77bebe8b202f..54306f8d096c 100644
--- a/drivers/media/common/videobuf2/Makefile
+++ b/drivers/media/common/videobuf2/Makefile
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 videobuf2-common-objs := videobuf2-core.o
+videobuf2-common-objs += frame_vector.o
 
 ifeq ($(CONFIG_TRACEPOINTS),y)
   videobuf2-common-objs += vb2-trace.o
diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
similarity index 99%
rename from mm/frame_vector.c
rename to drivers/media/common/videobuf2/frame_vector.c
index d44779e56313..2b0b97761d15 100644
--- a/mm/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -8,6 +8,8 @@
 #include <linux/pagemap.h>
 #include <linux/sched.h>
 
+#include <media/videobuf2-core.h>
+
 /**
  * get_vaddr_frames() - map virtual addresses to pfns
  * @start:	starting user address
diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
index f73b5893220d..de16de46c0f4 100644
--- a/drivers/media/platform/omap/Kconfig
+++ b/drivers/media/platform/omap/Kconfig
@@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
 	depends on VIDEO_V4L2
 	select VIDEOBUF2_DMA_CONTIG
 	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
-	select FRAME_VECTOR
 	help
 	  V4L2 Display driver support for OMAP2/3 based boards.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 16b799a0522c..acd60fbf1a5a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
 int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
 			struct task_struct *task, bool bypass_rlim);
 
-/* Container for pinned pfns / pages */
-struct frame_vector {
-	unsigned int nr_allocated;	/* Number of frames we have space for */
-	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
-	bool got_ref;		/* Did we pin pages by getting page ref? */
-	bool is_pfns;		/* Does array contain pages or pfns? */
-	void *ptrs[];		/* Array of pinned pfns / pages. Use
-				 * pfns_vector_pages() or pfns_vector_pfns()
-				 * for access */
-};
-
-struct frame_vector *frame_vector_create(unsigned int nr_frames);
-void frame_vector_destroy(struct frame_vector *vec);
-int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
-		     unsigned int gup_flags, struct frame_vector *vec);
-void put_vaddr_frames(struct frame_vector *vec);
-int frame_vector_to_pages(struct frame_vector *vec);
-void frame_vector_to_pfns(struct frame_vector *vec);
-
-static inline unsigned int frame_vector_count(struct frame_vector *vec)
-{
-	return vec->nr_frames;
-}
-
-static inline struct page **frame_vector_pages(struct frame_vector *vec)
-{
-	if (vec->is_pfns) {
-		int err = frame_vector_to_pages(vec);
-
-		if (err)
-			return ERR_PTR(err);
-	}
-	return (struct page **)(vec->ptrs);
-}
-
-static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
-{
-	if (!vec->is_pfns)
-		frame_vector_to_pfns(vec);
-	return (unsigned long *)(vec->ptrs);
-}
-
 struct kvec;
 int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
 			struct page **pages);
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index bbb3f26fbde9..a2e75ca0334f 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
  */
 unsigned int vb2_request_buffer_cnt(struct media_request *req);
 
+/* Container for pinned pfns / pages in frame_vector.c */
+struct frame_vector {
+	unsigned int nr_allocated;	/* Number of frames we have space for */
+	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
+	bool got_ref;		/* Did we pin pages by getting page ref? */
+	bool is_pfns;		/* Does array contain pages or pfns? */
+	void *ptrs[];		/* Array of pinned pfns / pages. Use
+				 * pfns_vector_pages() or pfns_vector_pfns()
+				 * for access */
+};
+
+struct frame_vector *frame_vector_create(unsigned int nr_frames);
+void frame_vector_destroy(struct frame_vector *vec);
+int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
+		     unsigned int gup_flags, struct frame_vector *vec);
+void put_vaddr_frames(struct frame_vector *vec);
+int frame_vector_to_pages(struct frame_vector *vec);
+void frame_vector_to_pfns(struct frame_vector *vec);
+
+static inline unsigned int frame_vector_count(struct frame_vector *vec)
+{
+	return vec->nr_frames;
+}
+
+static inline struct page **frame_vector_pages(struct frame_vector *vec)
+{
+	if (vec->is_pfns) {
+		int err = frame_vector_to_pages(vec);
+
+		if (err)
+			return ERR_PTR(err);
+	}
+	return (struct page **)(vec->ptrs);
+}
+
+static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
+{
+	if (!vec->is_pfns)
+		frame_vector_to_pfns(vec);
+	return (unsigned long *)(vec->ptrs);
+}
+
 #endif /* _MEDIA_VIDEOBUF2_CORE_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 6c974888f86f..da6c943fe9f1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -815,9 +815,6 @@ config DEVICE_PRIVATE
 	  memory; i.e., memory that is only accessible from the device (or
 	  group of devices). You likely also want to select HMM_MIRROR.
 
-config FRAME_VECTOR
-	bool
-
 config ARCH_USES_HIGH_VMA_FLAGS
 	bool
 config ARCH_HAS_PKEYS
diff --git a/mm/Makefile b/mm/Makefile
index d5649f1c12c0..a025fd6c6afd 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
 obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
 obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
-obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
 obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
 obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 07/17] mm: Close race in generic_access_phys
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Jason Gunthorpe, Dan Williams,
	Kees Cook, Rik van Riel, Benjamin Herrensmidt, Dave Airlie,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Daniel Vetter

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
  ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
  cma regions. This means if we miss the unmap the pfn might contain
  pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
  iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
  ("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.

Since ioremap might need to manipulate pagetables too we need to drop
the pt lock and have a retry loop if we raced.

While at it, also add kerneldoc and improve the comment for the
vma_ops->access function. It's for accessing, not for moving the
memory from iomem to system memory, as the old comment seemed to
suggest.

References: 28b2ee20c7cb ("access_process_vm device memory infrastructure")
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Benjamin Herrensmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
--
v2: Fix inversion in the retry check (John).
---
 include/linux/mm.h |  3 ++-
 mm/memory.c        | 44 ++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index acd60fbf1a5a..2a16631c1fda 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -566,7 +566,8 @@ struct vm_operations_struct {
 	vm_fault_t (*pfn_mkwrite)(struct vm_fault *vmf);
 
 	/* called by access_process_vm when get_user_pages() fails, typically
-	 * for use by special VMAs that can switch between memory and hardware
+	 * for use by special VMAs. See also generic_access_phys() for a generic
+	 * implementation useful for any iomem mapping.
 	 */
 	int (*access)(struct vm_area_struct *vma, unsigned long addr,
 		      void *buf, int len, int write);
diff --git a/mm/memory.c b/mm/memory.c
index fcfc4ca36eba..f7cbc4dde0ef 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4873,28 +4873,68 @@ int follow_phys(struct vm_area_struct *vma,
 	return ret;
 }
 
+/**
+ * generic_access_phys - generic implementation for iomem mmap access
+ * @vma: the vma to access
+ * @addr: userspace addres, not relative offset within @vma
+ * @buf: buffer to read/write
+ * @len: length of transfer
+ * @write: set to FOLL_WRITE when writing, otherwise reading
+ *
+ * This is a generic implementation for &vm_operations_struct.access for an
+ * iomem mapping. This callback is used by access_process_vm() when the @vma is
+ * not page based.
+ */
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
 			void *buf, int len, int write)
 {
 	resource_size_t phys_addr;
 	unsigned long prot = 0;
 	void __iomem *maddr;
+	pte_t *ptep, pte;
+	spinlock_t *ptl;
 	int offset = addr & (PAGE_SIZE-1);
+	int ret = -EINVAL;
+
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		return -EINVAL;
+
+retry:
+	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+		return -EINVAL;
+	pte = *ptep;
+	pte_unmap_unlock(ptep, ptl);
 
-	if (follow_phys(vma, addr, write, &prot, &phys_addr))
+	prot = pgprot_val(pte_pgprot(pte));
+	phys_addr = (resource_size_t)pte_pfn(pte) << PAGE_SHIFT;
+
+	if ((write & FOLL_WRITE) && !pte_write(pte))
 		return -EINVAL;
 
 	maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);
 	if (!maddr)
 		return -ENOMEM;
 
+	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+		goto out_unmap;
+
+	if (!pte_same(pte, *ptep)) {
+		pte_unmap_unlock(ptep, ptl);
+		iounmap(maddr);
+
+		goto retry;
+	}
+
 	if (write)
 		memcpy_toio(maddr + offset, buf, len);
 	else
 		memcpy_fromio(buf, maddr + offset, len);
+	ret = len;
+	pte_unmap_unlock(ptep, ptl);
+out_unmap:
 	iounmap(maddr);
 
-	return len;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(generic_access_phys);
 #endif
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 07/17] mm: Close race in generic_access_phys
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, Rik van Riel, linux-samsung-soc, Jan Kara, Kees Cook,
	kvm, Jason Gunthorpe, Dave Airlie, Daniel Vetter, linux-mm,
	Jérôme Glisse, John Hubbard, Benjamin Herrensmidt,
	Daniel Vetter, Dan Williams, Andrew Morton, linux-arm-kernel,
	linux-media

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
  ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
  cma regions. This means if we miss the unmap the pfn might contain
  pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
  iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
  ("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.

Since ioremap might need to manipulate pagetables too we need to drop
the pt lock and have a retry loop if we raced.

While at it, also add kerneldoc and improve the comment for the
vma_ops->access function. It's for accessing, not for moving the
memory from iomem to system memory, as the old comment seemed to
suggest.

References: 28b2ee20c7cb ("access_process_vm device memory infrastructure")
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Benjamin Herrensmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
--
v2: Fix inversion in the retry check (John).
---
 include/linux/mm.h |  3 ++-
 mm/memory.c        | 44 ++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index acd60fbf1a5a..2a16631c1fda 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -566,7 +566,8 @@ struct vm_operations_struct {
 	vm_fault_t (*pfn_mkwrite)(struct vm_fault *vmf);
 
 	/* called by access_process_vm when get_user_pages() fails, typically
-	 * for use by special VMAs that can switch between memory and hardware
+	 * for use by special VMAs. See also generic_access_phys() for a generic
+	 * implementation useful for any iomem mapping.
 	 */
 	int (*access)(struct vm_area_struct *vma, unsigned long addr,
 		      void *buf, int len, int write);
diff --git a/mm/memory.c b/mm/memory.c
index fcfc4ca36eba..f7cbc4dde0ef 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4873,28 +4873,68 @@ int follow_phys(struct vm_area_struct *vma,
 	return ret;
 }
 
+/**
+ * generic_access_phys - generic implementation for iomem mmap access
+ * @vma: the vma to access
+ * @addr: userspace addres, not relative offset within @vma
+ * @buf: buffer to read/write
+ * @len: length of transfer
+ * @write: set to FOLL_WRITE when writing, otherwise reading
+ *
+ * This is a generic implementation for &vm_operations_struct.access for an
+ * iomem mapping. This callback is used by access_process_vm() when the @vma is
+ * not page based.
+ */
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
 			void *buf, int len, int write)
 {
 	resource_size_t phys_addr;
 	unsigned long prot = 0;
 	void __iomem *maddr;
+	pte_t *ptep, pte;
+	spinlock_t *ptl;
 	int offset = addr & (PAGE_SIZE-1);
+	int ret = -EINVAL;
+
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		return -EINVAL;
+
+retry:
+	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+		return -EINVAL;
+	pte = *ptep;
+	pte_unmap_unlock(ptep, ptl);
 
-	if (follow_phys(vma, addr, write, &prot, &phys_addr))
+	prot = pgprot_val(pte_pgprot(pte));
+	phys_addr = (resource_size_t)pte_pfn(pte) << PAGE_SHIFT;
+
+	if ((write & FOLL_WRITE) && !pte_write(pte))
 		return -EINVAL;
 
 	maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);
 	if (!maddr)
 		return -ENOMEM;
 
+	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+		goto out_unmap;
+
+	if (!pte_same(pte, *ptep)) {
+		pte_unmap_unlock(ptep, ptl);
+		iounmap(maddr);
+
+		goto retry;
+	}
+
 	if (write)
 		memcpy_toio(maddr + offset, buf, len);
 	else
 		memcpy_fromio(buf, maddr + offset, len);
+	ret = len;
+	pte_unmap_unlock(ptep, ptl);
+out_unmap:
 	iounmap(maddr);
 
-	return len;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(generic_access_phys);
 #endif
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 07/17] mm: Close race in generic_access_phys
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, Rik van Riel, linux-samsung-soc, Jan Kara, Kees Cook,
	kvm, Jason Gunthorpe, Dave Airlie, Daniel Vetter, linux-mm,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Andrew Morton, linux-arm-kernel, linux-media

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
  ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
  cma regions. This means if we miss the unmap the pfn might contain
  pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
  iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
  ("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.

Since ioremap might need to manipulate pagetables too we need to drop
the pt lock and have a retry loop if we raced.

While at it, also add kerneldoc and improve the comment for the
vma_ops->access function. It's for accessing, not for moving the
memory from iomem to system memory, as the old comment seemed to
suggest.

References: 28b2ee20c7cb ("access_process_vm device memory infrastructure")
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Benjamin Herrensmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
--
v2: Fix inversion in the retry check (John).
---
 include/linux/mm.h |  3 ++-
 mm/memory.c        | 44 ++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index acd60fbf1a5a..2a16631c1fda 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -566,7 +566,8 @@ struct vm_operations_struct {
 	vm_fault_t (*pfn_mkwrite)(struct vm_fault *vmf);
 
 	/* called by access_process_vm when get_user_pages() fails, typically
-	 * for use by special VMAs that can switch between memory and hardware
+	 * for use by special VMAs. See also generic_access_phys() for a generic
+	 * implementation useful for any iomem mapping.
 	 */
 	int (*access)(struct vm_area_struct *vma, unsigned long addr,
 		      void *buf, int len, int write);
diff --git a/mm/memory.c b/mm/memory.c
index fcfc4ca36eba..f7cbc4dde0ef 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4873,28 +4873,68 @@ int follow_phys(struct vm_area_struct *vma,
 	return ret;
 }
 
+/**
+ * generic_access_phys - generic implementation for iomem mmap access
+ * @vma: the vma to access
+ * @addr: userspace addres, not relative offset within @vma
+ * @buf: buffer to read/write
+ * @len: length of transfer
+ * @write: set to FOLL_WRITE when writing, otherwise reading
+ *
+ * This is a generic implementation for &vm_operations_struct.access for an
+ * iomem mapping. This callback is used by access_process_vm() when the @vma is
+ * not page based.
+ */
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
 			void *buf, int len, int write)
 {
 	resource_size_t phys_addr;
 	unsigned long prot = 0;
 	void __iomem *maddr;
+	pte_t *ptep, pte;
+	spinlock_t *ptl;
 	int offset = addr & (PAGE_SIZE-1);
+	int ret = -EINVAL;
+
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		return -EINVAL;
+
+retry:
+	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+		return -EINVAL;
+	pte = *ptep;
+	pte_unmap_unlock(ptep, ptl);
 
-	if (follow_phys(vma, addr, write, &prot, &phys_addr))
+	prot = pgprot_val(pte_pgprot(pte));
+	phys_addr = (resource_size_t)pte_pfn(pte) << PAGE_SHIFT;
+
+	if ((write & FOLL_WRITE) && !pte_write(pte))
 		return -EINVAL;
 
 	maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);
 	if (!maddr)
 		return -ENOMEM;
 
+	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+		goto out_unmap;
+
+	if (!pte_same(pte, *ptep)) {
+		pte_unmap_unlock(ptep, ptl);
+		iounmap(maddr);
+
+		goto retry;
+	}
+
 	if (write)
 		memcpy_toio(maddr + offset, buf, len);
 	else
 		memcpy_fromio(buf, maddr + offset, len);
+	ret = len;
+	pte_unmap_unlock(ptep, ptl);
+out_unmap:
 	iounmap(maddr);
 
-	return len;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(generic_access_phys);
 #endif
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 08/17] s390/pci: Remove races against pte updates
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Gerald Schaefer, Daniel Vetter,
	Jason Gunthorpe, Dan Williams, Kees Cook, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Niklas Schnelle

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.

Since zpci_memcpy_from|toio seems to not do anything nefarious with
locks we just need to open code get_pfn and follow_pfn and make sure
we drop the locks only after we've done. The write function also needs
the copy_from_user move, since we can't take userspace faults while
holding the mmap sem.

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
--
v2: Move VM_IO | VM_PFNMAP checks around so they keep returning EINVAL
like before (Gerard)
---
 arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
 1 file changed, 57 insertions(+), 41 deletions(-)

diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
index 401cf670a243..1a6adbc68ee8 100644
--- a/arch/s390/pci/pci_mmio.c
+++ b/arch/s390/pci/pci_mmio.c
@@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
 	return rc;
 }
 
-static long get_pfn(unsigned long user_addr, unsigned long access,
-		    unsigned long *pfn)
-{
-	struct vm_area_struct *vma;
-	long ret;
-
-	mmap_read_lock(current->mm);
-	ret = -EINVAL;
-	vma = find_vma(current->mm, user_addr);
-	if (!vma)
-		goto out;
-	ret = -EACCES;
-	if (!(vma->vm_flags & access))
-		goto out;
-	ret = follow_pfn(vma, user_addr, pfn);
-out:
-	mmap_read_unlock(current->mm);
-	return ret;
-}
-
 SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 		const void __user *, user_buffer, size_t, length)
 {
 	u8 local_buf[64];
 	void __iomem *io_addr;
 	void *buf;
-	unsigned long pfn;
+	struct vm_area_struct *vma;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	long ret;
 
 	if (!zpci_is_enabled())
@@ -158,7 +140,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 	 * We only support write access to MIO capable devices if we are on
 	 * a MIO enabled system. Otherwise we would have to check for every
 	 * address if it is a special ZPCI_ADDR and would have to do
-	 * a get_pfn() which we don't need for MIO capable devices.  Currently
+	 * a pfn lookup which we don't need for MIO capable devices.  Currently
 	 * ISM devices are the only devices without MIO support and there is no
 	 * known need for accessing these from userspace.
 	 */
@@ -176,21 +158,37 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 	} else
 		buf = local_buf;
 
-	ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
+	ret = -EFAULT;
+	if (copy_from_user(buf, user_buffer, length))
+		goto out_free;
+
+	mmap_read_lock(current->mm);
+	ret = -EINVAL;
+	vma = find_vma(current->mm, mmio_addr);
+	if (!vma)
+		goto out_unlock_mmap;
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		goto out_unlock_mmap;
+	ret = -EACCES;
+	if (!(vma->vm_flags & VM_WRITE))
+		goto out_unlock_mmap;
+
+	ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
 	if (ret)
-		goto out;
-	io_addr = (void __iomem *)((pfn << PAGE_SHIFT) |
+		goto out_unlock_mmap;
+
+	io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
 			(mmio_addr & ~PAGE_MASK));
 
-	ret = -EFAULT;
 	if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE)
-		goto out;
-
-	if (copy_from_user(buf, user_buffer, length))
-		goto out;
+		goto out_unlock_pt;
 
 	ret = zpci_memcpy_toio(io_addr, buf, length);
-out:
+out_unlock_pt:
+	pte_unmap_unlock(ptep, ptl);
+out_unlock_mmap:
+	mmap_read_unlock(current->mm);
+out_free:
 	if (buf != local_buf)
 		kfree(buf);
 	return ret;
@@ -274,7 +272,9 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 	u8 local_buf[64];
 	void __iomem *io_addr;
 	void *buf;
-	unsigned long pfn;
+	struct vm_area_struct *vma;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	long ret;
 
 	if (!zpci_is_enabled())
@@ -287,7 +287,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 	 * We only support read access to MIO capable devices if we are on
 	 * a MIO enabled system. Otherwise we would have to check for every
 	 * address if it is a special ZPCI_ADDR and would have to do
-	 * a get_pfn() which we don't need for MIO capable devices.  Currently
+	 * a pfn lookup which we don't need for MIO capable devices.  Currently
 	 * ISM devices are the only devices without MIO support and there is no
 	 * known need for accessing these from userspace.
 	 */
@@ -306,22 +306,38 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 		buf = local_buf;
 	}
 
-	ret = get_pfn(mmio_addr, VM_READ, &pfn);
+	mmap_read_lock(current->mm);
+	ret = -EINVAL;
+	vma = find_vma(current->mm, mmio_addr);
+	if (!vma)
+		goto out_unlock_mmap;
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		goto out_unlock_mmap;
+	ret = -EACCES;
+	if (!(vma->vm_flags & VM_WRITE))
+		goto out_unlock_mmap;
+
+	ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
 	if (ret)
-		goto out;
-	io_addr = (void __iomem *)((pfn << PAGE_SHIFT) | (mmio_addr & ~PAGE_MASK));
+		goto out_unlock_mmap;
+
+	io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
+			(mmio_addr & ~PAGE_MASK));
 
 	if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE) {
 		ret = -EFAULT;
-		goto out;
+		goto out_unlock_pt;
 	}
 	ret = zpci_memcpy_fromio(buf, io_addr, length);
-	if (ret)
-		goto out;
-	if (copy_to_user(user_buffer, buf, length))
+
+out_unlock_pt:
+	pte_unmap_unlock(ptep, ptl);
+out_unlock_mmap:
+	mmap_read_unlock(current->mm);
+
+	if (!ret && copy_to_user(user_buffer, buf, length))
 		ret = -EFAULT;
 
-out:
 	if (buf != local_buf)
 		kfree(buf);
 	return ret;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 08/17] s390/pci: Remove races against pte updates
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, Niklas Schnelle, Andrew Morton,
	linux-mm, Jérôme Glisse, John Hubbard, Daniel Vetter,
	Gerald Schaefer, Dan Williams, linux-arm-kernel, linux-media

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.

Since zpci_memcpy_from|toio seems to not do anything nefarious with
locks we just need to open code get_pfn and follow_pfn and make sure
we drop the locks only after we've done. The write function also needs
the copy_from_user move, since we can't take userspace faults while
holding the mmap sem.

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
--
v2: Move VM_IO | VM_PFNMAP checks around so they keep returning EINVAL
like before (Gerard)
---
 arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
 1 file changed, 57 insertions(+), 41 deletions(-)

diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
index 401cf670a243..1a6adbc68ee8 100644
--- a/arch/s390/pci/pci_mmio.c
+++ b/arch/s390/pci/pci_mmio.c
@@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
 	return rc;
 }
 
-static long get_pfn(unsigned long user_addr, unsigned long access,
-		    unsigned long *pfn)
-{
-	struct vm_area_struct *vma;
-	long ret;
-
-	mmap_read_lock(current->mm);
-	ret = -EINVAL;
-	vma = find_vma(current->mm, user_addr);
-	if (!vma)
-		goto out;
-	ret = -EACCES;
-	if (!(vma->vm_flags & access))
-		goto out;
-	ret = follow_pfn(vma, user_addr, pfn);
-out:
-	mmap_read_unlock(current->mm);
-	return ret;
-}
-
 SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 		const void __user *, user_buffer, size_t, length)
 {
 	u8 local_buf[64];
 	void __iomem *io_addr;
 	void *buf;
-	unsigned long pfn;
+	struct vm_area_struct *vma;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	long ret;
 
 	if (!zpci_is_enabled())
@@ -158,7 +140,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 	 * We only support write access to MIO capable devices if we are on
 	 * a MIO enabled system. Otherwise we would have to check for every
 	 * address if it is a special ZPCI_ADDR and would have to do
-	 * a get_pfn() which we don't need for MIO capable devices.  Currently
+	 * a pfn lookup which we don't need for MIO capable devices.  Currently
 	 * ISM devices are the only devices without MIO support and there is no
 	 * known need for accessing these from userspace.
 	 */
@@ -176,21 +158,37 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 	} else
 		buf = local_buf;
 
-	ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
+	ret = -EFAULT;
+	if (copy_from_user(buf, user_buffer, length))
+		goto out_free;
+
+	mmap_read_lock(current->mm);
+	ret = -EINVAL;
+	vma = find_vma(current->mm, mmio_addr);
+	if (!vma)
+		goto out_unlock_mmap;
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		goto out_unlock_mmap;
+	ret = -EACCES;
+	if (!(vma->vm_flags & VM_WRITE))
+		goto out_unlock_mmap;
+
+	ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
 	if (ret)
-		goto out;
-	io_addr = (void __iomem *)((pfn << PAGE_SHIFT) |
+		goto out_unlock_mmap;
+
+	io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
 			(mmio_addr & ~PAGE_MASK));
 
-	ret = -EFAULT;
 	if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE)
-		goto out;
-
-	if (copy_from_user(buf, user_buffer, length))
-		goto out;
+		goto out_unlock_pt;
 
 	ret = zpci_memcpy_toio(io_addr, buf, length);
-out:
+out_unlock_pt:
+	pte_unmap_unlock(ptep, ptl);
+out_unlock_mmap:
+	mmap_read_unlock(current->mm);
+out_free:
 	if (buf != local_buf)
 		kfree(buf);
 	return ret;
@@ -274,7 +272,9 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 	u8 local_buf[64];
 	void __iomem *io_addr;
 	void *buf;
-	unsigned long pfn;
+	struct vm_area_struct *vma;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	long ret;
 
 	if (!zpci_is_enabled())
@@ -287,7 +287,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 	 * We only support read access to MIO capable devices if we are on
 	 * a MIO enabled system. Otherwise we would have to check for every
 	 * address if it is a special ZPCI_ADDR and would have to do
-	 * a get_pfn() which we don't need for MIO capable devices.  Currently
+	 * a pfn lookup which we don't need for MIO capable devices.  Currently
 	 * ISM devices are the only devices without MIO support and there is no
 	 * known need for accessing these from userspace.
 	 */
@@ -306,22 +306,38 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 		buf = local_buf;
 	}
 
-	ret = get_pfn(mmio_addr, VM_READ, &pfn);
+	mmap_read_lock(current->mm);
+	ret = -EINVAL;
+	vma = find_vma(current->mm, mmio_addr);
+	if (!vma)
+		goto out_unlock_mmap;
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		goto out_unlock_mmap;
+	ret = -EACCES;
+	if (!(vma->vm_flags & VM_WRITE))
+		goto out_unlock_mmap;
+
+	ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
 	if (ret)
-		goto out;
-	io_addr = (void __iomem *)((pfn << PAGE_SHIFT) | (mmio_addr & ~PAGE_MASK));
+		goto out_unlock_mmap;
+
+	io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
+			(mmio_addr & ~PAGE_MASK));
 
 	if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE) {
 		ret = -EFAULT;
-		goto out;
+		goto out_unlock_pt;
 	}
 	ret = zpci_memcpy_fromio(buf, io_addr, length);
-	if (ret)
-		goto out;
-	if (copy_to_user(user_buffer, buf, length))
+
+out_unlock_pt:
+	pte_unmap_unlock(ptep, ptl);
+out_unlock_mmap:
+	mmap_read_unlock(current->mm);
+
+	if (!ret && copy_to_user(user_buffer, buf, length))
 		ret = -EFAULT;
 
-out:
 	if (buf != local_buf)
 		kfree(buf);
 	return ret;
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 08/17] s390/pci: Remove races against pte updates
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, Niklas Schnelle, Andrew Morton,
	linux-mm, Jérôme Glisse, John Hubbard, Daniel Vetter,
	Gerald Schaefer, Dan Williams, linux-arm-kernel, linux-media

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.

Since zpci_memcpy_from|toio seems to not do anything nefarious with
locks we just need to open code get_pfn and follow_pfn and make sure
we drop the locks only after we've done. The write function also needs
the copy_from_user move, since we can't take userspace faults while
holding the mmap sem.

Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
--
v2: Move VM_IO | VM_PFNMAP checks around so they keep returning EINVAL
like before (Gerard)
---
 arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
 1 file changed, 57 insertions(+), 41 deletions(-)

diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
index 401cf670a243..1a6adbc68ee8 100644
--- a/arch/s390/pci/pci_mmio.c
+++ b/arch/s390/pci/pci_mmio.c
@@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
 	return rc;
 }
 
-static long get_pfn(unsigned long user_addr, unsigned long access,
-		    unsigned long *pfn)
-{
-	struct vm_area_struct *vma;
-	long ret;
-
-	mmap_read_lock(current->mm);
-	ret = -EINVAL;
-	vma = find_vma(current->mm, user_addr);
-	if (!vma)
-		goto out;
-	ret = -EACCES;
-	if (!(vma->vm_flags & access))
-		goto out;
-	ret = follow_pfn(vma, user_addr, pfn);
-out:
-	mmap_read_unlock(current->mm);
-	return ret;
-}
-
 SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 		const void __user *, user_buffer, size_t, length)
 {
 	u8 local_buf[64];
 	void __iomem *io_addr;
 	void *buf;
-	unsigned long pfn;
+	struct vm_area_struct *vma;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	long ret;
 
 	if (!zpci_is_enabled())
@@ -158,7 +140,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 	 * We only support write access to MIO capable devices if we are on
 	 * a MIO enabled system. Otherwise we would have to check for every
 	 * address if it is a special ZPCI_ADDR and would have to do
-	 * a get_pfn() which we don't need for MIO capable devices.  Currently
+	 * a pfn lookup which we don't need for MIO capable devices.  Currently
 	 * ISM devices are the only devices without MIO support and there is no
 	 * known need for accessing these from userspace.
 	 */
@@ -176,21 +158,37 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
 	} else
 		buf = local_buf;
 
-	ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
+	ret = -EFAULT;
+	if (copy_from_user(buf, user_buffer, length))
+		goto out_free;
+
+	mmap_read_lock(current->mm);
+	ret = -EINVAL;
+	vma = find_vma(current->mm, mmio_addr);
+	if (!vma)
+		goto out_unlock_mmap;
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		goto out_unlock_mmap;
+	ret = -EACCES;
+	if (!(vma->vm_flags & VM_WRITE))
+		goto out_unlock_mmap;
+
+	ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
 	if (ret)
-		goto out;
-	io_addr = (void __iomem *)((pfn << PAGE_SHIFT) |
+		goto out_unlock_mmap;
+
+	io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
 			(mmio_addr & ~PAGE_MASK));
 
-	ret = -EFAULT;
 	if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE)
-		goto out;
-
-	if (copy_from_user(buf, user_buffer, length))
-		goto out;
+		goto out_unlock_pt;
 
 	ret = zpci_memcpy_toio(io_addr, buf, length);
-out:
+out_unlock_pt:
+	pte_unmap_unlock(ptep, ptl);
+out_unlock_mmap:
+	mmap_read_unlock(current->mm);
+out_free:
 	if (buf != local_buf)
 		kfree(buf);
 	return ret;
@@ -274,7 +272,9 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 	u8 local_buf[64];
 	void __iomem *io_addr;
 	void *buf;
-	unsigned long pfn;
+	struct vm_area_struct *vma;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	long ret;
 
 	if (!zpci_is_enabled())
@@ -287,7 +287,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 	 * We only support read access to MIO capable devices if we are on
 	 * a MIO enabled system. Otherwise we would have to check for every
 	 * address if it is a special ZPCI_ADDR and would have to do
-	 * a get_pfn() which we don't need for MIO capable devices.  Currently
+	 * a pfn lookup which we don't need for MIO capable devices.  Currently
 	 * ISM devices are the only devices without MIO support and there is no
 	 * known need for accessing these from userspace.
 	 */
@@ -306,22 +306,38 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
 		buf = local_buf;
 	}
 
-	ret = get_pfn(mmio_addr, VM_READ, &pfn);
+	mmap_read_lock(current->mm);
+	ret = -EINVAL;
+	vma = find_vma(current->mm, mmio_addr);
+	if (!vma)
+		goto out_unlock_mmap;
+	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		goto out_unlock_mmap;
+	ret = -EACCES;
+	if (!(vma->vm_flags & VM_WRITE))
+		goto out_unlock_mmap;
+
+	ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
 	if (ret)
-		goto out;
-	io_addr = (void __iomem *)((pfn << PAGE_SHIFT) | (mmio_addr & ~PAGE_MASK));
+		goto out_unlock_mmap;
+
+	io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
+			(mmio_addr & ~PAGE_MASK));
 
 	if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE) {
 		ret = -EFAULT;
-		goto out;
+		goto out_unlock_pt;
 	}
 	ret = zpci_memcpy_fromio(buf, io_addr, length);
-	if (ret)
-		goto out;
-	if (copy_to_user(user_buffer, buf, length))
+
+out_unlock_pt:
+	pte_unmap_unlock(ptep, ptl);
+out_unlock_mmap:
+	mmap_read_unlock(current->mm);
+
+	if (!ret && copy_to_user(user_buffer, buf, length))
 		ret = -EFAULT;
 
-out:
 	if (buf != local_buf)
 		kfree(buf);
 	return ret;
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea.

Unfortunately there's some users where this is not fixable (like v4l
userptr of iomem mappings) or involves a pile of work (vfio type1
iommu). For now annotate these as unsafe and splat appropriately.

This patch adds an unsafe_follow_pfn, which later patches will then
roll out to all appropriate places.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: kvm@vger.kernel.org
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
 mm/nommu.c         | 17 +++++++++++++++++
 security/Kconfig   | 13 +++++++++++++
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2a16631c1fda..ec8c90928fc9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
 		   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
 int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	unsigned long *pfn);
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+		      unsigned long *pfn);
 int follow_phys(struct vm_area_struct *vma, unsigned long address,
 		unsigned int flags, unsigned long *prot, resource_size_t *phys);
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index f7cbc4dde0ef..7c7b234ffb24 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4821,7 +4821,12 @@ EXPORT_SYMBOL(follow_pte_pmd);
  * @address: user virtual address
  * @pfn: location to store found PFN
  *
- * Only IO mappings and raw PFN mappings are allowed.
+ * Only IO mappings and raw PFN mappings are allowed. Note that callers must
+ * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
+ * If this is not feasible, or the access to the @pfn is only very short term,
+ * use follow_pte_pmd() instead and hold the pagetable lock for the duration of
+ * the access instead. Any caller not following these requirements must use
+ * unsafe_follow_pfn() instead.
  *
  * Return: zero and the pfn at @pfn on success, -ve otherwise.
  */
@@ -4844,6 +4849,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+#ifdef CONFIG_STRICT_FOLLOW_PFN
+	pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
+	return -EINVAL;
+#else
+	WARN_ONCE(1, "unsafe follow_pfn usage\n");
+	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+
+	return follow_pfn(vma, address, pfn);
+#endif
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 #ifdef CONFIG_HAVE_IOREMAP_PROT
 int follow_phys(struct vm_area_struct *vma,
 		unsigned long address, unsigned int flags,
diff --git a/mm/nommu.c b/mm/nommu.c
index 75a327149af1..3db2910f0d64 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+	return follow_pfn(vma, address, pfn);
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 LIST_HEAD(vmap_area_list);
 
 void vfree(const void *addr)
diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f99f1d..48945402e103 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
 	  If you wish for all usermode helper programs to be disabled,
 	  specify an empty string here (i.e. "").
 
+config STRICT_FOLLOW_PFN
+	bool "Disable unsafe use of follow_pfn"
+	depends on MMU
+	help
+	  Some functionality in the kernel follows userspace mappings to iomem
+	  ranges in an unsafe matter. Examples include v4l userptr for zero-copy
+	  buffers sharing.
+
+	  If this option is switched on, such access is rejected. Only enable
+	  this option when you must run userspace which requires this.
+
+	  If in doubt, say Y.
+
 source "security/selinux/Kconfig"
 source "security/smack/Kconfig"
 source "security/tomoyo/Kconfig"
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, linux-mm,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Andrew Morton, linux-arm-kernel, linux-media

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea.

Unfortunately there's some users where this is not fixable (like v4l
userptr of iomem mappings) or involves a pile of work (vfio type1
iommu). For now annotate these as unsafe and splat appropriately.

This patch adds an unsafe_follow_pfn, which later patches will then
roll out to all appropriate places.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: kvm@vger.kernel.org
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
 mm/nommu.c         | 17 +++++++++++++++++
 security/Kconfig   | 13 +++++++++++++
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2a16631c1fda..ec8c90928fc9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
 		   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
 int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	unsigned long *pfn);
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+		      unsigned long *pfn);
 int follow_phys(struct vm_area_struct *vma, unsigned long address,
 		unsigned int flags, unsigned long *prot, resource_size_t *phys);
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index f7cbc4dde0ef..7c7b234ffb24 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4821,7 +4821,12 @@ EXPORT_SYMBOL(follow_pte_pmd);
  * @address: user virtual address
  * @pfn: location to store found PFN
  *
- * Only IO mappings and raw PFN mappings are allowed.
+ * Only IO mappings and raw PFN mappings are allowed. Note that callers must
+ * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
+ * If this is not feasible, or the access to the @pfn is only very short term,
+ * use follow_pte_pmd() instead and hold the pagetable lock for the duration of
+ * the access instead. Any caller not following these requirements must use
+ * unsafe_follow_pfn() instead.
  *
  * Return: zero and the pfn at @pfn on success, -ve otherwise.
  */
@@ -4844,6 +4849,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+#ifdef CONFIG_STRICT_FOLLOW_PFN
+	pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
+	return -EINVAL;
+#else
+	WARN_ONCE(1, "unsafe follow_pfn usage\n");
+	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+
+	return follow_pfn(vma, address, pfn);
+#endif
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 #ifdef CONFIG_HAVE_IOREMAP_PROT
 int follow_phys(struct vm_area_struct *vma,
 		unsigned long address, unsigned int flags,
diff --git a/mm/nommu.c b/mm/nommu.c
index 75a327149af1..3db2910f0d64 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+	return follow_pfn(vma, address, pfn);
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 LIST_HEAD(vmap_area_list);
 
 void vfree(const void *addr)
diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f99f1d..48945402e103 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
 	  If you wish for all usermode helper programs to be disabled,
 	  specify an empty string here (i.e. "").
 
+config STRICT_FOLLOW_PFN
+	bool "Disable unsafe use of follow_pfn"
+	depends on MMU
+	help
+	  Some functionality in the kernel follows userspace mappings to iomem
+	  ranges in an unsafe matter. Examples include v4l userptr for zero-copy
+	  buffers sharing.
+
+	  If this option is switched on, such access is rejected. Only enable
+	  this option when you must run userspace which requires this.
+
+	  If in doubt, say Y.
+
 source "security/selinux/Kconfig"
 source "security/smack/Kconfig"
 source "security/tomoyo/Kconfig"
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, linux-mm,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Andrew Morton, linux-arm-kernel, linux-media

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea.

Unfortunately there's some users where this is not fixable (like v4l
userptr of iomem mappings) or involves a pile of work (vfio type1
iommu). For now annotate these as unsafe and splat appropriately.

This patch adds an unsafe_follow_pfn, which later patches will then
roll out to all appropriate places.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: kvm@vger.kernel.org
---
 include/linux/mm.h |  2 ++
 mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
 mm/nommu.c         | 17 +++++++++++++++++
 security/Kconfig   | 13 +++++++++++++
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2a16631c1fda..ec8c90928fc9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
 		   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
 int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 	unsigned long *pfn);
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+		      unsigned long *pfn);
 int follow_phys(struct vm_area_struct *vma, unsigned long address,
 		unsigned int flags, unsigned long *prot, resource_size_t *phys);
 int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index f7cbc4dde0ef..7c7b234ffb24 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4821,7 +4821,12 @@ EXPORT_SYMBOL(follow_pte_pmd);
  * @address: user virtual address
  * @pfn: location to store found PFN
  *
- * Only IO mappings and raw PFN mappings are allowed.
+ * Only IO mappings and raw PFN mappings are allowed. Note that callers must
+ * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
+ * If this is not feasible, or the access to the @pfn is only very short term,
+ * use follow_pte_pmd() instead and hold the pagetable lock for the duration of
+ * the access instead. Any caller not following these requirements must use
+ * unsafe_follow_pfn() instead.
  *
  * Return: zero and the pfn at @pfn on success, -ve otherwise.
  */
@@ -4844,6 +4849,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+#ifdef CONFIG_STRICT_FOLLOW_PFN
+	pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
+	return -EINVAL;
+#else
+	WARN_ONCE(1, "unsafe follow_pfn usage\n");
+	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+
+	return follow_pfn(vma, address, pfn);
+#endif
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 #ifdef CONFIG_HAVE_IOREMAP_PROT
 int follow_phys(struct vm_area_struct *vma,
 		unsigned long address, unsigned int flags,
diff --git a/mm/nommu.c b/mm/nommu.c
index 75a327149af1..3db2910f0d64 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL(follow_pfn);
 
+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+	unsigned long *pfn)
+{
+	return follow_pfn(vma, address, pfn);
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
 LIST_HEAD(vmap_area_list);
 
 void vfree(const void *addr)
diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f99f1d..48945402e103 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
 	  If you wish for all usermode helper programs to be disabled,
 	  specify an empty string here (i.e. "").
 
+config STRICT_FOLLOW_PFN
+	bool "Disable unsafe use of follow_pfn"
+	depends on MMU
+	help
+	  Some functionality in the kernel follows userspace mappings to iomem
+	  ranges in an unsafe matter. Examples include v4l userptr for zero-copy
+	  buffers sharing.
+
+	  If this option is switched on, such access is rejected. Only enable
+	  this option when you must run userspace which requires this.
+
+	  If in doubt, say Y.
+
 source "security/selinux/Kconfig"
 source "security/smack/Kconfig"
 source "security/tomoyo/Kconfig"
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 10/17] media/videbuf1|2: Mark follow_pfn usage as unsafe
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Tomasz Figa, Laurent Dufour,
	Vlastimil Babka, Daniel Jordan, Michel Lespinasse

The media model assumes that buffers are all preallocated, so that
when a media pipeline is running we never miss a deadline because the
buffers aren't allocated or available.

This means we cannot fix the v4l follow_pfn usage through
mmu_notifier, without breaking how this all works. The only real fix
is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
tell everyone to cut over to dma-buf memory sharing for zerocopy.

userptr for normal memory will keep working as-is.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michel Lespinasse <walken@google.com>
---
 drivers/media/common/videobuf2/frame_vector.c | 2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
index 2b0b97761d15..a1b85fe9e7c1 100644
--- a/drivers/media/common/videobuf2/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -69,7 +69,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 			break;
 
 		while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
-			err = follow_pfn(vma, start, &nums[ret]);
+			err = unsafe_follow_pfn(vma, start, &nums[ret]);
 			if (err) {
 				if (ret == 0)
 					ret = err;
diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
index 52312ce2ba05..821c4a76ab96 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
 	user_address = untagged_baddr;
 
 	while (pages_done < (mem->size >> PAGE_SHIFT)) {
-		ret = follow_pfn(vma, user_address, &this_pfn);
+		ret = unsafe_follow_pfn(vma, user_address, &this_pfn);
 		if (ret)
 			break;
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 10/17] media/videbuf1|2: Mark follow_pfn usage as unsafe
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: Jan Kara, kvm, Daniel Vetter, linux-mm, Daniel Vetter,
	Michel Lespinasse, Marek Szyprowski, linux-s390,
	linux-samsung-soc, Daniel Jordan, Jason Gunthorpe,
	linux-arm-kernel, linux-media, Kees Cook, Pawel Osciak,
	John Hubbard, Jérôme Glisse, Dan Williams,
	Laurent Dufour, Vlastimil Babka, Tomasz Figa, Kyungmin Park,
	Andrew Morton

The media model assumes that buffers are all preallocated, so that
when a media pipeline is running we never miss a deadline because the
buffers aren't allocated or available.

This means we cannot fix the v4l follow_pfn usage through
mmu_notifier, without breaking how this all works. The only real fix
is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
tell everyone to cut over to dma-buf memory sharing for zerocopy.

userptr for normal memory will keep working as-is.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michel Lespinasse <walken@google.com>
---
 drivers/media/common/videobuf2/frame_vector.c | 2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
index 2b0b97761d15..a1b85fe9e7c1 100644
--- a/drivers/media/common/videobuf2/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -69,7 +69,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 			break;
 
 		while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
-			err = follow_pfn(vma, start, &nums[ret]);
+			err = unsafe_follow_pfn(vma, start, &nums[ret]);
 			if (err) {
 				if (ret == 0)
 					ret = err;
diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
index 52312ce2ba05..821c4a76ab96 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
 	user_address = untagged_baddr;
 
 	while (pages_done < (mem->size >> PAGE_SHIFT)) {
-		ret = follow_pfn(vma, user_address, &this_pfn);
+		ret = unsafe_follow_pfn(vma, user_address, &this_pfn);
 		if (ret)
 			break;
 
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 10/17] media/videbuf1|2: Mark follow_pfn usage as unsafe
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: Jan Kara, kvm, Daniel Vetter, linux-mm, Daniel Vetter,
	Michel Lespinasse, Marek Szyprowski, linux-s390,
	linux-samsung-soc, Daniel Jordan, Jason Gunthorpe,
	linux-arm-kernel, linux-media, Kees Cook, Pawel Osciak,
	John Hubbard, Jérôme Glisse, Dan Williams,
	Laurent Dufour, Vlastimil Babka, Tomasz Figa, Kyungmin Park,
	Andrew Morton

The media model assumes that buffers are all preallocated, so that
when a media pipeline is running we never miss a deadline because the
buffers aren't allocated or available.

This means we cannot fix the v4l follow_pfn usage through
mmu_notifier, without breaking how this all works. The only real fix
is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
tell everyone to cut over to dma-buf memory sharing for zerocopy.

userptr for normal memory will keep working as-is.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Pawel Osciak <pawel@osciak.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michel Lespinasse <walken@google.com>
---
 drivers/media/common/videobuf2/frame_vector.c | 2 +-
 drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
index 2b0b97761d15..a1b85fe9e7c1 100644
--- a/drivers/media/common/videobuf2/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -69,7 +69,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
 			break;
 
 		while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
-			err = follow_pfn(vma, start, &nums[ret]);
+			err = unsafe_follow_pfn(vma, start, &nums[ret]);
 			if (err) {
 				if (ret == 0)
 					ret = err;
diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
index 52312ce2ba05..821c4a76ab96 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
 	user_address = untagged_baddr;
 
 	while (pages_done < (mem->size >> PAGE_SHIFT)) {
-		ret = follow_pfn(vma, user_address, &this_pfn);
+		ret = unsafe_follow_pfn(vma, user_address, &this_pfn);
 		if (ret)
 			break;
 
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 11/17] vfio/type1: Mark follow_pfn as unsafe
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Alex Williamson,
	Cornelia Huck

The code seems to stuff these pfns into iommu pts (or something like
that, I didn't follow), but there's no mmu_notifier to ensure that
access is synchronized with pte updates.

Hence mark these as unsafe. This means that with
CONFIG_STRICT_FOLLOW_PFN, these will be rejected.

Real fix is to wire up an mmu_notifier ... somehow. Probably means any
invalidate is a fatal fault for this vfio device, but then this
shouldn't ever happen if userspace is reasonable.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
---
 drivers/vfio/vfio_iommu_type1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5fbf0c1f7433..a4d53f3d0a35 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -421,7 +421,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 {
 	int ret;
 
-	ret = follow_pfn(vma, vaddr, pfn);
+	ret = unsafe_follow_pfn(vma, vaddr, pfn);
 	if (ret) {
 		bool unlocked = false;
 
@@ -435,7 +435,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 		if (ret)
 			return ret;
 
-		ret = follow_pfn(vma, vaddr, pfn);
+		ret = unsafe_follow_pfn(vma, vaddr, pfn);
 	}
 
 	return ret;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 11/17] vfio/type1: Mark follow_pfn as unsafe
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, Cornelia Huck, Alex Williamson,
	linux-mm, Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Andrew Morton, linux-arm-kernel, linux-media

The code seems to stuff these pfns into iommu pts (or something like
that, I didn't follow), but there's no mmu_notifier to ensure that
access is synchronized with pte updates.

Hence mark these as unsafe. This means that with
CONFIG_STRICT_FOLLOW_PFN, these will be rejected.

Real fix is to wire up an mmu_notifier ... somehow. Probably means any
invalidate is a fatal fault for this vfio device, but then this
shouldn't ever happen if userspace is reasonable.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
---
 drivers/vfio/vfio_iommu_type1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5fbf0c1f7433..a4d53f3d0a35 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -421,7 +421,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 {
 	int ret;
 
-	ret = follow_pfn(vma, vaddr, pfn);
+	ret = unsafe_follow_pfn(vma, vaddr, pfn);
 	if (ret) {
 		bool unlocked = false;
 
@@ -435,7 +435,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 		if (ret)
 			return ret;
 
-		ret = follow_pfn(vma, vaddr, pfn);
+		ret = unsafe_follow_pfn(vma, vaddr, pfn);
 	}
 
 	return ret;
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 11/17] vfio/type1: Mark follow_pfn as unsafe
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, Cornelia Huck, Alex Williamson,
	linux-mm, Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Andrew Morton, linux-arm-kernel, linux-media

The code seems to stuff these pfns into iommu pts (or something like
that, I didn't follow), but there's no mmu_notifier to ensure that
access is synchronized with pte updates.

Hence mark these as unsafe. This means that with
CONFIG_STRICT_FOLLOW_PFN, these will be rejected.

Real fix is to wire up an mmu_notifier ... somehow. Probably means any
invalidate is a fatal fault for this vfio device, but then this
shouldn't ever happen if userspace is reasonable.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
---
 drivers/vfio/vfio_iommu_type1.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5fbf0c1f7433..a4d53f3d0a35 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -421,7 +421,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 {
 	int ret;
 
-	ret = follow_pfn(vma, vaddr, pfn);
+	ret = unsafe_follow_pfn(vma, vaddr, pfn);
 	if (ret) {
 		bool unlocked = false;
 
@@ -435,7 +435,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 		if (ret)
 			return ret;
 
-		ret = follow_pfn(vma, vaddr, pfn);
+		ret = unsafe_follow_pfn(vma, vaddr, pfn);
 	}
 
 	return ret;
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 12/17] PCI: Obey iomem restrictions for procfs mmap
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Bjorn Helgaas, linux-pci

There's three ways to access PCI BARs from userspace: /dev/mem, sysfs
files, and the old proc interface. Two check against
iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
this starts to matter, since we don't want random userspace having
access to PCI BARs while a driver is loaded and using it.

Fix this by adding the same iomem_is_exclusive() check we already have
on the sysfs side in pci_mmap_resource().

References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
--
v2: Improve commit message (Bjorn)
---
 drivers/pci/proc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index d35186b01d98..3a2f90beb4cb 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -274,6 +274,11 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
 		else
 			return -EINVAL;
 	}
+
+	if (dev->resource[i].flags & IORESOURCE_MEM &&
+	    iomem_is_exclusive(dev->resource[i].start))
+		return -EINVAL;
+
 	ret = pci_mmap_page_range(dev, i, vma,
 				  fpriv->mmap_state, write_combine);
 	if (ret < 0)
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 12/17] PCI: Obey iomem restrictions for procfs mmap
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, linux-pci, linux-mm,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, linux-arm-kernel,
	linux-media

There's three ways to access PCI BARs from userspace: /dev/mem, sysfs
files, and the old proc interface. Two check against
iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
this starts to matter, since we don't want random userspace having
access to PCI BARs while a driver is loaded and using it.

Fix this by adding the same iomem_is_exclusive() check we already have
on the sysfs side in pci_mmap_resource().

References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
--
v2: Improve commit message (Bjorn)
---
 drivers/pci/proc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index d35186b01d98..3a2f90beb4cb 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -274,6 +274,11 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
 		else
 			return -EINVAL;
 	}
+
+	if (dev->resource[i].flags & IORESOURCE_MEM &&
+	    iomem_is_exclusive(dev->resource[i].start))
+		return -EINVAL;
+
 	ret = pci_mmap_page_range(dev, i, vma,
 				  fpriv->mmap_state, write_combine);
 	if (ret < 0)
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 12/17] PCI: Obey iomem restrictions for procfs mmap
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, linux-pci, linux-mm,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, linux-arm-kernel,
	linux-media

There's three ways to access PCI BARs from userspace: /dev/mem, sysfs
files, and the old proc interface. Two check against
iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
this starts to matter, since we don't want random userspace having
access to PCI BARs while a driver is loaded and using it.

Fix this by adding the same iomem_is_exclusive() check we already have
on the sysfs side in pci_mmap_resource().

References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
--
v2: Improve commit message (Bjorn)
---
 drivers/pci/proc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index d35186b01d98..3a2f90beb4cb 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -274,6 +274,11 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
 		else
 			return -EINVAL;
 	}
+
+	if (dev->resource[i].flags & IORESOURCE_MEM &&
+	    iomem_is_exclusive(dev->resource[i].start))
+		return -EINVAL;
+
 	ret = pci_mmap_page_range(dev, i, vma,
 				  fpriv->mmap_state, write_combine);
 	if (ret < 0)
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 13/17] /dev/mem: Only set filp->f_mapping
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara

When we care about pagecache maintenance, we need to make sure that
both f_mapping and i_mapping point at the right mapping.

But for iomem mappings we only care about the virtual/pte side of
things, so f_mapping is enough. Also setting inode->i_mapping was
confusing me as a driver maintainer, since in e.g. drivers/gpu we
don't do that. Per Dan this seems to be copypasta from places which do
care about pagecache consistency, but not needed. Hence remove it for
slightly less confusion.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
---
 drivers/char/mem.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index abd4ffdc8cde..5502f56f3655 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -864,7 +864,6 @@ static int open_port(struct inode *inode, struct file *filp)
 	 * revocations when drivers want to take over a /dev/mem mapped
 	 * range.
 	 */
-	inode->i_mapping = devmem_inode->i_mapping;
 	filp->f_mapping = inode->i_mapping;
 
 	return 0;
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 13/17] /dev/mem: Only set filp->f_mapping
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, linux-mm,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Andrew Morton, linux-arm-kernel, linux-media

When we care about pagecache maintenance, we need to make sure that
both f_mapping and i_mapping point at the right mapping.

But for iomem mappings we only care about the virtual/pte side of
things, so f_mapping is enough. Also setting inode->i_mapping was
confusing me as a driver maintainer, since in e.g. drivers/gpu we
don't do that. Per Dan this seems to be copypasta from places which do
care about pagecache consistency, but not needed. Hence remove it for
slightly less confusion.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
---
 drivers/char/mem.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index abd4ffdc8cde..5502f56f3655 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -864,7 +864,6 @@ static int open_port(struct inode *inode, struct file *filp)
 	 * revocations when drivers want to take over a /dev/mem mapped
 	 * range.
 	 */
-	inode->i_mapping = devmem_inode->i_mapping;
 	filp->f_mapping = inode->i_mapping;
 
 	return 0;
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 13/17] /dev/mem: Only set filp->f_mapping
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, linux-mm,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Andrew Morton, linux-arm-kernel, linux-media

When we care about pagecache maintenance, we need to make sure that
both f_mapping and i_mapping point at the right mapping.

But for iomem mappings we only care about the virtual/pte side of
things, so f_mapping is enough. Also setting inode->i_mapping was
confusing me as a driver maintainer, since in e.g. drivers/gpu we
don't do that. Per Dan this seems to be copypasta from places which do
care about pagecache consistency, but not needed. Hence remove it for
slightly less confusion.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
---
 drivers/char/mem.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index abd4ffdc8cde..5502f56f3655 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -864,7 +864,6 @@ static int open_port(struct inode *inode, struct file *filp)
 	 * revocations when drivers want to take over a /dev/mem mapped
 	 * range.
 	 */
-	inode->i_mapping = devmem_inode->i_mapping;
 	filp->f_mapping = inode->i_mapping;
 
 	return 0;
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Arnd Bergmann,
	Greg Kroah-Hartman, David Hildenbrand, Rafael J. Wysocki

We want all iomem mmaps to consistently revoke ptes when the kernel
takes over and CONFIG_IO_STRICT_DEVMEM is enabled. This includes the
pci bar mmaps available through procfs and sysfs, which currently do
not revoke mappings.

To prepare for this, move the code from the /dev/kmem driver to
kernel/resource.c.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
---
 drivers/char/mem.c     | 85 +------------------------------------
 include/linux/ioport.h |  6 +--
 kernel/resource.c      | 95 +++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 96 insertions(+), 90 deletions(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 5502f56f3655..53338aad8d28 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -31,9 +31,6 @@
 #include <linux/uio.h>
 #include <linux/uaccess.h>
 #include <linux/security.h>
-#include <linux/pseudo_fs.h>
-#include <uapi/linux/magic.h>
-#include <linux/mount.h>
 
 #ifdef CONFIG_IA64
 # include <linux/efi.h>
@@ -809,42 +806,6 @@ static loff_t memory_lseek(struct file *file, loff_t offset, int orig)
 	return ret;
 }
 
-static struct inode *devmem_inode;
-
-#ifdef CONFIG_IO_STRICT_DEVMEM
-void revoke_devmem(struct resource *res)
-{
-	/* pairs with smp_store_release() in devmem_init_inode() */
-	struct inode *inode = smp_load_acquire(&devmem_inode);
-
-	/*
-	 * Check that the initialization has completed. Losing the race
-	 * is ok because it means drivers are claiming resources before
-	 * the fs_initcall level of init and prevent /dev/mem from
-	 * establishing mappings.
-	 */
-	if (!inode)
-		return;
-
-	/*
-	 * The expectation is that the driver has successfully marked
-	 * the resource busy by this point, so devmem_is_allowed()
-	 * should start returning false, however for performance this
-	 * does not iterate the entire resource range.
-	 */
-	if (devmem_is_allowed(PHYS_PFN(res->start)) &&
-	    devmem_is_allowed(PHYS_PFN(res->end))) {
-		/*
-		 * *cringe* iomem=relaxed says "go ahead, what's the
-		 * worst that can happen?"
-		 */
-		return;
-	}
-
-	unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
-}
-#endif
-
 static int open_port(struct inode *inode, struct file *filp)
 {
 	int rc;
@@ -864,7 +825,7 @@ static int open_port(struct inode *inode, struct file *filp)
 	 * revocations when drivers want to take over a /dev/mem mapped
 	 * range.
 	 */
-	filp->f_mapping = inode->i_mapping;
+	filp->f_mapping = iomem_get_mapping();
 
 	return 0;
 }
@@ -995,48 +956,6 @@ static char *mem_devnode(struct device *dev, umode_t *mode)
 
 static struct class *mem_class;
 
-static int devmem_fs_init_fs_context(struct fs_context *fc)
-{
-	return init_pseudo(fc, DEVMEM_MAGIC) ? 0 : -ENOMEM;
-}
-
-static struct file_system_type devmem_fs_type = {
-	.name		= "devmem",
-	.owner		= THIS_MODULE,
-	.init_fs_context = devmem_fs_init_fs_context,
-	.kill_sb	= kill_anon_super,
-};
-
-static int devmem_init_inode(void)
-{
-	static struct vfsmount *devmem_vfs_mount;
-	static int devmem_fs_cnt;
-	struct inode *inode;
-	int rc;
-
-	rc = simple_pin_fs(&devmem_fs_type, &devmem_vfs_mount, &devmem_fs_cnt);
-	if (rc < 0) {
-		pr_err("Cannot mount /dev/mem pseudo filesystem: %d\n", rc);
-		return rc;
-	}
-
-	inode = alloc_anon_inode(devmem_vfs_mount->mnt_sb);
-	if (IS_ERR(inode)) {
-		rc = PTR_ERR(inode);
-		pr_err("Cannot allocate inode for /dev/mem: %d\n", rc);
-		simple_release_fs(&devmem_vfs_mount, &devmem_fs_cnt);
-		return rc;
-	}
-
-	/*
-	 * Publish /dev/mem initialized.
-	 * Pairs with smp_load_acquire() in revoke_devmem().
-	 */
-	smp_store_release(&devmem_inode, inode);
-
-	return 0;
-}
-
 static int __init chr_dev_init(void)
 {
 	int minor;
@@ -1058,8 +977,6 @@ static int __init chr_dev_init(void)
 		 */
 		if ((minor == DEVPORT_MINOR) && !arch_has_dev_port())
 			continue;
-		if ((minor == DEVMEM_MINOR) && devmem_init_inode() != 0)
-			continue;
 
 		device_create(mem_class, NULL, MKDEV(MEM_MAJOR, minor),
 			      NULL, devlist[minor].name);
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6c2b06fe8beb..8ffb61b36606 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -302,11 +302,7 @@ struct resource *devm_request_free_mem_region(struct device *dev,
 struct resource *request_free_mem_region(struct resource *base,
 		unsigned long size, const char *name);
 
-#ifdef CONFIG_IO_STRICT_DEVMEM
-void revoke_devmem(struct resource *res);
-#else
-static inline void revoke_devmem(struct resource *res) { };
-#endif
+extern struct address_space *iomem_get_mapping(void);
 
 #endif /* __ASSEMBLY__ */
 #endif	/* _LINUX_IOPORT_H */
diff --git a/kernel/resource.c b/kernel/resource.c
index 841737bbda9e..22153fdec4f5 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -18,12 +18,15 @@
 #include <linux/spinlock.h>
 #include <linux/fs.h>
 #include <linux/proc_fs.h>
+#include <linux/pseudo_fs.h>
 #include <linux/sched.h>
 #include <linux/seq_file.h>
 #include <linux/device.h>
 #include <linux/pfn.h>
 #include <linux/mm.h>
+#include <linux/mount.h>
 #include <linux/resource_ext.h>
+#include <uapi/linux/magic.h>
 #include <asm/io.h>
 
 
@@ -1112,6 +1115,52 @@ resource_size_t resource_alignment(struct resource *res)
 
 static DECLARE_WAIT_QUEUE_HEAD(muxed_resource_wait);
 
+static struct inode *iomem_inode;
+
+#ifdef CONFIG_IO_STRICT_DEVMEM
+static void revoke_iomem(struct resource *res)
+{
+	/* pairs with smp_store_release() in iomem_init_inode() */
+	struct inode *inode = smp_load_acquire(&iomem_inode);
+
+	/*
+	 * Check that the initialization has completed. Losing the race
+	 * is ok because it means drivers are claiming resources before
+	 * the fs_initcall level of init and prevent /dev/mem from
+	 * establishing mappings.
+	 */
+	if (!inode)
+		return;
+
+	/*
+	 * The expectation is that the driver has successfully marked
+	 * the resource busy by this point, so devmem_is_allowed()
+	 * should start returning false, however for performance this
+	 * does not iterate the entire resource range.
+	 */
+	if (devmem_is_allowed(PHYS_PFN(res->start)) &&
+	    devmem_is_allowed(PHYS_PFN(res->end))) {
+		/*
+		 * *cringe* iomem=relaxed says "go ahead, what's the
+		 * worst that can happen?"
+		 */
+		return;
+	}
+
+	unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
+}
+struct address_space *iomem_get_mapping(void)
+{
+	return iomem_inode->i_mapping;
+}
+#else
+static void revoke_iomem(struct resource *res) {}
+struct address_space *iomem_get_mapping(void)
+{
+	return NULL;
+}
+#endif
+
 /**
  * __request_region - create a new busy resource region
  * @parent: parent resource descriptor
@@ -1179,7 +1228,7 @@ struct resource * __request_region(struct resource *parent,
 	write_unlock(&resource_lock);
 
 	if (res && orig_parent == &iomem_resource)
-		revoke_devmem(res);
+		revoke_iomem(res);
 
 	return res;
 }
@@ -1713,4 +1762,48 @@ static int __init strict_iomem(char *str)
 	return 1;
 }
 
+static int iomem_fs_init_fs_context(struct fs_context *fc)
+{
+	return init_pseudo(fc, DEVMEM_MAGIC) ? 0 : -ENOMEM;
+}
+
+static struct file_system_type iomem_fs_type = {
+	.name		= "iomem",
+	.owner		= THIS_MODULE,
+	.init_fs_context = iomem_fs_init_fs_context,
+	.kill_sb	= kill_anon_super,
+};
+
+static int __init iomem_init_inode(void)
+{
+	static struct vfsmount *iomem_vfs_mount;
+	static int iomem_fs_cnt;
+	struct inode *inode;
+	int rc;
+
+	rc = simple_pin_fs(&iomem_fs_type, &iomem_vfs_mount, &iomem_fs_cnt);
+	if (rc < 0) {
+		pr_err("Cannot mount iomem pseudo filesystem: %d\n", rc);
+		return rc;
+	}
+
+	inode = alloc_anon_inode(iomem_vfs_mount->mnt_sb);
+	if (IS_ERR(inode)) {
+		rc = PTR_ERR(inode);
+		pr_err("Cannot allocate inode for iomem: %d\n", rc);
+		simple_release_fs(&iomem_vfs_mount, &iomem_fs_cnt);
+		return rc;
+	}
+
+	/*
+	 * Publish /dev/mem initialized.
+	 * Pairs with smp_load_acquire() in revoke_iomem().
+	 */
+	smp_store_release(&iomem_inode, inode);
+
+	return 0;
+}
+
+fs_initcall(iomem_init_inode);
+
 __setup("iomem=", strict_iomem);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, David Hildenbrand,
	Greg Kroah-Hartman, Rafael J. Wysocki, linux-mm,
	Jérôme Glisse, Arnd Bergmann, John Hubbard,
	Daniel Vetter, Dan Williams, Andrew Morton, linux-arm-kernel,
	linux-media

We want all iomem mmaps to consistently revoke ptes when the kernel
takes over and CONFIG_IO_STRICT_DEVMEM is enabled. This includes the
pci bar mmaps available through procfs and sysfs, which currently do
not revoke mappings.

To prepare for this, move the code from the /dev/kmem driver to
kernel/resource.c.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
---
 drivers/char/mem.c     | 85 +------------------------------------
 include/linux/ioport.h |  6 +--
 kernel/resource.c      | 95 +++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 96 insertions(+), 90 deletions(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 5502f56f3655..53338aad8d28 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -31,9 +31,6 @@
 #include <linux/uio.h>
 #include <linux/uaccess.h>
 #include <linux/security.h>
-#include <linux/pseudo_fs.h>
-#include <uapi/linux/magic.h>
-#include <linux/mount.h>
 
 #ifdef CONFIG_IA64
 # include <linux/efi.h>
@@ -809,42 +806,6 @@ static loff_t memory_lseek(struct file *file, loff_t offset, int orig)
 	return ret;
 }
 
-static struct inode *devmem_inode;
-
-#ifdef CONFIG_IO_STRICT_DEVMEM
-void revoke_devmem(struct resource *res)
-{
-	/* pairs with smp_store_release() in devmem_init_inode() */
-	struct inode *inode = smp_load_acquire(&devmem_inode);
-
-	/*
-	 * Check that the initialization has completed. Losing the race
-	 * is ok because it means drivers are claiming resources before
-	 * the fs_initcall level of init and prevent /dev/mem from
-	 * establishing mappings.
-	 */
-	if (!inode)
-		return;
-
-	/*
-	 * The expectation is that the driver has successfully marked
-	 * the resource busy by this point, so devmem_is_allowed()
-	 * should start returning false, however for performance this
-	 * does not iterate the entire resource range.
-	 */
-	if (devmem_is_allowed(PHYS_PFN(res->start)) &&
-	    devmem_is_allowed(PHYS_PFN(res->end))) {
-		/*
-		 * *cringe* iomem=relaxed says "go ahead, what's the
-		 * worst that can happen?"
-		 */
-		return;
-	}
-
-	unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
-}
-#endif
-
 static int open_port(struct inode *inode, struct file *filp)
 {
 	int rc;
@@ -864,7 +825,7 @@ static int open_port(struct inode *inode, struct file *filp)
 	 * revocations when drivers want to take over a /dev/mem mapped
 	 * range.
 	 */
-	filp->f_mapping = inode->i_mapping;
+	filp->f_mapping = iomem_get_mapping();
 
 	return 0;
 }
@@ -995,48 +956,6 @@ static char *mem_devnode(struct device *dev, umode_t *mode)
 
 static struct class *mem_class;
 
-static int devmem_fs_init_fs_context(struct fs_context *fc)
-{
-	return init_pseudo(fc, DEVMEM_MAGIC) ? 0 : -ENOMEM;
-}
-
-static struct file_system_type devmem_fs_type = {
-	.name		= "devmem",
-	.owner		= THIS_MODULE,
-	.init_fs_context = devmem_fs_init_fs_context,
-	.kill_sb	= kill_anon_super,
-};
-
-static int devmem_init_inode(void)
-{
-	static struct vfsmount *devmem_vfs_mount;
-	static int devmem_fs_cnt;
-	struct inode *inode;
-	int rc;
-
-	rc = simple_pin_fs(&devmem_fs_type, &devmem_vfs_mount, &devmem_fs_cnt);
-	if (rc < 0) {
-		pr_err("Cannot mount /dev/mem pseudo filesystem: %d\n", rc);
-		return rc;
-	}
-
-	inode = alloc_anon_inode(devmem_vfs_mount->mnt_sb);
-	if (IS_ERR(inode)) {
-		rc = PTR_ERR(inode);
-		pr_err("Cannot allocate inode for /dev/mem: %d\n", rc);
-		simple_release_fs(&devmem_vfs_mount, &devmem_fs_cnt);
-		return rc;
-	}
-
-	/*
-	 * Publish /dev/mem initialized.
-	 * Pairs with smp_load_acquire() in revoke_devmem().
-	 */
-	smp_store_release(&devmem_inode, inode);
-
-	return 0;
-}
-
 static int __init chr_dev_init(void)
 {
 	int minor;
@@ -1058,8 +977,6 @@ static int __init chr_dev_init(void)
 		 */
 		if ((minor == DEVPORT_MINOR) && !arch_has_dev_port())
 			continue;
-		if ((minor == DEVMEM_MINOR) && devmem_init_inode() != 0)
-			continue;
 
 		device_create(mem_class, NULL, MKDEV(MEM_MAJOR, minor),
 			      NULL, devlist[minor].name);
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6c2b06fe8beb..8ffb61b36606 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -302,11 +302,7 @@ struct resource *devm_request_free_mem_region(struct device *dev,
 struct resource *request_free_mem_region(struct resource *base,
 		unsigned long size, const char *name);
 
-#ifdef CONFIG_IO_STRICT_DEVMEM
-void revoke_devmem(struct resource *res);
-#else
-static inline void revoke_devmem(struct resource *res) { };
-#endif
+extern struct address_space *iomem_get_mapping(void);
 
 #endif /* __ASSEMBLY__ */
 #endif	/* _LINUX_IOPORT_H */
diff --git a/kernel/resource.c b/kernel/resource.c
index 841737bbda9e..22153fdec4f5 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -18,12 +18,15 @@
 #include <linux/spinlock.h>
 #include <linux/fs.h>
 #include <linux/proc_fs.h>
+#include <linux/pseudo_fs.h>
 #include <linux/sched.h>
 #include <linux/seq_file.h>
 #include <linux/device.h>
 #include <linux/pfn.h>
 #include <linux/mm.h>
+#include <linux/mount.h>
 #include <linux/resource_ext.h>
+#include <uapi/linux/magic.h>
 #include <asm/io.h>
 
 
@@ -1112,6 +1115,52 @@ resource_size_t resource_alignment(struct resource *res)
 
 static DECLARE_WAIT_QUEUE_HEAD(muxed_resource_wait);
 
+static struct inode *iomem_inode;
+
+#ifdef CONFIG_IO_STRICT_DEVMEM
+static void revoke_iomem(struct resource *res)
+{
+	/* pairs with smp_store_release() in iomem_init_inode() */
+	struct inode *inode = smp_load_acquire(&iomem_inode);
+
+	/*
+	 * Check that the initialization has completed. Losing the race
+	 * is ok because it means drivers are claiming resources before
+	 * the fs_initcall level of init and prevent /dev/mem from
+	 * establishing mappings.
+	 */
+	if (!inode)
+		return;
+
+	/*
+	 * The expectation is that the driver has successfully marked
+	 * the resource busy by this point, so devmem_is_allowed()
+	 * should start returning false, however for performance this
+	 * does not iterate the entire resource range.
+	 */
+	if (devmem_is_allowed(PHYS_PFN(res->start)) &&
+	    devmem_is_allowed(PHYS_PFN(res->end))) {
+		/*
+		 * *cringe* iomem=relaxed says "go ahead, what's the
+		 * worst that can happen?"
+		 */
+		return;
+	}
+
+	unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
+}
+struct address_space *iomem_get_mapping(void)
+{
+	return iomem_inode->i_mapping;
+}
+#else
+static void revoke_iomem(struct resource *res) {}
+struct address_space *iomem_get_mapping(void)
+{
+	return NULL;
+}
+#endif
+
 /**
  * __request_region - create a new busy resource region
  * @parent: parent resource descriptor
@@ -1179,7 +1228,7 @@ struct resource * __request_region(struct resource *parent,
 	write_unlock(&resource_lock);
 
 	if (res && orig_parent == &iomem_resource)
-		revoke_devmem(res);
+		revoke_iomem(res);
 
 	return res;
 }
@@ -1713,4 +1762,48 @@ static int __init strict_iomem(char *str)
 	return 1;
 }
 
+static int iomem_fs_init_fs_context(struct fs_context *fc)
+{
+	return init_pseudo(fc, DEVMEM_MAGIC) ? 0 : -ENOMEM;
+}
+
+static struct file_system_type iomem_fs_type = {
+	.name		= "iomem",
+	.owner		= THIS_MODULE,
+	.init_fs_context = iomem_fs_init_fs_context,
+	.kill_sb	= kill_anon_super,
+};
+
+static int __init iomem_init_inode(void)
+{
+	static struct vfsmount *iomem_vfs_mount;
+	static int iomem_fs_cnt;
+	struct inode *inode;
+	int rc;
+
+	rc = simple_pin_fs(&iomem_fs_type, &iomem_vfs_mount, &iomem_fs_cnt);
+	if (rc < 0) {
+		pr_err("Cannot mount iomem pseudo filesystem: %d\n", rc);
+		return rc;
+	}
+
+	inode = alloc_anon_inode(iomem_vfs_mount->mnt_sb);
+	if (IS_ERR(inode)) {
+		rc = PTR_ERR(inode);
+		pr_err("Cannot allocate inode for iomem: %d\n", rc);
+		simple_release_fs(&iomem_vfs_mount, &iomem_fs_cnt);
+		return rc;
+	}
+
+	/*
+	 * Publish /dev/mem initialized.
+	 * Pairs with smp_load_acquire() in revoke_iomem().
+	 */
+	smp_store_release(&iomem_inode, inode);
+
+	return 0;
+}
+
+fs_initcall(iomem_init_inode);
+
 __setup("iomem=", strict_iomem);
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, David Hildenbrand,
	Greg Kroah-Hartman, Rafael J. Wysocki, linux-mm,
	Jérôme Glisse, Arnd Bergmann, John Hubbard,
	Daniel Vetter, Dan Williams, Andrew Morton, linux-arm-kernel,
	linux-media

We want all iomem mmaps to consistently revoke ptes when the kernel
takes over and CONFIG_IO_STRICT_DEVMEM is enabled. This includes the
pci bar mmaps available through procfs and sysfs, which currently do
not revoke mappings.

To prepare for this, move the code from the /dev/kmem driver to
kernel/resource.c.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
---
 drivers/char/mem.c     | 85 +------------------------------------
 include/linux/ioport.h |  6 +--
 kernel/resource.c      | 95 +++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 96 insertions(+), 90 deletions(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 5502f56f3655..53338aad8d28 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -31,9 +31,6 @@
 #include <linux/uio.h>
 #include <linux/uaccess.h>
 #include <linux/security.h>
-#include <linux/pseudo_fs.h>
-#include <uapi/linux/magic.h>
-#include <linux/mount.h>
 
 #ifdef CONFIG_IA64
 # include <linux/efi.h>
@@ -809,42 +806,6 @@ static loff_t memory_lseek(struct file *file, loff_t offset, int orig)
 	return ret;
 }
 
-static struct inode *devmem_inode;
-
-#ifdef CONFIG_IO_STRICT_DEVMEM
-void revoke_devmem(struct resource *res)
-{
-	/* pairs with smp_store_release() in devmem_init_inode() */
-	struct inode *inode = smp_load_acquire(&devmem_inode);
-
-	/*
-	 * Check that the initialization has completed. Losing the race
-	 * is ok because it means drivers are claiming resources before
-	 * the fs_initcall level of init and prevent /dev/mem from
-	 * establishing mappings.
-	 */
-	if (!inode)
-		return;
-
-	/*
-	 * The expectation is that the driver has successfully marked
-	 * the resource busy by this point, so devmem_is_allowed()
-	 * should start returning false, however for performance this
-	 * does not iterate the entire resource range.
-	 */
-	if (devmem_is_allowed(PHYS_PFN(res->start)) &&
-	    devmem_is_allowed(PHYS_PFN(res->end))) {
-		/*
-		 * *cringe* iomem=relaxed says "go ahead, what's the
-		 * worst that can happen?"
-		 */
-		return;
-	}
-
-	unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
-}
-#endif
-
 static int open_port(struct inode *inode, struct file *filp)
 {
 	int rc;
@@ -864,7 +825,7 @@ static int open_port(struct inode *inode, struct file *filp)
 	 * revocations when drivers want to take over a /dev/mem mapped
 	 * range.
 	 */
-	filp->f_mapping = inode->i_mapping;
+	filp->f_mapping = iomem_get_mapping();
 
 	return 0;
 }
@@ -995,48 +956,6 @@ static char *mem_devnode(struct device *dev, umode_t *mode)
 
 static struct class *mem_class;
 
-static int devmem_fs_init_fs_context(struct fs_context *fc)
-{
-	return init_pseudo(fc, DEVMEM_MAGIC) ? 0 : -ENOMEM;
-}
-
-static struct file_system_type devmem_fs_type = {
-	.name		= "devmem",
-	.owner		= THIS_MODULE,
-	.init_fs_context = devmem_fs_init_fs_context,
-	.kill_sb	= kill_anon_super,
-};
-
-static int devmem_init_inode(void)
-{
-	static struct vfsmount *devmem_vfs_mount;
-	static int devmem_fs_cnt;
-	struct inode *inode;
-	int rc;
-
-	rc = simple_pin_fs(&devmem_fs_type, &devmem_vfs_mount, &devmem_fs_cnt);
-	if (rc < 0) {
-		pr_err("Cannot mount /dev/mem pseudo filesystem: %d\n", rc);
-		return rc;
-	}
-
-	inode = alloc_anon_inode(devmem_vfs_mount->mnt_sb);
-	if (IS_ERR(inode)) {
-		rc = PTR_ERR(inode);
-		pr_err("Cannot allocate inode for /dev/mem: %d\n", rc);
-		simple_release_fs(&devmem_vfs_mount, &devmem_fs_cnt);
-		return rc;
-	}
-
-	/*
-	 * Publish /dev/mem initialized.
-	 * Pairs with smp_load_acquire() in revoke_devmem().
-	 */
-	smp_store_release(&devmem_inode, inode);
-
-	return 0;
-}
-
 static int __init chr_dev_init(void)
 {
 	int minor;
@@ -1058,8 +977,6 @@ static int __init chr_dev_init(void)
 		 */
 		if ((minor == DEVPORT_MINOR) && !arch_has_dev_port())
 			continue;
-		if ((minor == DEVMEM_MINOR) && devmem_init_inode() != 0)
-			continue;
 
 		device_create(mem_class, NULL, MKDEV(MEM_MAJOR, minor),
 			      NULL, devlist[minor].name);
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6c2b06fe8beb..8ffb61b36606 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -302,11 +302,7 @@ struct resource *devm_request_free_mem_region(struct device *dev,
 struct resource *request_free_mem_region(struct resource *base,
 		unsigned long size, const char *name);
 
-#ifdef CONFIG_IO_STRICT_DEVMEM
-void revoke_devmem(struct resource *res);
-#else
-static inline void revoke_devmem(struct resource *res) { };
-#endif
+extern struct address_space *iomem_get_mapping(void);
 
 #endif /* __ASSEMBLY__ */
 #endif	/* _LINUX_IOPORT_H */
diff --git a/kernel/resource.c b/kernel/resource.c
index 841737bbda9e..22153fdec4f5 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -18,12 +18,15 @@
 #include <linux/spinlock.h>
 #include <linux/fs.h>
 #include <linux/proc_fs.h>
+#include <linux/pseudo_fs.h>
 #include <linux/sched.h>
 #include <linux/seq_file.h>
 #include <linux/device.h>
 #include <linux/pfn.h>
 #include <linux/mm.h>
+#include <linux/mount.h>
 #include <linux/resource_ext.h>
+#include <uapi/linux/magic.h>
 #include <asm/io.h>
 
 
@@ -1112,6 +1115,52 @@ resource_size_t resource_alignment(struct resource *res)
 
 static DECLARE_WAIT_QUEUE_HEAD(muxed_resource_wait);
 
+static struct inode *iomem_inode;
+
+#ifdef CONFIG_IO_STRICT_DEVMEM
+static void revoke_iomem(struct resource *res)
+{
+	/* pairs with smp_store_release() in iomem_init_inode() */
+	struct inode *inode = smp_load_acquire(&iomem_inode);
+
+	/*
+	 * Check that the initialization has completed. Losing the race
+	 * is ok because it means drivers are claiming resources before
+	 * the fs_initcall level of init and prevent /dev/mem from
+	 * establishing mappings.
+	 */
+	if (!inode)
+		return;
+
+	/*
+	 * The expectation is that the driver has successfully marked
+	 * the resource busy by this point, so devmem_is_allowed()
+	 * should start returning false, however for performance this
+	 * does not iterate the entire resource range.
+	 */
+	if (devmem_is_allowed(PHYS_PFN(res->start)) &&
+	    devmem_is_allowed(PHYS_PFN(res->end))) {
+		/*
+		 * *cringe* iomem=relaxed says "go ahead, what's the
+		 * worst that can happen?"
+		 */
+		return;
+	}
+
+	unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
+}
+struct address_space *iomem_get_mapping(void)
+{
+	return iomem_inode->i_mapping;
+}
+#else
+static void revoke_iomem(struct resource *res) {}
+struct address_space *iomem_get_mapping(void)
+{
+	return NULL;
+}
+#endif
+
 /**
  * __request_region - create a new busy resource region
  * @parent: parent resource descriptor
@@ -1179,7 +1228,7 @@ struct resource * __request_region(struct resource *parent,
 	write_unlock(&resource_lock);
 
 	if (res && orig_parent == &iomem_resource)
-		revoke_devmem(res);
+		revoke_iomem(res);
 
 	return res;
 }
@@ -1713,4 +1762,48 @@ static int __init strict_iomem(char *str)
 	return 1;
 }
 
+static int iomem_fs_init_fs_context(struct fs_context *fc)
+{
+	return init_pseudo(fc, DEVMEM_MAGIC) ? 0 : -ENOMEM;
+}
+
+static struct file_system_type iomem_fs_type = {
+	.name		= "iomem",
+	.owner		= THIS_MODULE,
+	.init_fs_context = iomem_fs_init_fs_context,
+	.kill_sb	= kill_anon_super,
+};
+
+static int __init iomem_init_inode(void)
+{
+	static struct vfsmount *iomem_vfs_mount;
+	static int iomem_fs_cnt;
+	struct inode *inode;
+	int rc;
+
+	rc = simple_pin_fs(&iomem_fs_type, &iomem_vfs_mount, &iomem_fs_cnt);
+	if (rc < 0) {
+		pr_err("Cannot mount iomem pseudo filesystem: %d\n", rc);
+		return rc;
+	}
+
+	inode = alloc_anon_inode(iomem_vfs_mount->mnt_sb);
+	if (IS_ERR(inode)) {
+		rc = PTR_ERR(inode);
+		pr_err("Cannot allocate inode for iomem: %d\n", rc);
+		simple_release_fs(&iomem_vfs_mount, &iomem_fs_cnt);
+		return rc;
+	}
+
+	/*
+	 * Publish /dev/mem initialized.
+	 * Pairs with smp_load_acquire() in revoke_iomem().
+	 */
+	smp_store_release(&iomem_inode, inode);
+
+	return 0;
+}
+
+fs_initcall(iomem_init_inode);
+
 __setup("iomem=", strict_iomem);
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 15/17] sysfs: Support zapping of binary attr mmaps
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Bjorn Helgaas, linux-pci,
	Greg Kroah-Hartman, Rafael J. Wysocki, Christian Brauner,
	David S. Miller, Michael Ellerman, Sourabh Jain,
	Mauro Carvalho Chehab, Nayna Jain

We want to be able to revoke pci mmaps so that the same access rules
applies as for /dev/kmem. Revoke support for devmem was added in
3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the
region").

The simplest way to achieve this is by having the same filp->f_mapping
for all mappings, so that unmap_mapping_range can find them all, no
matter through which file they've been created. Since this must be set
at open time we need sysfs support for this.

Add an optional mapping parameter bin_attr, which is only consulted
when there's also an mmap callback, since without mmap support
allowing to adjust the ->f_mapping makes no sense.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Nayna Jain <nayna@linux.ibm.com>
---
 fs/sysfs/file.c       | 11 +++++++++++
 include/linux/sysfs.h |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index eb6897ab78e7..9d8ccdb000e3 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -169,6 +169,16 @@ static int sysfs_kf_bin_mmap(struct kernfs_open_file *of,
 	return battr->mmap(of->file, kobj, battr, vma);
 }
 
+static int sysfs_kf_bin_open(struct kernfs_open_file *of)
+{
+	struct bin_attribute *battr = of->kn->priv;
+
+	if (battr->mapping)
+		of->file->f_mapping = battr->mapping;
+
+	return 0;
+}
+
 void sysfs_notify(struct kobject *kobj, const char *dir, const char *attr)
 {
 	struct kernfs_node *kn = kobj->sd, *tmp;
@@ -240,6 +250,7 @@ static const struct kernfs_ops sysfs_bin_kfops_mmap = {
 	.read		= sysfs_kf_bin_read,
 	.write		= sysfs_kf_bin_write,
 	.mmap		= sysfs_kf_bin_mmap,
+	.open		= sysfs_kf_bin_open,
 };
 
 int sysfs_add_file_mode_ns(struct kernfs_node *parent,
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 34e84122f635..a17a474d1601 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -164,11 +164,13 @@ __ATTRIBUTE_GROUPS(_name)
 
 struct file;
 struct vm_area_struct;
+struct address_space;
 
 struct bin_attribute {
 	struct attribute	attr;
 	size_t			size;
 	void			*private;
+	struct address_space	*mapping;
 	ssize_t (*read)(struct file *, struct kobject *, struct bin_attribute *,
 			char *, loff_t, size_t);
 	ssize_t (*write)(struct file *, struct kobject *, struct bin_attribute *,
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 15/17] sysfs: Support zapping of binary attr mmaps
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: Jan Kara, kvm, Rafael J. Wysocki, Daniel Vetter, linux-mm,
	Daniel Vetter, Christian Brauner, linux-s390, linux-samsung-soc,
	Mauro Carvalho Chehab, Michael Ellerman, Nayna Jain,
	Jason Gunthorpe, linux-pci, linux-media, Kees Cook, John Hubbard,
	Jérôme Glisse, Bjorn Helgaas, Dan Williams,
	linux-arm-kernel, Greg Kroah-Hartman, Sourabh Jain,
	Andrew Morton, David S. Miller

We want to be able to revoke pci mmaps so that the same access rules
applies as for /dev/kmem. Revoke support for devmem was added in
3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the
region").

The simplest way to achieve this is by having the same filp->f_mapping
for all mappings, so that unmap_mapping_range can find them all, no
matter through which file they've been created. Since this must be set
at open time we need sysfs support for this.

Add an optional mapping parameter bin_attr, which is only consulted
when there's also an mmap callback, since without mmap support
allowing to adjust the ->f_mapping makes no sense.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Nayna Jain <nayna@linux.ibm.com>
---
 fs/sysfs/file.c       | 11 +++++++++++
 include/linux/sysfs.h |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index eb6897ab78e7..9d8ccdb000e3 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -169,6 +169,16 @@ static int sysfs_kf_bin_mmap(struct kernfs_open_file *of,
 	return battr->mmap(of->file, kobj, battr, vma);
 }
 
+static int sysfs_kf_bin_open(struct kernfs_open_file *of)
+{
+	struct bin_attribute *battr = of->kn->priv;
+
+	if (battr->mapping)
+		of->file->f_mapping = battr->mapping;
+
+	return 0;
+}
+
 void sysfs_notify(struct kobject *kobj, const char *dir, const char *attr)
 {
 	struct kernfs_node *kn = kobj->sd, *tmp;
@@ -240,6 +250,7 @@ static const struct kernfs_ops sysfs_bin_kfops_mmap = {
 	.read		= sysfs_kf_bin_read,
 	.write		= sysfs_kf_bin_write,
 	.mmap		= sysfs_kf_bin_mmap,
+	.open		= sysfs_kf_bin_open,
 };
 
 int sysfs_add_file_mode_ns(struct kernfs_node *parent,
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 34e84122f635..a17a474d1601 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -164,11 +164,13 @@ __ATTRIBUTE_GROUPS(_name)
 
 struct file;
 struct vm_area_struct;
+struct address_space;
 
 struct bin_attribute {
 	struct attribute	attr;
 	size_t			size;
 	void			*private;
+	struct address_space	*mapping;
 	ssize_t (*read)(struct file *, struct kobject *, struct bin_attribute *,
 			char *, loff_t, size_t);
 	ssize_t (*write)(struct file *, struct kobject *, struct bin_attribute *,
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 15/17] sysfs: Support zapping of binary attr mmaps
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: Jan Kara, kvm, Rafael J. Wysocki, Daniel Vetter, linux-mm,
	Daniel Vetter, Christian Brauner, linux-s390, linux-samsung-soc,
	Mauro Carvalho Chehab, Michael Ellerman, Nayna Jain,
	Jason Gunthorpe, linux-pci, linux-media, Kees Cook, John Hubbard,
	Jérôme Glisse, Bjorn Helgaas, Dan Williams,
	linux-arm-kernel, Greg Kroah-Hartman, Sourabh Jain,
	Andrew Morton, David S. Miller

We want to be able to revoke pci mmaps so that the same access rules
applies as for /dev/kmem. Revoke support for devmem was added in
3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the
region").

The simplest way to achieve this is by having the same filp->f_mapping
for all mappings, so that unmap_mapping_range can find them all, no
matter through which file they've been created. Since this must be set
at open time we need sysfs support for this.

Add an optional mapping parameter bin_attr, which is only consulted
when there's also an mmap callback, since without mmap support
allowing to adjust the ->f_mapping makes no sense.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Nayna Jain <nayna@linux.ibm.com>
---
 fs/sysfs/file.c       | 11 +++++++++++
 include/linux/sysfs.h |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index eb6897ab78e7..9d8ccdb000e3 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -169,6 +169,16 @@ static int sysfs_kf_bin_mmap(struct kernfs_open_file *of,
 	return battr->mmap(of->file, kobj, battr, vma);
 }
 
+static int sysfs_kf_bin_open(struct kernfs_open_file *of)
+{
+	struct bin_attribute *battr = of->kn->priv;
+
+	if (battr->mapping)
+		of->file->f_mapping = battr->mapping;
+
+	return 0;
+}
+
 void sysfs_notify(struct kobject *kobj, const char *dir, const char *attr)
 {
 	struct kernfs_node *kn = kobj->sd, *tmp;
@@ -240,6 +250,7 @@ static const struct kernfs_ops sysfs_bin_kfops_mmap = {
 	.read		= sysfs_kf_bin_read,
 	.write		= sysfs_kf_bin_write,
 	.mmap		= sysfs_kf_bin_mmap,
+	.open		= sysfs_kf_bin_open,
 };
 
 int sysfs_add_file_mode_ns(struct kernfs_node *parent,
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 34e84122f635..a17a474d1601 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -164,11 +164,13 @@ __ATTRIBUTE_GROUPS(_name)
 
 struct file;
 struct vm_area_struct;
+struct address_space;
 
 struct bin_attribute {
 	struct attribute	attr;
 	size_t			size;
 	void			*private;
+	struct address_space	*mapping;
 	ssize_t (*read)(struct file *, struct kobject *, struct bin_attribute *,
 			char *, loff_t, size_t);
 	ssize_t (*write)(struct file *, struct kobject *, struct bin_attribute *,
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 16/17] PCI: Revoke mappings like devmem
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Greg Kroah-Hartman,
	Bjorn Helgaas, linux-pci

Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
the region") /dev/kmem zaps ptes when the kernel requests exclusive
acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
the default for all driver uses.

Except there's two more ways to access PCI BARs: sysfs and proc mmap
support. Let's plug that hole.

For revoke_devmem() to work we need to link our vma into the same
address_space, with consistent vma->vm_pgoff. ->pgoff is already
adjusted, because that's how (io_)remap_pfn_range works, but for the
mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is
to adjust this at at ->open time:

- for sysfs this is easy, now that binary attributes support this. We
  just set bin_attr->mapping when mmap is supported
- for procfs it's a bit more tricky, since procfs pci access has only
  one file per device, and access to a specific resources first needs
  to be set up with some ioctl calls. But mmap is only supported for
  the same resources as sysfs exposes with mmap support, and otherwise
  rejected, so we can set the mapping unconditionally at open time
  without harm.

A special consideration is for arch_can_pci_mmap_io() - we need to
make sure that the ->f_mapping doesn't alias between ioport and iomem
space. There's only 2 ways in-tree to support mmap of ioports: generic
pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single
architecture hand-rolling. Both approach support ioport mmap through a
special pfn range and not through magic pte attributes. Aliasing is
therefore not a problem.

The only difference in access checks left is that sysfs PCI mmap does
not check for CAP_RAWIO. I'm not really sure whether that should be
added or not.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
--
v2:
- Totally new approach: Adjust filp->f_mapping at open time. Note that
  this now works on all architectures, not just those support
  ARCH_GENERIC_PCI_MMAP_RESOURCE
---
 drivers/pci/pci-sysfs.c | 4 ++++
 drivers/pci/proc.c      | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 6d78df981d41..cee38fcb4a86 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -928,6 +928,7 @@ void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_io->read = pci_read_legacy_io;
 	b->legacy_io->write = pci_write_legacy_io;
 	b->legacy_io->mmap = pci_mmap_legacy_io;
+	b->legacy_io->mapping = iomem_get_mapping();
 	pci_adjust_legacy_attr(b, pci_mmap_io);
 	error = device_create_bin_file(&b->dev, b->legacy_io);
 	if (error)
@@ -940,6 +941,7 @@ void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_mem->size = 1024*1024;
 	b->legacy_mem->attr.mode = 0600;
 	b->legacy_mem->mmap = pci_mmap_legacy_mem;
+	b->legacy_io->mapping = iomem_get_mapping();
 	pci_adjust_legacy_attr(b, pci_mmap_mem);
 	error = device_create_bin_file(&b->dev, b->legacy_mem);
 	if (error)
@@ -1155,6 +1157,8 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
 			res_attr->mmap = pci_mmap_resource_uc;
 		}
 	}
+	if (res_attr->mmap)
+		res_attr->mapping = iomem_get_mapping();
 	res_attr->attr.name = res_attr_name;
 	res_attr->attr.mode = 0600;
 	res_attr->size = pci_resource_len(pdev, num);
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index 3a2f90beb4cb..9bab07302bbf 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct file *file)
 	fpriv->write_combine = 0;
 
 	file->private_data = fpriv;
+	file->f_mapping = iomem_get_mapping();
 
 	return 0;
 }
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 16/17] PCI: Revoke mappings like devmem
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, linux-pci, Greg Kroah-Hartman,
	linux-mm, Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, linux-arm-kernel,
	linux-media

Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
the region") /dev/kmem zaps ptes when the kernel requests exclusive
acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
the default for all driver uses.

Except there's two more ways to access PCI BARs: sysfs and proc mmap
support. Let's plug that hole.

For revoke_devmem() to work we need to link our vma into the same
address_space, with consistent vma->vm_pgoff. ->pgoff is already
adjusted, because that's how (io_)remap_pfn_range works, but for the
mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is
to adjust this at at ->open time:

- for sysfs this is easy, now that binary attributes support this. We
  just set bin_attr->mapping when mmap is supported
- for procfs it's a bit more tricky, since procfs pci access has only
  one file per device, and access to a specific resources first needs
  to be set up with some ioctl calls. But mmap is only supported for
  the same resources as sysfs exposes with mmap support, and otherwise
  rejected, so we can set the mapping unconditionally at open time
  without harm.

A special consideration is for arch_can_pci_mmap_io() - we need to
make sure that the ->f_mapping doesn't alias between ioport and iomem
space. There's only 2 ways in-tree to support mmap of ioports: generic
pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single
architecture hand-rolling. Both approach support ioport mmap through a
special pfn range and not through magic pte attributes. Aliasing is
therefore not a problem.

The only difference in access checks left is that sysfs PCI mmap does
not check for CAP_RAWIO. I'm not really sure whether that should be
added or not.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
--
v2:
- Totally new approach: Adjust filp->f_mapping at open time. Note that
  this now works on all architectures, not just those support
  ARCH_GENERIC_PCI_MMAP_RESOURCE
---
 drivers/pci/pci-sysfs.c | 4 ++++
 drivers/pci/proc.c      | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 6d78df981d41..cee38fcb4a86 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -928,6 +928,7 @@ void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_io->read = pci_read_legacy_io;
 	b->legacy_io->write = pci_write_legacy_io;
 	b->legacy_io->mmap = pci_mmap_legacy_io;
+	b->legacy_io->mapping = iomem_get_mapping();
 	pci_adjust_legacy_attr(b, pci_mmap_io);
 	error = device_create_bin_file(&b->dev, b->legacy_io);
 	if (error)
@@ -940,6 +941,7 @@ void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_mem->size = 1024*1024;
 	b->legacy_mem->attr.mode = 0600;
 	b->legacy_mem->mmap = pci_mmap_legacy_mem;
+	b->legacy_io->mapping = iomem_get_mapping();
 	pci_adjust_legacy_attr(b, pci_mmap_mem);
 	error = device_create_bin_file(&b->dev, b->legacy_mem);
 	if (error)
@@ -1155,6 +1157,8 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
 			res_attr->mmap = pci_mmap_resource_uc;
 		}
 	}
+	if (res_attr->mmap)
+		res_attr->mapping = iomem_get_mapping();
 	res_attr->attr.name = res_attr_name;
 	res_attr->attr.mode = 0600;
 	res_attr->size = pci_resource_len(pdev, num);
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index 3a2f90beb4cb..9bab07302bbf 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct file *file)
 	fpriv->write_combine = 0;
 
 	file->private_data = fpriv;
+	file->f_mapping = iomem_get_mapping();
 
 	return 0;
 }
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 16/17] PCI: Revoke mappings like devmem
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, linux-pci, Greg Kroah-Hartman,
	linux-mm, Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, linux-arm-kernel,
	linux-media

Since 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims
the region") /dev/kmem zaps ptes when the kernel requests exclusive
acccess to an iomem region. And with CONFIG_IO_STRICT_DEVMEM, this is
the default for all driver uses.

Except there's two more ways to access PCI BARs: sysfs and proc mmap
support. Let's plug that hole.

For revoke_devmem() to work we need to link our vma into the same
address_space, with consistent vma->vm_pgoff. ->pgoff is already
adjusted, because that's how (io_)remap_pfn_range works, but for the
mapping we need to adjust vma->vm_file->f_mapping. The cleanest way is
to adjust this at at ->open time:

- for sysfs this is easy, now that binary attributes support this. We
  just set bin_attr->mapping when mmap is supported
- for procfs it's a bit more tricky, since procfs pci access has only
  one file per device, and access to a specific resources first needs
  to be set up with some ioctl calls. But mmap is only supported for
  the same resources as sysfs exposes with mmap support, and otherwise
  rejected, so we can set the mapping unconditionally at open time
  without harm.

A special consideration is for arch_can_pci_mmap_io() - we need to
make sure that the ->f_mapping doesn't alias between ioport and iomem
space. There's only 2 ways in-tree to support mmap of ioports: generic
pci mmap (ARCH_GENERIC_PCI_MMAP_RESOURCE), and sparc as the single
architecture hand-rolling. Both approach support ioport mmap through a
special pfn range and not through magic pte attributes. Aliasing is
therefore not a problem.

The only difference in access checks left is that sysfs PCI mmap does
not check for CAP_RAWIO. I'm not really sure whether that should be
added or not.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
--
v2:
- Totally new approach: Adjust filp->f_mapping at open time. Note that
  this now works on all architectures, not just those support
  ARCH_GENERIC_PCI_MMAP_RESOURCE
---
 drivers/pci/pci-sysfs.c | 4 ++++
 drivers/pci/proc.c      | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 6d78df981d41..cee38fcb4a86 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -928,6 +928,7 @@ void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_io->read = pci_read_legacy_io;
 	b->legacy_io->write = pci_write_legacy_io;
 	b->legacy_io->mmap = pci_mmap_legacy_io;
+	b->legacy_io->mapping = iomem_get_mapping();
 	pci_adjust_legacy_attr(b, pci_mmap_io);
 	error = device_create_bin_file(&b->dev, b->legacy_io);
 	if (error)
@@ -940,6 +941,7 @@ void pci_create_legacy_files(struct pci_bus *b)
 	b->legacy_mem->size = 1024*1024;
 	b->legacy_mem->attr.mode = 0600;
 	b->legacy_mem->mmap = pci_mmap_legacy_mem;
+	b->legacy_io->mapping = iomem_get_mapping();
 	pci_adjust_legacy_attr(b, pci_mmap_mem);
 	error = device_create_bin_file(&b->dev, b->legacy_mem);
 	if (error)
@@ -1155,6 +1157,8 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
 			res_attr->mmap = pci_mmap_resource_uc;
 		}
 	}
+	if (res_attr->mmap)
+		res_attr->mapping = iomem_get_mapping();
 	res_attr->attr.name = res_attr_name;
 	res_attr->attr.mode = 0600;
 	res_attr->size = pci_resource_len(pdev, num);
diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index 3a2f90beb4cb..9bab07302bbf 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -298,6 +298,7 @@ static int proc_bus_pci_open(struct inode *inode, struct file *file)
 	fpriv->write_combine = 0;
 
 	file->private_data = fpriv;
+	file->f_mapping = iomem_get_mapping();
 
 	return 0;
 }
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 17/17] drm/i915: Properly request PCI BARs
  2020-10-09  7:59 ` Daniel Vetter
  (?)
@ 2020-10-09  7:59   ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: kvm, linux-mm, linux-arm-kernel, linux-samsung-soc, linux-media,
	linux-s390, Daniel Vetter, Daniel Vetter, Jason Gunthorpe,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Bjorn Helgaas, linux-pci

When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
do nothing for i915. Because i915 doesn't request any regions, like
pretty much all drm pci drivers. I guess this is some very old
remnants from the userspace modesetting days, when we wanted to
co-exist with the fbdev driver. Which usually requested these
resources.

But makes me wonder why the pci subsystem doesn't just request
resource automatically when we map a bar and a pci driver is bound?

Knowledge about which pci bars we need kludged together from
intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
fake agp driver.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
---
 drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 54e201fdeba4..ce39049d8919 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 	struct pci_dev *pdev = i915->drm.pdev;
 	int mmio_bar;
 	int mmio_size;
+	int bar_selection;
+	int ret;
 
 	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
+	bar_selection = BIT (2) | BIT(mmio_bar);
 	/*
-	 * Before gen4, the registers and the GTT are behind different BARs.
+	 * On gen3 the registers and the GTT are behind different BARs.
 	 * However, from gen4 onwards, the registers and the GTT are shared
 	 * in the same BAR, so we want to restrict this ioremap from
 	 * clobbering the GTT which we want ioremap_wc instead. Fortunately,
@@ -1703,6 +1706,8 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 	 * generations up to Ironlake.
 	 * For dgfx chips register range is expanded to 4MB.
 	 */
+	if (INTEL_GEN(i915) == 3)
+		bar_selection |= BIT(3);
 	if (INTEL_GEN(i915) < 5)
 		mmio_size = 512 * 1024;
 	else if (IS_DGFX(i915))
@@ -1710,8 +1715,15 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 	else
 		mmio_size = 2 * 1024 * 1024;
 
+	ret = pci_request_selected_regions(pdev, bar_selection, "i915");
+	if (ret < 0) {
+		drm_err(&i915->drm, "failed to request pci bars\n");
+		return ret;
+	}
+
 	uncore->regs = pci_iomap(pdev, mmio_bar, mmio_size);
 	if (uncore->regs == NULL) {
+		pci_release_selected_regions(pdev, bar_selection);
 		drm_err(&i915->drm, "failed to map registers\n");
 		return -EIO;
 	}
@@ -1721,9 +1733,18 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 
 static void uncore_mmio_cleanup(struct intel_uncore *uncore)
 {
-	struct pci_dev *pdev = uncore->i915->drm.pdev;
+	struct drm_i915_private *i915 = uncore->i915;
+	struct pci_dev *pdev = i915->drm.pdev;
+	int mmio_bar;
+	int bar_selection;
+
+	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
+	bar_selection = BIT (2) | BIT(mmio_bar);
+	if (INTEL_GEN(i915) == 3)
+		bar_selection |= BIT(3);
 
 	pci_iounmap(pdev, uncore->regs);
+	pci_release_selected_regions(pdev, bar_selection);
 }
 
 void intel_uncore_init_early(struct intel_uncore *uncore,
-- 
2.28.0


^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 17/17] drm/i915: Properly request PCI BARs
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, linux-pci, linux-mm,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, linux-arm-kernel,
	linux-media

When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
do nothing for i915. Because i915 doesn't request any regions, like
pretty much all drm pci drivers. I guess this is some very old
remnants from the userspace modesetting days, when we wanted to
co-exist with the fbdev driver. Which usually requested these
resources.

But makes me wonder why the pci subsystem doesn't just request
resource automatically when we map a bar and a pci driver is bound?

Knowledge about which pci bars we need kludged together from
intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
fake agp driver.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
---
 drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 54e201fdeba4..ce39049d8919 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 	struct pci_dev *pdev = i915->drm.pdev;
 	int mmio_bar;
 	int mmio_size;
+	int bar_selection;
+	int ret;
 
 	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
+	bar_selection = BIT (2) | BIT(mmio_bar);
 	/*
-	 * Before gen4, the registers and the GTT are behind different BARs.
+	 * On gen3 the registers and the GTT are behind different BARs.
 	 * However, from gen4 onwards, the registers and the GTT are shared
 	 * in the same BAR, so we want to restrict this ioremap from
 	 * clobbering the GTT which we want ioremap_wc instead. Fortunately,
@@ -1703,6 +1706,8 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 	 * generations up to Ironlake.
 	 * For dgfx chips register range is expanded to 4MB.
 	 */
+	if (INTEL_GEN(i915) == 3)
+		bar_selection |= BIT(3);
 	if (INTEL_GEN(i915) < 5)
 		mmio_size = 512 * 1024;
 	else if (IS_DGFX(i915))
@@ -1710,8 +1715,15 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 	else
 		mmio_size = 2 * 1024 * 1024;
 
+	ret = pci_request_selected_regions(pdev, bar_selection, "i915");
+	if (ret < 0) {
+		drm_err(&i915->drm, "failed to request pci bars\n");
+		return ret;
+	}
+
 	uncore->regs = pci_iomap(pdev, mmio_bar, mmio_size);
 	if (uncore->regs == NULL) {
+		pci_release_selected_regions(pdev, bar_selection);
 		drm_err(&i915->drm, "failed to map registers\n");
 		return -EIO;
 	}
@@ -1721,9 +1733,18 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 
 static void uncore_mmio_cleanup(struct intel_uncore *uncore)
 {
-	struct pci_dev *pdev = uncore->i915->drm.pdev;
+	struct drm_i915_private *i915 = uncore->i915;
+	struct pci_dev *pdev = i915->drm.pdev;
+	int mmio_bar;
+	int bar_selection;
+
+	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
+	bar_selection = BIT (2) | BIT(mmio_bar);
+	if (INTEL_GEN(i915) == 3)
+		bar_selection |= BIT(3);
 
 	pci_iounmap(pdev, uncore->regs);
+	pci_release_selected_regions(pdev, bar_selection);
 }
 
 void intel_uncore_init_early(struct intel_uncore *uncore,
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* [PATCH v2 17/17] drm/i915: Properly request PCI BARs
@ 2020-10-09  7:59   ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09  7:59 UTC (permalink / raw)
  To: DRI Development, LKML
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, linux-pci, linux-mm,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, linux-arm-kernel,
	linux-media

When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
do nothing for i915. Because i915 doesn't request any regions, like
pretty much all drm pci drivers. I guess this is some very old
remnants from the userspace modesetting days, when we wanted to
co-exist with the fbdev driver. Which usually requested these
resources.

But makes me wonder why the pci subsystem doesn't just request
resource automatically when we map a bar and a pci driver is bound?

Knowledge about which pci bars we need kludged together from
intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
fake agp driver.

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
---
 drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 54e201fdeba4..ce39049d8919 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 	struct pci_dev *pdev = i915->drm.pdev;
 	int mmio_bar;
 	int mmio_size;
+	int bar_selection;
+	int ret;
 
 	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
+	bar_selection = BIT (2) | BIT(mmio_bar);
 	/*
-	 * Before gen4, the registers and the GTT are behind different BARs.
+	 * On gen3 the registers and the GTT are behind different BARs.
 	 * However, from gen4 onwards, the registers and the GTT are shared
 	 * in the same BAR, so we want to restrict this ioremap from
 	 * clobbering the GTT which we want ioremap_wc instead. Fortunately,
@@ -1703,6 +1706,8 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 	 * generations up to Ironlake.
 	 * For dgfx chips register range is expanded to 4MB.
 	 */
+	if (INTEL_GEN(i915) == 3)
+		bar_selection |= BIT(3);
 	if (INTEL_GEN(i915) < 5)
 		mmio_size = 512 * 1024;
 	else if (IS_DGFX(i915))
@@ -1710,8 +1715,15 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 	else
 		mmio_size = 2 * 1024 * 1024;
 
+	ret = pci_request_selected_regions(pdev, bar_selection, "i915");
+	if (ret < 0) {
+		drm_err(&i915->drm, "failed to request pci bars\n");
+		return ret;
+	}
+
 	uncore->regs = pci_iomap(pdev, mmio_bar, mmio_size);
 	if (uncore->regs == NULL) {
+		pci_release_selected_regions(pdev, bar_selection);
 		drm_err(&i915->drm, "failed to map registers\n");
 		return -EIO;
 	}
@@ -1721,9 +1733,18 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
 
 static void uncore_mmio_cleanup(struct intel_uncore *uncore)
 {
-	struct pci_dev *pdev = uncore->i915->drm.pdev;
+	struct drm_i915_private *i915 = uncore->i915;
+	struct pci_dev *pdev = i915->drm.pdev;
+	int mmio_bar;
+	int bar_selection;
+
+	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
+	bar_selection = BIT (2) | BIT(mmio_bar);
+	if (INTEL_GEN(i915) == 3)
+		bar_selection |= BIT(3);
 
 	pci_iounmap(pdev, uncore->regs);
+	pci_release_selected_regions(pdev, bar_selection);
 }
 
 void intel_uncore_init_early(struct intel_uncore *uncore,
-- 
2.28.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
  2020-10-09  7:59   ` Daniel Vetter
  (?)
  (?)
@ 2020-10-09  9:47     ` Ville Syrjälä
  -1 siblings, 0 replies; 214+ messages in thread
From: Ville Syrjälä @ 2020-10-09  9:47 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, linux-s390, linux-samsung-soc, Jan Kara,
	Kees Cook, kvm, Jason Gunthorpe, linux-pci, linux-mm,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, linux-arm-kernel,
	linux-media

On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> do nothing for i915. Because i915 doesn't request any regions, like
> pretty much all drm pci drivers. I guess this is some very old
> remnants from the userspace modesetting days, when we wanted to
> co-exist with the fbdev driver. Which usually requested these
> resources.
> 
> But makes me wonder why the pci subsystem doesn't just request
> resource automatically when we map a bar and a pci driver is bound?
> 
> Knowledge about which pci bars we need kludged together from
> intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> fake agp driver.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 54e201fdeba4..ce39049d8919 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  	struct pci_dev *pdev = i915->drm.pdev;
>  	int mmio_bar;
>  	int mmio_size;
> +	int bar_selection;

Signed bitmasks always make me uneasy. But looks like
that's what it is in the pci api. So meh.

> +	int ret;
>  
>  	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> +	bar_selection = BIT (2) | BIT(mmio_bar);
                           ^
spurious space			   

That's also not correct for gen2 I think.

gen2:
0 = GMADR
1 = MMADR
2 = IOBAR

gen3:
0 = MMADR
1 = IOBAR
2 = GMADR
3 = GTTADR

gen4+:
0+1 = GTTMMADR
2+3 = GMADR
4 = IOBAR

Maybe we should just have an explicit list of bars like that in a
comment?

I'd also suggest sucking this bitmask calculation into a small helper
so you can reuse it for the release.

>  	/*
> -	 * Before gen4, the registers and the GTT are behind different BARs.
> +	 * On gen3 the registers and the GTT are behind different BARs.
>  	 * However, from gen4 onwards, the registers and the GTT are shared
>  	 * in the same BAR, so we want to restrict this ioremap from
>  	 * clobbering the GTT which we want ioremap_wc instead. Fortunately,
> @@ -1703,6 +1706,8 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  	 * generations up to Ironlake.
>  	 * For dgfx chips register range is expanded to 4MB.
>  	 */
> +	if (INTEL_GEN(i915) == 3)
> +		bar_selection |= BIT(3);
>  	if (INTEL_GEN(i915) < 5)
>  		mmio_size = 512 * 1024;
>  	else if (IS_DGFX(i915))
> @@ -1710,8 +1715,15 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  	else
>  		mmio_size = 2 * 1024 * 1024;
>  
> +	ret = pci_request_selected_regions(pdev, bar_selection, "i915");
> +	if (ret < 0) {
> +		drm_err(&i915->drm, "failed to request pci bars\n");
> +		return ret;
> +	}
> +
>  	uncore->regs = pci_iomap(pdev, mmio_bar, mmio_size);
>  	if (uncore->regs == NULL) {
> +		pci_release_selected_regions(pdev, bar_selection);
>  		drm_err(&i915->drm, "failed to map registers\n");
>  		return -EIO;
>  	}
> @@ -1721,9 +1733,18 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  
>  static void uncore_mmio_cleanup(struct intel_uncore *uncore)
>  {
> -	struct pci_dev *pdev = uncore->i915->drm.pdev;
> +	struct drm_i915_private *i915 = uncore->i915;
> +	struct pci_dev *pdev = i915->drm.pdev;
> +	int mmio_bar;
> +	int bar_selection;
> +
> +	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> +	bar_selection = BIT (2) | BIT(mmio_bar);
> +	if (INTEL_GEN(i915) == 3)
> +		bar_selection |= BIT(3);
>  
>  	pci_iounmap(pdev, uncore->regs);
> +	pci_release_selected_regions(pdev, bar_selection);
>  }
>  
>  void intel_uncore_init_early(struct intel_uncore *uncore,
> -- 
> 2.28.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
@ 2020-10-09  9:47     ` Ville Syrjälä
  0 siblings, 0 replies; 214+ messages in thread
From: Ville Syrjälä @ 2020-10-09  9:47 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, linux-s390, linux-samsung-soc, Jan Kara,
	Kees Cook, kvm, Jason Gunthorpe, linux-pci, linux-mm,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, linux-arm-kernel,
	linux-media

On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> do nothing for i915. Because i915 doesn't request any regions, like
> pretty much all drm pci drivers. I guess this is some very old
> remnants from the userspace modesetting days, when we wanted to
> co-exist with the fbdev driver. Which usually requested these
> resources.
> 
> But makes me wonder why the pci subsystem doesn't just request
> resource automatically when we map a bar and a pci driver is bound?
> 
> Knowledge about which pci bars we need kludged together from
> intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> fake agp driver.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: J�r�me Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 54e201fdeba4..ce39049d8919 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  	struct pci_dev *pdev = i915->drm.pdev;
>  	int mmio_bar;
>  	int mmio_size;
> +	int bar_selection;

Signed bitmasks always make me uneasy. But looks like
that's what it is in the pci api. So meh.

> +	int ret;
>  
>  	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> +	bar_selection = BIT (2) | BIT(mmio_bar);
                           ^
spurious space			   

That's also not correct for gen2 I think.

gen2:
0 = GMADR
1 = MMADR
2 = IOBAR

gen3:
0 = MMADR
1 = IOBAR
2 = GMADR
3 = GTTADR

gen4+:
0+1 = GTTMMADR
2+3 = GMADR
4 = IOBAR

Maybe we should just have an explicit list of bars like that in a
comment?

I'd also suggest sucking this bitmask calculation into a small helper
so you can reuse it for the release.

>  	/*
> -	 * Before gen4, the registers and the GTT are behind different BARs.
> +	 * On gen3 the registers and the GTT are behind different BARs.
>  	 * However, from gen4 onwards, the registers and the GTT are shared
>  	 * in the same BAR, so we want to restrict this ioremap from
>  	 * clobbering the GTT which we want ioremap_wc instead. Fortunately,
> @@ -1703,6 +1706,8 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  	 * generations up to Ironlake.
>  	 * For dgfx chips register range is expanded to 4MB.
>  	 */
> +	if (INTEL_GEN(i915) == 3)
> +		bar_selection |= BIT(3);
>  	if (INTEL_GEN(i915) < 5)
>  		mmio_size = 512 * 1024;
>  	else if (IS_DGFX(i915))
> @@ -1710,8 +1715,15 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  	else
>  		mmio_size = 2 * 1024 * 1024;
>  
> +	ret = pci_request_selected_regions(pdev, bar_selection, "i915");
> +	if (ret < 0) {
> +		drm_err(&i915->drm, "failed to request pci bars\n");
> +		return ret;
> +	}
> +
>  	uncore->regs = pci_iomap(pdev, mmio_bar, mmio_size);
>  	if (uncore->regs == NULL) {
> +		pci_release_selected_regions(pdev, bar_selection);
>  		drm_err(&i915->drm, "failed to map registers\n");
>  		return -EIO;
>  	}
> @@ -1721,9 +1733,18 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  
>  static void uncore_mmio_cleanup(struct intel_uncore *uncore)
>  {
> -	struct pci_dev *pdev = uncore->i915->drm.pdev;
> +	struct drm_i915_private *i915 = uncore->i915;
> +	struct pci_dev *pdev = i915->drm.pdev;
> +	int mmio_bar;
> +	int bar_selection;
> +
> +	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> +	bar_selection = BIT (2) | BIT(mmio_bar);
> +	if (INTEL_GEN(i915) == 3)
> +		bar_selection |= BIT(3);
>  
>  	pci_iounmap(pdev, uncore->regs);
> +	pci_release_selected_regions(pdev, bar_selection);
>  }
>  
>  void intel_uncore_init_early(struct intel_uncore *uncore,
> -- 
> 2.28.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Ville Syrj�l�
Intel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
@ 2020-10-09  9:47     ` Ville Syrjälä
  0 siblings, 0 replies; 214+ messages in thread
From: Ville Syrjälä @ 2020-10-09  9:47 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	linux-mm, linux-pci, LKML, DRI Development, Jason Gunthorpe,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, linux-arm-kernel,
	linux-media

On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> do nothing for i915. Because i915 doesn't request any regions, like
> pretty much all drm pci drivers. I guess this is some very old
> remnants from the userspace modesetting days, when we wanted to
> co-exist with the fbdev driver. Which usually requested these
> resources.
> 
> But makes me wonder why the pci subsystem doesn't just request
> resource automatically when we map a bar and a pci driver is bound?
> 
> Knowledge about which pci bars we need kludged together from
> intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> fake agp driver.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 54e201fdeba4..ce39049d8919 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  	struct pci_dev *pdev = i915->drm.pdev;
>  	int mmio_bar;
>  	int mmio_size;
> +	int bar_selection;

Signed bitmasks always make me uneasy. But looks like
that's what it is in the pci api. So meh.

> +	int ret;
>  
>  	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> +	bar_selection = BIT (2) | BIT(mmio_bar);
                           ^
spurious space			   

That's also not correct for gen2 I think.

gen2:
0 = GMADR
1 = MMADR
2 = IOBAR

gen3:
0 = MMADR
1 = IOBAR
2 = GMADR
3 = GTTADR

gen4+:
0+1 = GTTMMADR
2+3 = GMADR
4 = IOBAR

Maybe we should just have an explicit list of bars like that in a
comment?

I'd also suggest sucking this bitmask calculation into a small helper
so you can reuse it for the release.

>  	/*
> -	 * Before gen4, the registers and the GTT are behind different BARs.
> +	 * On gen3 the registers and the GTT are behind different BARs.
>  	 * However, from gen4 onwards, the registers and the GTT are shared
>  	 * in the same BAR, so we want to restrict this ioremap from
>  	 * clobbering the GTT which we want ioremap_wc instead. Fortunately,
> @@ -1703,6 +1706,8 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  	 * generations up to Ironlake.
>  	 * For dgfx chips register range is expanded to 4MB.
>  	 */
> +	if (INTEL_GEN(i915) == 3)
> +		bar_selection |= BIT(3);
>  	if (INTEL_GEN(i915) < 5)
>  		mmio_size = 512 * 1024;
>  	else if (IS_DGFX(i915))
> @@ -1710,8 +1715,15 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  	else
>  		mmio_size = 2 * 1024 * 1024;
>  
> +	ret = pci_request_selected_regions(pdev, bar_selection, "i915");
> +	if (ret < 0) {
> +		drm_err(&i915->drm, "failed to request pci bars\n");
> +		return ret;
> +	}
> +
>  	uncore->regs = pci_iomap(pdev, mmio_bar, mmio_size);
>  	if (uncore->regs == NULL) {
> +		pci_release_selected_regions(pdev, bar_selection);
>  		drm_err(&i915->drm, "failed to map registers\n");
>  		return -EIO;
>  	}
> @@ -1721,9 +1733,18 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  
>  static void uncore_mmio_cleanup(struct intel_uncore *uncore)
>  {
> -	struct pci_dev *pdev = uncore->i915->drm.pdev;
> +	struct drm_i915_private *i915 = uncore->i915;
> +	struct pci_dev *pdev = i915->drm.pdev;
> +	int mmio_bar;
> +	int bar_selection;
> +
> +	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> +	bar_selection = BIT (2) | BIT(mmio_bar);
> +	if (INTEL_GEN(i915) == 3)
> +		bar_selection |= BIT(3);
>  
>  	pci_iounmap(pdev, uncore->regs);
> +	pci_release_selected_regions(pdev, bar_selection);
>  }
>  
>  void intel_uncore_init_early(struct intel_uncore *uncore,
> -- 
> 2.28.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Ville Syrjälä
Intel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
@ 2020-10-09  9:47     ` Ville Syrjälä
  0 siblings, 0 replies; 214+ messages in thread
From: Ville Syrjälä @ 2020-10-09  9:47 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	linux-mm, linux-pci, LKML, DRI Development, Jason Gunthorpe,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, linux-arm-kernel,
	linux-media

On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> do nothing for i915. Because i915 doesn't request any regions, like
> pretty much all drm pci drivers. I guess this is some very old
> remnants from the userspace modesetting days, when we wanted to
> co-exist with the fbdev driver. Which usually requested these
> resources.
> 
> But makes me wonder why the pci subsystem doesn't just request
> resource automatically when we map a bar and a pci driver is bound?
> 
> Knowledge about which pci bars we need kludged together from
> intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> fake agp driver.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 54e201fdeba4..ce39049d8919 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  	struct pci_dev *pdev = i915->drm.pdev;
>  	int mmio_bar;
>  	int mmio_size;
> +	int bar_selection;

Signed bitmasks always make me uneasy. But looks like
that's what it is in the pci api. So meh.

> +	int ret;
>  
>  	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> +	bar_selection = BIT (2) | BIT(mmio_bar);
                           ^
spurious space			   

That's also not correct for gen2 I think.

gen2:
0 = GMADR
1 = MMADR
2 = IOBAR

gen3:
0 = MMADR
1 = IOBAR
2 = GMADR
3 = GTTADR

gen4+:
0+1 = GTTMMADR
2+3 = GMADR
4 = IOBAR

Maybe we should just have an explicit list of bars like that in a
comment?

I'd also suggest sucking this bitmask calculation into a small helper
so you can reuse it for the release.

>  	/*
> -	 * Before gen4, the registers and the GTT are behind different BARs.
> +	 * On gen3 the registers and the GTT are behind different BARs.
>  	 * However, from gen4 onwards, the registers and the GTT are shared
>  	 * in the same BAR, so we want to restrict this ioremap from
>  	 * clobbering the GTT which we want ioremap_wc instead. Fortunately,
> @@ -1703,6 +1706,8 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  	 * generations up to Ironlake.
>  	 * For dgfx chips register range is expanded to 4MB.
>  	 */
> +	if (INTEL_GEN(i915) == 3)
> +		bar_selection |= BIT(3);
>  	if (INTEL_GEN(i915) < 5)
>  		mmio_size = 512 * 1024;
>  	else if (IS_DGFX(i915))
> @@ -1710,8 +1715,15 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  	else
>  		mmio_size = 2 * 1024 * 1024;
>  
> +	ret = pci_request_selected_regions(pdev, bar_selection, "i915");
> +	if (ret < 0) {
> +		drm_err(&i915->drm, "failed to request pci bars\n");
> +		return ret;
> +	}
> +
>  	uncore->regs = pci_iomap(pdev, mmio_bar, mmio_size);
>  	if (uncore->regs == NULL) {
> +		pci_release_selected_regions(pdev, bar_selection);
>  		drm_err(&i915->drm, "failed to map registers\n");
>  		return -EIO;
>  	}
> @@ -1721,9 +1733,18 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
>  
>  static void uncore_mmio_cleanup(struct intel_uncore *uncore)
>  {
> -	struct pci_dev *pdev = uncore->i915->drm.pdev;
> +	struct drm_i915_private *i915 = uncore->i915;
> +	struct pci_dev *pdev = i915->drm.pdev;
> +	int mmio_bar;
> +	int bar_selection;
> +
> +	mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> +	bar_selection = BIT (2) | BIT(mmio_bar);
> +	if (INTEL_GEN(i915) == 3)
> +		bar_selection |= BIT(3);
>  
>  	pci_iounmap(pdev, uncore->regs);
> +	pci_release_selected_regions(pdev, bar_selection);
>  }
>  
>  void intel_uncore_init_early(struct intel_uncore *uncore,
> -- 
> 2.28.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Ville Syrjälä
Intel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
  2020-10-09  9:47     ` Ville Syrjälä
  (?)
@ 2020-10-09 10:01       ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 10:01 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: DRI Development, LKML, linux-s390, linux-samsung-soc, Jan Kara,
	Kees Cook, KVM list, Jason Gunthorpe, Linux PCI, Linux MM,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 11:47 AM Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> > When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> > do nothing for i915. Because i915 doesn't request any regions, like
> > pretty much all drm pci drivers. I guess this is some very old
> > remnants from the userspace modesetting days, when we wanted to
> > co-exist with the fbdev driver. Which usually requested these
> > resources.
> >
> > But makes me wonder why the pci subsystem doesn't just request
> > resource automatically when we map a bar and a pci driver is bound?
> >
> > Knowledge about which pci bars we need kludged together from
> > intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> > fake agp driver.
> >
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: linux-pci@vger.kernel.org
> > ---
> >  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
> >  1 file changed, 23 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > index 54e201fdeba4..ce39049d8919 100644
> > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >       struct pci_dev *pdev = i915->drm.pdev;
> >       int mmio_bar;
> >       int mmio_size;
> > +     int bar_selection;
>
> Signed bitmasks always make me uneasy. But looks like
> that's what it is in the pci api. So meh.

Yeah it's surprising.

> > +     int ret;
> >
> >       mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > +     bar_selection = BIT (2) | BIT(mmio_bar);
>                            ^
> spurious space
>
> That's also not correct for gen2 I think.
>
> gen2:
> 0 = GMADR
> 1 = MMADR
> 2 = IOBAR
>
> gen3:
> 0 = MMADR
> 1 = IOBAR
> 2 = GMADR
> 3 = GTTADR
>
> gen4+:
> 0+1 = GTTMMADR
> 2+3 = GMADR
> 4 = IOBAR
>
> Maybe we should just have an explicit list of bars like that in a
> comment?
>
> I'd also suggest sucking this bitmask calculation into a small helper
> so you can reuse it for the release.

tbh I just hacked this up for testing. Given how almost no other drm
driver does this, I'm wondering whether we should or not.

Also the only reason why I didn't just use the pci_request_regions
helper is to avoid the vga ioport range, since that's managed by
vgaarbiter.

So I think if we go for this for real we should:
- register the vga ioport range in the vgaarbiter
- have a pci_request_iomem_regions helper that grabs all mem bars
- roll that out to all drm pci drivers

Or something like that. The other complication is when we resize the
iobar. So not really sure what to do here.
-Daniel

>
> >       /*
> > -      * Before gen4, the registers and the GTT are behind different BARs.
> > +      * On gen3 the registers and the GTT are behind different BARs.
> >        * However, from gen4 onwards, the registers and the GTT are shared
> >        * in the same BAR, so we want to restrict this ioremap from
> >        * clobbering the GTT which we want ioremap_wc instead. Fortunately,
> > @@ -1703,6 +1706,8 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >        * generations up to Ironlake.
> >        * For dgfx chips register range is expanded to 4MB.
> >        */
> > +     if (INTEL_GEN(i915) == 3)
> > +             bar_selection |= BIT(3);
> >       if (INTEL_GEN(i915) < 5)
> >               mmio_size = 512 * 1024;
> >       else if (IS_DGFX(i915))
> > @@ -1710,8 +1715,15 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >       else
> >               mmio_size = 2 * 1024 * 1024;
> >
> > +     ret = pci_request_selected_regions(pdev, bar_selection, "i915");
> > +     if (ret < 0) {
> > +             drm_err(&i915->drm, "failed to request pci bars\n");
> > +             return ret;
> > +     }
> > +
> >       uncore->regs = pci_iomap(pdev, mmio_bar, mmio_size);
> >       if (uncore->regs == NULL) {
> > +             pci_release_selected_regions(pdev, bar_selection);
> >               drm_err(&i915->drm, "failed to map registers\n");
> >               return -EIO;
> >       }
> > @@ -1721,9 +1733,18 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >
> >  static void uncore_mmio_cleanup(struct intel_uncore *uncore)
> >  {
> > -     struct pci_dev *pdev = uncore->i915->drm.pdev;
> > +     struct drm_i915_private *i915 = uncore->i915;
> > +     struct pci_dev *pdev = i915->drm.pdev;
> > +     int mmio_bar;
> > +     int bar_selection;
> > +
> > +     mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > +     bar_selection = BIT (2) | BIT(mmio_bar);
> > +     if (INTEL_GEN(i915) == 3)
> > +             bar_selection |= BIT(3);
> >
> >       pci_iounmap(pdev, uncore->regs);
> > +     pci_release_selected_regions(pdev, bar_selection);
> >  }
> >
> >  void intel_uncore_init_early(struct intel_uncore *uncore,
> > --
> > 2.28.0
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> --
> Ville Syrjälä
> Intel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
@ 2020-10-09 10:01       ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 10:01 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Linux MM, Linux PCI, LKML, DRI Development, Jason Gunthorpe,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 11:47 AM Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> > When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> > do nothing for i915. Because i915 doesn't request any regions, like
> > pretty much all drm pci drivers. I guess this is some very old
> > remnants from the userspace modesetting days, when we wanted to
> > co-exist with the fbdev driver. Which usually requested these
> > resources.
> >
> > But makes me wonder why the pci subsystem doesn't just request
> > resource automatically when we map a bar and a pci driver is bound?
> >
> > Knowledge about which pci bars we need kludged together from
> > intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> > fake agp driver.
> >
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: linux-pci@vger.kernel.org
> > ---
> >  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
> >  1 file changed, 23 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > index 54e201fdeba4..ce39049d8919 100644
> > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >       struct pci_dev *pdev = i915->drm.pdev;
> >       int mmio_bar;
> >       int mmio_size;
> > +     int bar_selection;
>
> Signed bitmasks always make me uneasy. But looks like
> that's what it is in the pci api. So meh.

Yeah it's surprising.

> > +     int ret;
> >
> >       mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > +     bar_selection = BIT (2) | BIT(mmio_bar);
>                            ^
> spurious space
>
> That's also not correct for gen2 I think.
>
> gen2:
> 0 = GMADR
> 1 = MMADR
> 2 = IOBAR
>
> gen3:
> 0 = MMADR
> 1 = IOBAR
> 2 = GMADR
> 3 = GTTADR
>
> gen4+:
> 0+1 = GTTMMADR
> 2+3 = GMADR
> 4 = IOBAR
>
> Maybe we should just have an explicit list of bars like that in a
> comment?
>
> I'd also suggest sucking this bitmask calculation into a small helper
> so you can reuse it for the release.

tbh I just hacked this up for testing. Given how almost no other drm
driver does this, I'm wondering whether we should or not.

Also the only reason why I didn't just use the pci_request_regions
helper is to avoid the vga ioport range, since that's managed by
vgaarbiter.

So I think if we go for this for real we should:
- register the vga ioport range in the vgaarbiter
- have a pci_request_iomem_regions helper that grabs all mem bars
- roll that out to all drm pci drivers

Or something like that. The other complication is when we resize the
iobar. So not really sure what to do here.
-Daniel

>
> >       /*
> > -      * Before gen4, the registers and the GTT are behind different BARs.
> > +      * On gen3 the registers and the GTT are behind different BARs.
> >        * However, from gen4 onwards, the registers and the GTT are shared
> >        * in the same BAR, so we want to restrict this ioremap from
> >        * clobbering the GTT which we want ioremap_wc instead. Fortunately,
> > @@ -1703,6 +1706,8 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >        * generations up to Ironlake.
> >        * For dgfx chips register range is expanded to 4MB.
> >        */
> > +     if (INTEL_GEN(i915) == 3)
> > +             bar_selection |= BIT(3);
> >       if (INTEL_GEN(i915) < 5)
> >               mmio_size = 512 * 1024;
> >       else if (IS_DGFX(i915))
> > @@ -1710,8 +1715,15 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >       else
> >               mmio_size = 2 * 1024 * 1024;
> >
> > +     ret = pci_request_selected_regions(pdev, bar_selection, "i915");
> > +     if (ret < 0) {
> > +             drm_err(&i915->drm, "failed to request pci bars\n");
> > +             return ret;
> > +     }
> > +
> >       uncore->regs = pci_iomap(pdev, mmio_bar, mmio_size);
> >       if (uncore->regs == NULL) {
> > +             pci_release_selected_regions(pdev, bar_selection);
> >               drm_err(&i915->drm, "failed to map registers\n");
> >               return -EIO;
> >       }
> > @@ -1721,9 +1733,18 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >
> >  static void uncore_mmio_cleanup(struct intel_uncore *uncore)
> >  {
> > -     struct pci_dev *pdev = uncore->i915->drm.pdev;
> > +     struct drm_i915_private *i915 = uncore->i915;
> > +     struct pci_dev *pdev = i915->drm.pdev;
> > +     int mmio_bar;
> > +     int bar_selection;
> > +
> > +     mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > +     bar_selection = BIT (2) | BIT(mmio_bar);
> > +     if (INTEL_GEN(i915) == 3)
> > +             bar_selection |= BIT(3);
> >
> >       pci_iounmap(pdev, uncore->regs);
> > +     pci_release_selected_regions(pdev, bar_selection);
> >  }
> >
> >  void intel_uncore_init_early(struct intel_uncore *uncore,
> > --
> > 2.28.0
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> --
> Ville Syrjälä
> Intel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
@ 2020-10-09 10:01       ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 10:01 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Linux MM, Linux PCI, LKML, DRI Development, Jason Gunthorpe,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 11:47 AM Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> > When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> > do nothing for i915. Because i915 doesn't request any regions, like
> > pretty much all drm pci drivers. I guess this is some very old
> > remnants from the userspace modesetting days, when we wanted to
> > co-exist with the fbdev driver. Which usually requested these
> > resources.
> >
> > But makes me wonder why the pci subsystem doesn't just request
> > resource automatically when we map a bar and a pci driver is bound?
> >
> > Knowledge about which pci bars we need kludged together from
> > intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> > fake agp driver.
> >
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Cc: linux-pci@vger.kernel.org
> > ---
> >  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
> >  1 file changed, 23 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > index 54e201fdeba4..ce39049d8919 100644
> > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >       struct pci_dev *pdev = i915->drm.pdev;
> >       int mmio_bar;
> >       int mmio_size;
> > +     int bar_selection;
>
> Signed bitmasks always make me uneasy. But looks like
> that's what it is in the pci api. So meh.

Yeah it's surprising.

> > +     int ret;
> >
> >       mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > +     bar_selection = BIT (2) | BIT(mmio_bar);
>                            ^
> spurious space
>
> That's also not correct for gen2 I think.
>
> gen2:
> 0 = GMADR
> 1 = MMADR
> 2 = IOBAR
>
> gen3:
> 0 = MMADR
> 1 = IOBAR
> 2 = GMADR
> 3 = GTTADR
>
> gen4+:
> 0+1 = GTTMMADR
> 2+3 = GMADR
> 4 = IOBAR
>
> Maybe we should just have an explicit list of bars like that in a
> comment?
>
> I'd also suggest sucking this bitmask calculation into a small helper
> so you can reuse it for the release.

tbh I just hacked this up for testing. Given how almost no other drm
driver does this, I'm wondering whether we should or not.

Also the only reason why I didn't just use the pci_request_regions
helper is to avoid the vga ioport range, since that's managed by
vgaarbiter.

So I think if we go for this for real we should:
- register the vga ioport range in the vgaarbiter
- have a pci_request_iomem_regions helper that grabs all mem bars
- roll that out to all drm pci drivers

Or something like that. The other complication is when we resize the
iobar. So not really sure what to do here.
-Daniel

>
> >       /*
> > -      * Before gen4, the registers and the GTT are behind different BARs.
> > +      * On gen3 the registers and the GTT are behind different BARs.
> >        * However, from gen4 onwards, the registers and the GTT are shared
> >        * in the same BAR, so we want to restrict this ioremap from
> >        * clobbering the GTT which we want ioremap_wc instead. Fortunately,
> > @@ -1703,6 +1706,8 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >        * generations up to Ironlake.
> >        * For dgfx chips register range is expanded to 4MB.
> >        */
> > +     if (INTEL_GEN(i915) == 3)
> > +             bar_selection |= BIT(3);
> >       if (INTEL_GEN(i915) < 5)
> >               mmio_size = 512 * 1024;
> >       else if (IS_DGFX(i915))
> > @@ -1710,8 +1715,15 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >       else
> >               mmio_size = 2 * 1024 * 1024;
> >
> > +     ret = pci_request_selected_regions(pdev, bar_selection, "i915");
> > +     if (ret < 0) {
> > +             drm_err(&i915->drm, "failed to request pci bars\n");
> > +             return ret;
> > +     }
> > +
> >       uncore->regs = pci_iomap(pdev, mmio_bar, mmio_size);
> >       if (uncore->regs == NULL) {
> > +             pci_release_selected_regions(pdev, bar_selection);
> >               drm_err(&i915->drm, "failed to map registers\n");
> >               return -EIO;
> >       }
> > @@ -1721,9 +1733,18 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> >
> >  static void uncore_mmio_cleanup(struct intel_uncore *uncore)
> >  {
> > -     struct pci_dev *pdev = uncore->i915->drm.pdev;
> > +     struct drm_i915_private *i915 = uncore->i915;
> > +     struct pci_dev *pdev = i915->drm.pdev;
> > +     int mmio_bar;
> > +     int bar_selection;
> > +
> > +     mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > +     bar_selection = BIT (2) | BIT(mmio_bar);
> > +     if (INTEL_GEN(i915) == 3)
> > +             bar_selection |= BIT(3);
> >
> >       pci_iounmap(pdev, uncore->regs);
> > +     pci_release_selected_regions(pdev, bar_selection);
> >  }
> >
> >  void intel_uncore_init_early(struct intel_uncore *uncore,
> > --
> > 2.28.0
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> --
> Ville Syrjälä
> Intel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem
  2020-10-09  7:59   ` Daniel Vetter
  (?)
@ 2020-10-09 10:14     ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 10:14 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, John Hubbard,
	Daniel Vetter, Jason Gunthorpe, Pawel Osciak, Marek Szyprowski,
	Kyungmin Park, Tomasz Figa, Andrew Morton,
	Jérôme Glisse, Jan Kara, Dan Williams

Em Fri,  9 Oct 2020 09:59:23 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
> symbol from all over the tree (well just one place, somehow omap media
> driver still had this in its Kconfig, despite not using it).
> 
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Pawel Osciak <pawel@osciak.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Tomasz Figa <tfiga@chromium.org>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/media/common/videobuf2/Kconfig        |  1 -
>  drivers/media/common/videobuf2/Makefile       |  1 +
>  .../media/common/videobuf2}/frame_vector.c    |  2 +
>  drivers/media/platform/omap/Kconfig           |  1 -
>  include/linux/mm.h                            | 42 -------------------
>  include/media/videobuf2-core.h                | 42 +++++++++++++++++++
>  mm/Kconfig                                    |  3 --
>  mm/Makefile                                   |  1 -
>  8 files changed, 45 insertions(+), 48 deletions(-)
>  rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)
> 
> diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
> index edbc99ebba87..d2223a12c95f 100644
> --- a/drivers/media/common/videobuf2/Kconfig
> +++ b/drivers/media/common/videobuf2/Kconfig
> @@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
>  
>  config VIDEOBUF2_MEMOPS
>  	tristate
> -	select FRAME_VECTOR
>  
>  config VIDEOBUF2_DMA_CONTIG
>  	tristate
> diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
> index 77bebe8b202f..54306f8d096c 100644
> --- a/drivers/media/common/videobuf2/Makefile
> +++ b/drivers/media/common/videobuf2/Makefile
> @@ -1,5 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  videobuf2-common-objs := videobuf2-core.o
> +videobuf2-common-objs += frame_vector.o
>  
>  ifeq ($(CONFIG_TRACEPOINTS),y)
>    videobuf2-common-objs += vb2-trace.o
> diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> similarity index 99%
> rename from mm/frame_vector.c
> rename to drivers/media/common/videobuf2/frame_vector.c
> index d44779e56313..2b0b97761d15 100644
> --- a/mm/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -8,6 +8,8 @@
>  #include <linux/pagemap.h>
>  #include <linux/sched.h>
>  
> +#include <media/videobuf2-core.h>
> +

See my comment below...

>  /**
>   * get_vaddr_frames() - map virtual addresses to pfns
>   * @start:	starting user address
> diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
> index f73b5893220d..de16de46c0f4 100644
> --- a/drivers/media/platform/omap/Kconfig
> +++ b/drivers/media/platform/omap/Kconfig
> @@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
>  	depends on VIDEO_V4L2
>  	select VIDEOBUF2_DMA_CONTIG
>  	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
> -	select FRAME_VECTOR
>  	help
>  	  V4L2 Display driver support for OMAP2/3 based boards.
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 16b799a0522c..acd60fbf1a5a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
>  int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
>  			struct task_struct *task, bool bypass_rlim);
>  
> -/* Container for pinned pfns / pages */
> -struct frame_vector {
> -	unsigned int nr_allocated;	/* Number of frames we have space for */
> -	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> -	bool got_ref;		/* Did we pin pages by getting page ref? */
> -	bool is_pfns;		/* Does array contain pages or pfns? */
> -	void *ptrs[];		/* Array of pinned pfns / pages. Use
> -				 * pfns_vector_pages() or pfns_vector_pfns()
> -				 * for access */
> -};
> -
> -struct frame_vector *frame_vector_create(unsigned int nr_frames);
> -void frame_vector_destroy(struct frame_vector *vec);
> -int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> -		     unsigned int gup_flags, struct frame_vector *vec);
> -void put_vaddr_frames(struct frame_vector *vec);
> -int frame_vector_to_pages(struct frame_vector *vec);
> -void frame_vector_to_pfns(struct frame_vector *vec);
> -
> -static inline unsigned int frame_vector_count(struct frame_vector *vec)
> -{
> -	return vec->nr_frames;
> -}
> -
> -static inline struct page **frame_vector_pages(struct frame_vector *vec)
> -{
> -	if (vec->is_pfns) {
> -		int err = frame_vector_to_pages(vec);
> -
> -		if (err)
> -			return ERR_PTR(err);
> -	}
> -	return (struct page **)(vec->ptrs);
> -}
> -
> -static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> -{
> -	if (!vec->is_pfns)
> -		frame_vector_to_pfns(vec);
> -	return (unsigned long *)(vec->ptrs);
> -}
> -
>  struct kvec;
>  int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
>  			struct page **pages);
> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> index bbb3f26fbde9..a2e75ca0334f 100644
> --- a/include/media/videobuf2-core.h
> +++ b/include/media/videobuf2-core.h
> @@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
>   */
>  unsigned int vb2_request_buffer_cnt(struct media_request *req);
>  
> +/* Container for pinned pfns / pages in frame_vector.c */
> +struct frame_vector {
> +	unsigned int nr_allocated;	/* Number of frames we have space for */
> +	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> +	bool got_ref;		/* Did we pin pages by getting page ref? */
> +	bool is_pfns;		/* Does array contain pages or pfns? */
> +	void *ptrs[];		/* Array of pinned pfns / pages. Use
> +				 * pfns_vector_pages() or pfns_vector_pfns()
> +				 * for access */
> +};
> +
> +struct frame_vector *frame_vector_create(unsigned int nr_frames);
> +void frame_vector_destroy(struct frame_vector *vec);
> +int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> +		     unsigned int gup_flags, struct frame_vector *vec);
> +void put_vaddr_frames(struct frame_vector *vec);
> +int frame_vector_to_pages(struct frame_vector *vec);
> +void frame_vector_to_pfns(struct frame_vector *vec);
> +
> +static inline unsigned int frame_vector_count(struct frame_vector *vec)
> +{
> +	return vec->nr_frames;
> +}
> +
> +static inline struct page **frame_vector_pages(struct frame_vector *vec)
> +{
> +	if (vec->is_pfns) {
> +		int err = frame_vector_to_pages(vec);
> +
> +		if (err)
> +			return ERR_PTR(err);
> +	}
> +	return (struct page **)(vec->ptrs);
> +}
> +
> +static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> +{
> +	if (!vec->is_pfns)
> +		frame_vector_to_pfns(vec);
> +	return (unsigned long *)(vec->ptrs);
> +}
> +
>  #endif /* _MEDIA_VIDEOBUF2_CORE_H */

Please place those into a include/media/frame_vector.h file, instead of
merging it directly at vb2 core header.

Then include the new header at videobuf2-core.h and at frame_vector.c.

With such changes:

Acked-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

> diff --git a/mm/Kconfig b/mm/Kconfig
> index 6c974888f86f..da6c943fe9f1 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -815,9 +815,6 @@ config DEVICE_PRIVATE
>  	  memory; i.e., memory that is only accessible from the device (or
>  	  group of devices). You likely also want to select HMM_MIRROR.
>  
> -config FRAME_VECTOR
> -	bool
> -
>  config ARCH_USES_HIGH_VMA_FLAGS
>  	bool
>  config ARCH_HAS_PKEYS
> diff --git a/mm/Makefile b/mm/Makefile
> index d5649f1c12c0..a025fd6c6afd 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
>  obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
>  obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
>  obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
> -obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
>  obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
>  obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
>  obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem
@ 2020-10-09 10:14     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 10:14 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jérôme Glisse, linux-s390, linux-samsung-soc, Jan Kara,
	Pawel Osciak, kvm, Jason Gunthorpe, John Hubbard, LKML,
	DRI Development, Tomasz Figa, linux-mm, Kyungmin Park,
	Daniel Vetter, Andrew Morton, Marek Szyprowski, Dan Williams,
	linux-arm-kernel, linux-media

Em Fri,  9 Oct 2020 09:59:23 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
> symbol from all over the tree (well just one place, somehow omap media
> driver still had this in its Kconfig, despite not using it).
> 
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Pawel Osciak <pawel@osciak.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Tomasz Figa <tfiga@chromium.org>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/media/common/videobuf2/Kconfig        |  1 -
>  drivers/media/common/videobuf2/Makefile       |  1 +
>  .../media/common/videobuf2}/frame_vector.c    |  2 +
>  drivers/media/platform/omap/Kconfig           |  1 -
>  include/linux/mm.h                            | 42 -------------------
>  include/media/videobuf2-core.h                | 42 +++++++++++++++++++
>  mm/Kconfig                                    |  3 --
>  mm/Makefile                                   |  1 -
>  8 files changed, 45 insertions(+), 48 deletions(-)
>  rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)
> 
> diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
> index edbc99ebba87..d2223a12c95f 100644
> --- a/drivers/media/common/videobuf2/Kconfig
> +++ b/drivers/media/common/videobuf2/Kconfig
> @@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
>  
>  config VIDEOBUF2_MEMOPS
>  	tristate
> -	select FRAME_VECTOR
>  
>  config VIDEOBUF2_DMA_CONTIG
>  	tristate
> diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
> index 77bebe8b202f..54306f8d096c 100644
> --- a/drivers/media/common/videobuf2/Makefile
> +++ b/drivers/media/common/videobuf2/Makefile
> @@ -1,5 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  videobuf2-common-objs := videobuf2-core.o
> +videobuf2-common-objs += frame_vector.o
>  
>  ifeq ($(CONFIG_TRACEPOINTS),y)
>    videobuf2-common-objs += vb2-trace.o
> diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> similarity index 99%
> rename from mm/frame_vector.c
> rename to drivers/media/common/videobuf2/frame_vector.c
> index d44779e56313..2b0b97761d15 100644
> --- a/mm/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -8,6 +8,8 @@
>  #include <linux/pagemap.h>
>  #include <linux/sched.h>
>  
> +#include <media/videobuf2-core.h>
> +

See my comment below...

>  /**
>   * get_vaddr_frames() - map virtual addresses to pfns
>   * @start:	starting user address
> diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
> index f73b5893220d..de16de46c0f4 100644
> --- a/drivers/media/platform/omap/Kconfig
> +++ b/drivers/media/platform/omap/Kconfig
> @@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
>  	depends on VIDEO_V4L2
>  	select VIDEOBUF2_DMA_CONTIG
>  	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
> -	select FRAME_VECTOR
>  	help
>  	  V4L2 Display driver support for OMAP2/3 based boards.
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 16b799a0522c..acd60fbf1a5a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
>  int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
>  			struct task_struct *task, bool bypass_rlim);
>  
> -/* Container for pinned pfns / pages */
> -struct frame_vector {
> -	unsigned int nr_allocated;	/* Number of frames we have space for */
> -	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> -	bool got_ref;		/* Did we pin pages by getting page ref? */
> -	bool is_pfns;		/* Does array contain pages or pfns? */
> -	void *ptrs[];		/* Array of pinned pfns / pages. Use
> -				 * pfns_vector_pages() or pfns_vector_pfns()
> -				 * for access */
> -};
> -
> -struct frame_vector *frame_vector_create(unsigned int nr_frames);
> -void frame_vector_destroy(struct frame_vector *vec);
> -int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> -		     unsigned int gup_flags, struct frame_vector *vec);
> -void put_vaddr_frames(struct frame_vector *vec);
> -int frame_vector_to_pages(struct frame_vector *vec);
> -void frame_vector_to_pfns(struct frame_vector *vec);
> -
> -static inline unsigned int frame_vector_count(struct frame_vector *vec)
> -{
> -	return vec->nr_frames;
> -}
> -
> -static inline struct page **frame_vector_pages(struct frame_vector *vec)
> -{
> -	if (vec->is_pfns) {
> -		int err = frame_vector_to_pages(vec);
> -
> -		if (err)
> -			return ERR_PTR(err);
> -	}
> -	return (struct page **)(vec->ptrs);
> -}
> -
> -static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> -{
> -	if (!vec->is_pfns)
> -		frame_vector_to_pfns(vec);
> -	return (unsigned long *)(vec->ptrs);
> -}
> -
>  struct kvec;
>  int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
>  			struct page **pages);
> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> index bbb3f26fbde9..a2e75ca0334f 100644
> --- a/include/media/videobuf2-core.h
> +++ b/include/media/videobuf2-core.h
> @@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
>   */
>  unsigned int vb2_request_buffer_cnt(struct media_request *req);
>  
> +/* Container for pinned pfns / pages in frame_vector.c */
> +struct frame_vector {
> +	unsigned int nr_allocated;	/* Number of frames we have space for */
> +	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> +	bool got_ref;		/* Did we pin pages by getting page ref? */
> +	bool is_pfns;		/* Does array contain pages or pfns? */
> +	void *ptrs[];		/* Array of pinned pfns / pages. Use
> +				 * pfns_vector_pages() or pfns_vector_pfns()
> +				 * for access */
> +};
> +
> +struct frame_vector *frame_vector_create(unsigned int nr_frames);
> +void frame_vector_destroy(struct frame_vector *vec);
> +int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> +		     unsigned int gup_flags, struct frame_vector *vec);
> +void put_vaddr_frames(struct frame_vector *vec);
> +int frame_vector_to_pages(struct frame_vector *vec);
> +void frame_vector_to_pfns(struct frame_vector *vec);
> +
> +static inline unsigned int frame_vector_count(struct frame_vector *vec)
> +{
> +	return vec->nr_frames;
> +}
> +
> +static inline struct page **frame_vector_pages(struct frame_vector *vec)
> +{
> +	if (vec->is_pfns) {
> +		int err = frame_vector_to_pages(vec);
> +
> +		if (err)
> +			return ERR_PTR(err);
> +	}
> +	return (struct page **)(vec->ptrs);
> +}
> +
> +static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> +{
> +	if (!vec->is_pfns)
> +		frame_vector_to_pfns(vec);
> +	return (unsigned long *)(vec->ptrs);
> +}
> +
>  #endif /* _MEDIA_VIDEOBUF2_CORE_H */

Please place those into a include/media/frame_vector.h file, instead of
merging it directly at vb2 core header.

Then include the new header at videobuf2-core.h and at frame_vector.c.

With such changes:

Acked-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

> diff --git a/mm/Kconfig b/mm/Kconfig
> index 6c974888f86f..da6c943fe9f1 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -815,9 +815,6 @@ config DEVICE_PRIVATE
>  	  memory; i.e., memory that is only accessible from the device (or
>  	  group of devices). You likely also want to select HMM_MIRROR.
>  
> -config FRAME_VECTOR
> -	bool
> -
>  config ARCH_USES_HIGH_VMA_FLAGS
>  	bool
>  config ARCH_HAS_PKEYS
> diff --git a/mm/Makefile b/mm/Makefile
> index d5649f1c12c0..a025fd6c6afd 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
>  obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
>  obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
>  obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
> -obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
>  obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
>  obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
>  obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o



Thanks,
Mauro

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem
@ 2020-10-09 10:14     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 10:14 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jérôme Glisse, linux-s390, linux-samsung-soc, Jan Kara,
	Pawel Osciak, kvm, Jason Gunthorpe, John Hubbard, LKML,
	DRI Development, Tomasz Figa, linux-mm, Kyungmin Park,
	Daniel Vetter, Andrew Morton, Marek Szyprowski, Dan Williams,
	linux-arm-kernel, linux-media

Em Fri,  9 Oct 2020 09:59:23 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
> symbol from all over the tree (well just one place, somehow omap media
> driver still had this in its Kconfig, despite not using it).
> 
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Pawel Osciak <pawel@osciak.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Tomasz Figa <tfiga@chromium.org>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/media/common/videobuf2/Kconfig        |  1 -
>  drivers/media/common/videobuf2/Makefile       |  1 +
>  .../media/common/videobuf2}/frame_vector.c    |  2 +
>  drivers/media/platform/omap/Kconfig           |  1 -
>  include/linux/mm.h                            | 42 -------------------
>  include/media/videobuf2-core.h                | 42 +++++++++++++++++++
>  mm/Kconfig                                    |  3 --
>  mm/Makefile                                   |  1 -
>  8 files changed, 45 insertions(+), 48 deletions(-)
>  rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)
> 
> diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
> index edbc99ebba87..d2223a12c95f 100644
> --- a/drivers/media/common/videobuf2/Kconfig
> +++ b/drivers/media/common/videobuf2/Kconfig
> @@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
>  
>  config VIDEOBUF2_MEMOPS
>  	tristate
> -	select FRAME_VECTOR
>  
>  config VIDEOBUF2_DMA_CONTIG
>  	tristate
> diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
> index 77bebe8b202f..54306f8d096c 100644
> --- a/drivers/media/common/videobuf2/Makefile
> +++ b/drivers/media/common/videobuf2/Makefile
> @@ -1,5 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  videobuf2-common-objs := videobuf2-core.o
> +videobuf2-common-objs += frame_vector.o
>  
>  ifeq ($(CONFIG_TRACEPOINTS),y)
>    videobuf2-common-objs += vb2-trace.o
> diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> similarity index 99%
> rename from mm/frame_vector.c
> rename to drivers/media/common/videobuf2/frame_vector.c
> index d44779e56313..2b0b97761d15 100644
> --- a/mm/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -8,6 +8,8 @@
>  #include <linux/pagemap.h>
>  #include <linux/sched.h>
>  
> +#include <media/videobuf2-core.h>
> +

See my comment below...

>  /**
>   * get_vaddr_frames() - map virtual addresses to pfns
>   * @start:	starting user address
> diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
> index f73b5893220d..de16de46c0f4 100644
> --- a/drivers/media/platform/omap/Kconfig
> +++ b/drivers/media/platform/omap/Kconfig
> @@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
>  	depends on VIDEO_V4L2
>  	select VIDEOBUF2_DMA_CONTIG
>  	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
> -	select FRAME_VECTOR
>  	help
>  	  V4L2 Display driver support for OMAP2/3 based boards.
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 16b799a0522c..acd60fbf1a5a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
>  int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
>  			struct task_struct *task, bool bypass_rlim);
>  
> -/* Container for pinned pfns / pages */
> -struct frame_vector {
> -	unsigned int nr_allocated;	/* Number of frames we have space for */
> -	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> -	bool got_ref;		/* Did we pin pages by getting page ref? */
> -	bool is_pfns;		/* Does array contain pages or pfns? */
> -	void *ptrs[];		/* Array of pinned pfns / pages. Use
> -				 * pfns_vector_pages() or pfns_vector_pfns()
> -				 * for access */
> -};
> -
> -struct frame_vector *frame_vector_create(unsigned int nr_frames);
> -void frame_vector_destroy(struct frame_vector *vec);
> -int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> -		     unsigned int gup_flags, struct frame_vector *vec);
> -void put_vaddr_frames(struct frame_vector *vec);
> -int frame_vector_to_pages(struct frame_vector *vec);
> -void frame_vector_to_pfns(struct frame_vector *vec);
> -
> -static inline unsigned int frame_vector_count(struct frame_vector *vec)
> -{
> -	return vec->nr_frames;
> -}
> -
> -static inline struct page **frame_vector_pages(struct frame_vector *vec)
> -{
> -	if (vec->is_pfns) {
> -		int err = frame_vector_to_pages(vec);
> -
> -		if (err)
> -			return ERR_PTR(err);
> -	}
> -	return (struct page **)(vec->ptrs);
> -}
> -
> -static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> -{
> -	if (!vec->is_pfns)
> -		frame_vector_to_pfns(vec);
> -	return (unsigned long *)(vec->ptrs);
> -}
> -
>  struct kvec;
>  int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
>  			struct page **pages);
> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> index bbb3f26fbde9..a2e75ca0334f 100644
> --- a/include/media/videobuf2-core.h
> +++ b/include/media/videobuf2-core.h
> @@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
>   */
>  unsigned int vb2_request_buffer_cnt(struct media_request *req);
>  
> +/* Container for pinned pfns / pages in frame_vector.c */
> +struct frame_vector {
> +	unsigned int nr_allocated;	/* Number of frames we have space for */
> +	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> +	bool got_ref;		/* Did we pin pages by getting page ref? */
> +	bool is_pfns;		/* Does array contain pages or pfns? */
> +	void *ptrs[];		/* Array of pinned pfns / pages. Use
> +				 * pfns_vector_pages() or pfns_vector_pfns()
> +				 * for access */
> +};
> +
> +struct frame_vector *frame_vector_create(unsigned int nr_frames);
> +void frame_vector_destroy(struct frame_vector *vec);
> +int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> +		     unsigned int gup_flags, struct frame_vector *vec);
> +void put_vaddr_frames(struct frame_vector *vec);
> +int frame_vector_to_pages(struct frame_vector *vec);
> +void frame_vector_to_pfns(struct frame_vector *vec);
> +
> +static inline unsigned int frame_vector_count(struct frame_vector *vec)
> +{
> +	return vec->nr_frames;
> +}
> +
> +static inline struct page **frame_vector_pages(struct frame_vector *vec)
> +{
> +	if (vec->is_pfns) {
> +		int err = frame_vector_to_pages(vec);
> +
> +		if (err)
> +			return ERR_PTR(err);
> +	}
> +	return (struct page **)(vec->ptrs);
> +}
> +
> +static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> +{
> +	if (!vec->is_pfns)
> +		frame_vector_to_pfns(vec);
> +	return (unsigned long *)(vec->ptrs);
> +}
> +
>  #endif /* _MEDIA_VIDEOBUF2_CORE_H */

Please place those into a include/media/frame_vector.h file, instead of
merging it directly at vb2 core header.

Then include the new header at videobuf2-core.h and at frame_vector.c.

With such changes:

Acked-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

> diff --git a/mm/Kconfig b/mm/Kconfig
> index 6c974888f86f..da6c943fe9f1 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -815,9 +815,6 @@ config DEVICE_PRIVATE
>  	  memory; i.e., memory that is only accessible from the device (or
>  	  group of devices). You likely also want to select HMM_MIRROR.
>  
> -config FRAME_VECTOR
> -	bool
> -
>  config ARCH_USES_HIGH_VMA_FLAGS
>  	bool
>  config ARCH_HAS_PKEYS
> diff --git a/mm/Makefile b/mm/Makefile
> index d5649f1c12c0..a025fd6c6afd 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
>  obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
>  obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
>  obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
> -obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
>  obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
>  obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
>  obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o



Thanks,
Mauro
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09  7:59   ` Daniel Vetter
  (?)
@ 2020-10-09 10:34     ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 10:34 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Jason Gunthorpe, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Hi,

Em Fri,  9 Oct 2020 09:59:26 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> Way back it was a reasonable assumptions that iomem mappings never
> change the pfn range they point at. But this has changed:
> 
> - gpu drivers dynamically manage their memory nowadays, invalidating
> ptes with unmap_mapping_range when buffers get moved
> 
> - contiguous dma allocations have moved from dedicated carvetouts to
> cma regions. This means if we miss the unmap the pfn might contain
> pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> 
> - even /dev/mem now invalidates mappings when the kernel requests that
> iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> ("/dev/mem: Revoke mappings when a driver claims the region")
> 
> Accessing pfns obtained from ptes without holding all the locks is
> therefore no longer a good idea.
> 
> Unfortunately there's some users where this is not fixable (like v4l
> userptr of iomem mappings) or involves a pile of work (vfio type1
> iommu). For now annotate these as unsafe and splat appropriately.
> 
> This patch adds an unsafe_follow_pfn, which later patches will then
> roll out to all appropriate places.

NACK, as this breaks an existing userspace API on media.

While I agree that using the userptr on media is something that
new drivers may not support, as DMABUF is a better way of
handling it, changing this for existing ones is a big no, 
as it may break usersapace.

The right approach here would be to be able to fine-tune
support for it on a per-driver basis, e. g. disabling such
feature only for drivers that would use a movable page.

The media subsystem has already a way to disable USERPTR
support from VB2. the right approach would be to ensure
that newer drivers will only set this if they won't use
movable pages.

Regards,
Mauro

> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: kvm@vger.kernel.org
> ---
>  include/linux/mm.h |  2 ++
>  mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
>  mm/nommu.c         | 17 +++++++++++++++++
>  security/Kconfig   | 13 +++++++++++++
>  4 files changed, 63 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2a16631c1fda..ec8c90928fc9 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
>  		   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
>  int follow_pfn(struct vm_area_struct *vma, unsigned long address,
>  	unsigned long *pfn);
> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +		      unsigned long *pfn);
>  int follow_phys(struct vm_area_struct *vma, unsigned long address,
>  		unsigned int flags, unsigned long *prot, resource_size_t *phys);
>  int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
> diff --git a/mm/memory.c b/mm/memory.c
> index f7cbc4dde0ef..7c7b234ffb24 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4821,7 +4821,12 @@ EXPORT_SYMBOL(follow_pte_pmd);
>   * @address: user virtual address
>   * @pfn: location to store found PFN
>   *
> - * Only IO mappings and raw PFN mappings are allowed.
> + * Only IO mappings and raw PFN mappings are allowed. Note that callers must
> + * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
> + * If this is not feasible, or the access to the @pfn is only very short term,
> + * use follow_pte_pmd() instead and hold the pagetable lock for the duration of
> + * the access instead. Any caller not following these requirements must use
> + * unsafe_follow_pfn() instead.
>   *
>   * Return: zero and the pfn at @pfn on success, -ve otherwise.
>   */
> @@ -4844,6 +4849,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
>  }
>  EXPORT_SYMBOL(follow_pfn);
>  
> +/**
> + * unsafe_follow_pfn - look up PFN at a user virtual address
> + * @vma: memory mapping
> + * @address: user virtual address
> + * @pfn: location to store found PFN
> + *
> + * Only IO mappings and raw PFN mappings are allowed.
> + *
> + * Returns zero and the pfn at @pfn on success, -ve otherwise.
> + */
> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +	unsigned long *pfn)
> +{
> +#ifdef CONFIG_STRICT_FOLLOW_PFN
> +	pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
> +	return -EINVAL;
> +#else
> +	WARN_ONCE(1, "unsafe follow_pfn usage\n");
> +	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> +
> +	return follow_pfn(vma, address, pfn);
> +#endif
> +}
> +EXPORT_SYMBOL(unsafe_follow_pfn);
> +
>  #ifdef CONFIG_HAVE_IOREMAP_PROT
>  int follow_phys(struct vm_area_struct *vma,
>  		unsigned long address, unsigned int flags,
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 75a327149af1..3db2910f0d64 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
>  }
>  EXPORT_SYMBOL(follow_pfn);
>  
> +/**
> + * unsafe_follow_pfn - look up PFN at a user virtual address
> + * @vma: memory mapping
> + * @address: user virtual address
> + * @pfn: location to store found PFN
> + *
> + * Only IO mappings and raw PFN mappings are allowed.
> + *
> + * Returns zero and the pfn at @pfn on success, -ve otherwise.
> + */
> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +	unsigned long *pfn)
> +{
> +	return follow_pfn(vma, address, pfn);
> +}
> +EXPORT_SYMBOL(unsafe_follow_pfn);
> +
>  LIST_HEAD(vmap_area_list);
>  
>  void vfree(const void *addr)
> diff --git a/security/Kconfig b/security/Kconfig
> index 7561f6f99f1d..48945402e103 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
>  	  If you wish for all usermode helper programs to be disabled,
>  	  specify an empty string here (i.e. "").
>  
> +config STRICT_FOLLOW_PFN
> +	bool "Disable unsafe use of follow_pfn"
> +	depends on MMU
> +	help
> +	  Some functionality in the kernel follows userspace mappings to iomem
> +	  ranges in an unsafe matter. Examples include v4l userptr for zero-copy
> +	  buffers sharing.
> +
> +	  If this option is switched on, such access is rejected. Only enable
> +	  this option when you must run userspace which requires this.
> +
> +	  If in doubt, say Y.
> +
>  source "security/selinux/Kconfig"
>  source "security/smack/Kconfig"
>  source "security/tomoyo/Kconfig"



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 10:34     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 10:34 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, John Hubbard, LKML, DRI Development, linux-mm,
	Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, linux-arm-kernel, linux-media

Hi,

Em Fri,  9 Oct 2020 09:59:26 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> Way back it was a reasonable assumptions that iomem mappings never
> change the pfn range they point at. But this has changed:
> 
> - gpu drivers dynamically manage their memory nowadays, invalidating
> ptes with unmap_mapping_range when buffers get moved
> 
> - contiguous dma allocations have moved from dedicated carvetouts to
> cma regions. This means if we miss the unmap the pfn might contain
> pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> 
> - even /dev/mem now invalidates mappings when the kernel requests that
> iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> ("/dev/mem: Revoke mappings when a driver claims the region")
> 
> Accessing pfns obtained from ptes without holding all the locks is
> therefore no longer a good idea.
> 
> Unfortunately there's some users where this is not fixable (like v4l
> userptr of iomem mappings) or involves a pile of work (vfio type1
> iommu). For now annotate these as unsafe and splat appropriately.
> 
> This patch adds an unsafe_follow_pfn, which later patches will then
> roll out to all appropriate places.

NACK, as this breaks an existing userspace API on media.

While I agree that using the userptr on media is something that
new drivers may not support, as DMABUF is a better way of
handling it, changing this for existing ones is a big no, 
as it may break usersapace.

The right approach here would be to be able to fine-tune
support for it on a per-driver basis, e. g. disabling such
feature only for drivers that would use a movable page.

The media subsystem has already a way to disable USERPTR
support from VB2. the right approach would be to ensure
that newer drivers will only set this if they won't use
movable pages.

Regards,
Mauro

> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: kvm@vger.kernel.org
> ---
>  include/linux/mm.h |  2 ++
>  mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
>  mm/nommu.c         | 17 +++++++++++++++++
>  security/Kconfig   | 13 +++++++++++++
>  4 files changed, 63 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2a16631c1fda..ec8c90928fc9 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
>  		   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
>  int follow_pfn(struct vm_area_struct *vma, unsigned long address,
>  	unsigned long *pfn);
> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +		      unsigned long *pfn);
>  int follow_phys(struct vm_area_struct *vma, unsigned long address,
>  		unsigned int flags, unsigned long *prot, resource_size_t *phys);
>  int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
> diff --git a/mm/memory.c b/mm/memory.c
> index f7cbc4dde0ef..7c7b234ffb24 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4821,7 +4821,12 @@ EXPORT_SYMBOL(follow_pte_pmd);
>   * @address: user virtual address
>   * @pfn: location to store found PFN
>   *
> - * Only IO mappings and raw PFN mappings are allowed.
> + * Only IO mappings and raw PFN mappings are allowed. Note that callers must
> + * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
> + * If this is not feasible, or the access to the @pfn is only very short term,
> + * use follow_pte_pmd() instead and hold the pagetable lock for the duration of
> + * the access instead. Any caller not following these requirements must use
> + * unsafe_follow_pfn() instead.
>   *
>   * Return: zero and the pfn at @pfn on success, -ve otherwise.
>   */
> @@ -4844,6 +4849,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
>  }
>  EXPORT_SYMBOL(follow_pfn);
>  
> +/**
> + * unsafe_follow_pfn - look up PFN at a user virtual address
> + * @vma: memory mapping
> + * @address: user virtual address
> + * @pfn: location to store found PFN
> + *
> + * Only IO mappings and raw PFN mappings are allowed.
> + *
> + * Returns zero and the pfn at @pfn on success, -ve otherwise.
> + */
> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +	unsigned long *pfn)
> +{
> +#ifdef CONFIG_STRICT_FOLLOW_PFN
> +	pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
> +	return -EINVAL;
> +#else
> +	WARN_ONCE(1, "unsafe follow_pfn usage\n");
> +	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> +
> +	return follow_pfn(vma, address, pfn);
> +#endif
> +}
> +EXPORT_SYMBOL(unsafe_follow_pfn);
> +
>  #ifdef CONFIG_HAVE_IOREMAP_PROT
>  int follow_phys(struct vm_area_struct *vma,
>  		unsigned long address, unsigned int flags,
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 75a327149af1..3db2910f0d64 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
>  }
>  EXPORT_SYMBOL(follow_pfn);
>  
> +/**
> + * unsafe_follow_pfn - look up PFN at a user virtual address
> + * @vma: memory mapping
> + * @address: user virtual address
> + * @pfn: location to store found PFN
> + *
> + * Only IO mappings and raw PFN mappings are allowed.
> + *
> + * Returns zero and the pfn at @pfn on success, -ve otherwise.
> + */
> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +	unsigned long *pfn)
> +{
> +	return follow_pfn(vma, address, pfn);
> +}
> +EXPORT_SYMBOL(unsafe_follow_pfn);
> +
>  LIST_HEAD(vmap_area_list);
>  
>  void vfree(const void *addr)
> diff --git a/security/Kconfig b/security/Kconfig
> index 7561f6f99f1d..48945402e103 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
>  	  If you wish for all usermode helper programs to be disabled,
>  	  specify an empty string here (i.e. "").
>  
> +config STRICT_FOLLOW_PFN
> +	bool "Disable unsafe use of follow_pfn"
> +	depends on MMU
> +	help
> +	  Some functionality in the kernel follows userspace mappings to iomem
> +	  ranges in an unsafe matter. Examples include v4l userptr for zero-copy
> +	  buffers sharing.
> +
> +	  If this option is switched on, such access is rejected. Only enable
> +	  this option when you must run userspace which requires this.
> +
> +	  If in doubt, say Y.
> +
>  source "security/selinux/Kconfig"
>  source "security/smack/Kconfig"
>  source "security/tomoyo/Kconfig"



Thanks,
Mauro

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 10:34     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 10:34 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, John Hubbard, LKML, DRI Development, linux-mm,
	Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, linux-arm-kernel, linux-media

Hi,

Em Fri,  9 Oct 2020 09:59:26 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> Way back it was a reasonable assumptions that iomem mappings never
> change the pfn range they point at. But this has changed:
> 
> - gpu drivers dynamically manage their memory nowadays, invalidating
> ptes with unmap_mapping_range when buffers get moved
> 
> - contiguous dma allocations have moved from dedicated carvetouts to
> cma regions. This means if we miss the unmap the pfn might contain
> pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> 
> - even /dev/mem now invalidates mappings when the kernel requests that
> iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> ("/dev/mem: Revoke mappings when a driver claims the region")
> 
> Accessing pfns obtained from ptes without holding all the locks is
> therefore no longer a good idea.
> 
> Unfortunately there's some users where this is not fixable (like v4l
> userptr of iomem mappings) or involves a pile of work (vfio type1
> iommu). For now annotate these as unsafe and splat appropriately.
> 
> This patch adds an unsafe_follow_pfn, which later patches will then
> roll out to all appropriate places.

NACK, as this breaks an existing userspace API on media.

While I agree that using the userptr on media is something that
new drivers may not support, as DMABUF is a better way of
handling it, changing this for existing ones is a big no, 
as it may break usersapace.

The right approach here would be to be able to fine-tune
support for it on a per-driver basis, e. g. disabling such
feature only for drivers that would use a movable page.

The media subsystem has already a way to disable USERPTR
support from VB2. the right approach would be to ensure
that newer drivers will only set this if they won't use
movable pages.

Regards,
Mauro

> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: kvm@vger.kernel.org
> ---
>  include/linux/mm.h |  2 ++
>  mm/memory.c        | 32 +++++++++++++++++++++++++++++++-
>  mm/nommu.c         | 17 +++++++++++++++++
>  security/Kconfig   | 13 +++++++++++++
>  4 files changed, 63 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2a16631c1fda..ec8c90928fc9 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
>  		   pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
>  int follow_pfn(struct vm_area_struct *vma, unsigned long address,
>  	unsigned long *pfn);
> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +		      unsigned long *pfn);
>  int follow_phys(struct vm_area_struct *vma, unsigned long address,
>  		unsigned int flags, unsigned long *prot, resource_size_t *phys);
>  int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
> diff --git a/mm/memory.c b/mm/memory.c
> index f7cbc4dde0ef..7c7b234ffb24 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4821,7 +4821,12 @@ EXPORT_SYMBOL(follow_pte_pmd);
>   * @address: user virtual address
>   * @pfn: location to store found PFN
>   *
> - * Only IO mappings and raw PFN mappings are allowed.
> + * Only IO mappings and raw PFN mappings are allowed. Note that callers must
> + * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
> + * If this is not feasible, or the access to the @pfn is only very short term,
> + * use follow_pte_pmd() instead and hold the pagetable lock for the duration of
> + * the access instead. Any caller not following these requirements must use
> + * unsafe_follow_pfn() instead.
>   *
>   * Return: zero and the pfn at @pfn on success, -ve otherwise.
>   */
> @@ -4844,6 +4849,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
>  }
>  EXPORT_SYMBOL(follow_pfn);
>  
> +/**
> + * unsafe_follow_pfn - look up PFN at a user virtual address
> + * @vma: memory mapping
> + * @address: user virtual address
> + * @pfn: location to store found PFN
> + *
> + * Only IO mappings and raw PFN mappings are allowed.
> + *
> + * Returns zero and the pfn at @pfn on success, -ve otherwise.
> + */
> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +	unsigned long *pfn)
> +{
> +#ifdef CONFIG_STRICT_FOLLOW_PFN
> +	pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
> +	return -EINVAL;
> +#else
> +	WARN_ONCE(1, "unsafe follow_pfn usage\n");
> +	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> +
> +	return follow_pfn(vma, address, pfn);
> +#endif
> +}
> +EXPORT_SYMBOL(unsafe_follow_pfn);
> +
>  #ifdef CONFIG_HAVE_IOREMAP_PROT
>  int follow_phys(struct vm_area_struct *vma,
>  		unsigned long address, unsigned int flags,
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 75a327149af1..3db2910f0d64 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
>  }
>  EXPORT_SYMBOL(follow_pfn);
>  
> +/**
> + * unsafe_follow_pfn - look up PFN at a user virtual address
> + * @vma: memory mapping
> + * @address: user virtual address
> + * @pfn: location to store found PFN
> + *
> + * Only IO mappings and raw PFN mappings are allowed.
> + *
> + * Returns zero and the pfn at @pfn on success, -ve otherwise.
> + */
> +int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
> +	unsigned long *pfn)
> +{
> +	return follow_pfn(vma, address, pfn);
> +}
> +EXPORT_SYMBOL(unsafe_follow_pfn);
> +
>  LIST_HEAD(vmap_area_list);
>  
>  void vfree(const void *addr)
> diff --git a/security/Kconfig b/security/Kconfig
> index 7561f6f99f1d..48945402e103 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
>  	  If you wish for all usermode helper programs to be disabled,
>  	  specify an empty string here (i.e. "").
>  
> +config STRICT_FOLLOW_PFN
> +	bool "Disable unsafe use of follow_pfn"
> +	depends on MMU
> +	help
> +	  Some functionality in the kernel follows userspace mappings to iomem
> +	  ranges in an unsafe matter. Examples include v4l userptr for zero-copy
> +	  buffers sharing.
> +
> +	  If this option is switched on, such access is rejected. Only enable
> +	  this option when you must run userspace which requires this.
> +
> +	  If in doubt, say Y.
> +
>  source "security/selinux/Kconfig"
>  source "security/smack/Kconfig"
>  source "security/tomoyo/Kconfig"



Thanks,
Mauro
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
  2020-10-09 10:01       ` Daniel Vetter
  (?)
  (?)
@ 2020-10-09 10:41         ` Ville Syrjälä
  -1 siblings, 0 replies; 214+ messages in thread
From: Ville Syrjälä @ 2020-10-09 10:41 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, linux-s390, linux-samsung-soc, Jan Kara,
	Kees Cook, KVM list, Jason Gunthorpe, Linux PCI, Linux MM,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 09, 2020 at 12:01:39PM +0200, Daniel Vetter wrote:
> On Fri, Oct 9, 2020 at 11:47 AM Ville Syrjälä
> <ville.syrjala@linux.intel.com> wrote:
> >
> > On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> > > When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> > > do nothing for i915. Because i915 doesn't request any regions, like
> > > pretty much all drm pci drivers. I guess this is some very old
> > > remnants from the userspace modesetting days, when we wanted to
> > > co-exist with the fbdev driver. Which usually requested these
> > > resources.
> > >
> > > But makes me wonder why the pci subsystem doesn't just request
> > > resource automatically when we map a bar and a pci driver is bound?
> > >
> > > Knowledge about which pci bars we need kludged together from
> > > intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> > > fake agp driver.
> > >
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-arm-kernel@lists.infradead.org
> > > Cc: linux-samsung-soc@vger.kernel.org
> > > Cc: linux-media@vger.kernel.org
> > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > Cc: linux-pci@vger.kernel.org
> > > ---
> > >  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
> > >  1 file changed, 23 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > > index 54e201fdeba4..ce39049d8919 100644
> > > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > > @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> > >       struct pci_dev *pdev = i915->drm.pdev;
> > >       int mmio_bar;
> > >       int mmio_size;
> > > +     int bar_selection;
> >
> > Signed bitmasks always make me uneasy. But looks like
> > that's what it is in the pci api. So meh.
> 
> Yeah it's surprising.
> 
> > > +     int ret;
> > >
> > >       mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > > +     bar_selection = BIT (2) | BIT(mmio_bar);
> >                            ^
> > spurious space
> >
> > That's also not correct for gen2 I think.
> >
> > gen2:
> > 0 = GMADR
> > 1 = MMADR
> > 2 = IOBAR
> >
> > gen3:
> > 0 = MMADR
> > 1 = IOBAR
> > 2 = GMADR
> > 3 = GTTADR
> >
> > gen4+:
> > 0+1 = GTTMMADR
> > 2+3 = GMADR
> > 4 = IOBAR
> >
> > Maybe we should just have an explicit list of bars like that in a
> > comment?
> >
> > I'd also suggest sucking this bitmask calculation into a small helper
> > so you can reuse it for the release.
> 
> tbh I just hacked this up for testing. Given how almost no other drm
> driver does this, I'm wondering whether we should or not.
> 
> Also the only reason why I didn't just use the pci_request_regions
> helper is to avoid the vga ioport range, since that's managed by
> vgaarbiter.

VGA io range isn't part of any bar. Or do you mean just the io decode
enable bit in the pci command register? That should be just a matter
or pci_enable_device() vs. pci_enable_device_mem() I think. So nothing
to do with which bars we've requested IIRC.

> 
> So I think if we go for this for real we should:
> - register the vga ioport range in the vgaarbiter
> - have a pci_request_iomem_regions helper that grabs all mem bars
> - roll that out to all drm pci drivers
> 
> Or something like that. The other complication is when we resize the
> iobar. So not really sure what to do here.

We resize it?

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
@ 2020-10-09 10:41         ` Ville Syrjälä
  0 siblings, 0 replies; 214+ messages in thread
From: Ville Syrjälä @ 2020-10-09 10:41 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, linux-s390, linux-samsung-soc, Jan Kara,
	Kees Cook, KVM list, Jason Gunthorpe, Linux PCI, Linux MM,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 09, 2020 at 12:01:39PM +0200, Daniel Vetter wrote:
> On Fri, Oct 9, 2020 at 11:47 AM Ville Syrj�l�
> <ville.syrjala@linux.intel.com> wrote:
> >
> > On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> > > When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> > > do nothing for i915. Because i915 doesn't request any regions, like
> > > pretty much all drm pci drivers. I guess this is some very old
> > > remnants from the userspace modesetting days, when we wanted to
> > > co-exist with the fbdev driver. Which usually requested these
> > > resources.
> > >
> > > But makes me wonder why the pci subsystem doesn't just request
> > > resource automatically when we map a bar and a pci driver is bound?
> > >
> > > Knowledge about which pci bars we need kludged together from
> > > intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> > > fake agp driver.
> > >
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > Cc: J�r�me Glisse <jglisse@redhat.com>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-arm-kernel@lists.infradead.org
> > > Cc: linux-samsung-soc@vger.kernel.org
> > > Cc: linux-media@vger.kernel.org
> > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > Cc: linux-pci@vger.kernel.org
> > > ---
> > >  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
> > >  1 file changed, 23 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > > index 54e201fdeba4..ce39049d8919 100644
> > > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > > @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> > >       struct pci_dev *pdev = i915->drm.pdev;
> > >       int mmio_bar;
> > >       int mmio_size;
> > > +     int bar_selection;
> >
> > Signed bitmasks always make me uneasy. But looks like
> > that's what it is in the pci api. So meh.
> 
> Yeah it's surprising.
> 
> > > +     int ret;
> > >
> > >       mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > > +     bar_selection = BIT (2) | BIT(mmio_bar);
> >                            ^
> > spurious space
> >
> > That's also not correct for gen2 I think.
> >
> > gen2:
> > 0 = GMADR
> > 1 = MMADR
> > 2 = IOBAR
> >
> > gen3:
> > 0 = MMADR
> > 1 = IOBAR
> > 2 = GMADR
> > 3 = GTTADR
> >
> > gen4+:
> > 0+1 = GTTMMADR
> > 2+3 = GMADR
> > 4 = IOBAR
> >
> > Maybe we should just have an explicit list of bars like that in a
> > comment?
> >
> > I'd also suggest sucking this bitmask calculation into a small helper
> > so you can reuse it for the release.
> 
> tbh I just hacked this up for testing. Given how almost no other drm
> driver does this, I'm wondering whether we should or not.
> 
> Also the only reason why I didn't just use the pci_request_regions
> helper is to avoid the vga ioport range, since that's managed by
> vgaarbiter.

VGA io range isn't part of any bar. Or do you mean just the io decode
enable bit in the pci command register? That should be just a matter
or pci_enable_device() vs. pci_enable_device_mem() I think. So nothing
to do with which bars we've requested IIRC.

> 
> So I think if we go for this for real we should:
> - register the vga ioport range in the vgaarbiter
> - have a pci_request_iomem_regions helper that grabs all mem bars
> - roll that out to all drm pci drivers
> 
> Or something like that. The other complication is when we resize the
> iobar. So not really sure what to do here.

We resize it?

-- 
Ville Syrj�l�
Intel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
@ 2020-10-09 10:41         ` Ville Syrjälä
  0 siblings, 0 replies; 214+ messages in thread
From: Ville Syrjälä @ 2020-10-09 10:41 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Linux MM, Linux PCI, LKML, DRI Development, Jason Gunthorpe,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 09, 2020 at 12:01:39PM +0200, Daniel Vetter wrote:
> On Fri, Oct 9, 2020 at 11:47 AM Ville Syrjälä
> <ville.syrjala@linux.intel.com> wrote:
> >
> > On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> > > When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> > > do nothing for i915. Because i915 doesn't request any regions, like
> > > pretty much all drm pci drivers. I guess this is some very old
> > > remnants from the userspace modesetting days, when we wanted to
> > > co-exist with the fbdev driver. Which usually requested these
> > > resources.
> > >
> > > But makes me wonder why the pci subsystem doesn't just request
> > > resource automatically when we map a bar and a pci driver is bound?
> > >
> > > Knowledge about which pci bars we need kludged together from
> > > intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> > > fake agp driver.
> > >
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-arm-kernel@lists.infradead.org
> > > Cc: linux-samsung-soc@vger.kernel.org
> > > Cc: linux-media@vger.kernel.org
> > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > Cc: linux-pci@vger.kernel.org
> > > ---
> > >  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
> > >  1 file changed, 23 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > > index 54e201fdeba4..ce39049d8919 100644
> > > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > > @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> > >       struct pci_dev *pdev = i915->drm.pdev;
> > >       int mmio_bar;
> > >       int mmio_size;
> > > +     int bar_selection;
> >
> > Signed bitmasks always make me uneasy. But looks like
> > that's what it is in the pci api. So meh.
> 
> Yeah it's surprising.
> 
> > > +     int ret;
> > >
> > >       mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > > +     bar_selection = BIT (2) | BIT(mmio_bar);
> >                            ^
> > spurious space
> >
> > That's also not correct for gen2 I think.
> >
> > gen2:
> > 0 = GMADR
> > 1 = MMADR
> > 2 = IOBAR
> >
> > gen3:
> > 0 = MMADR
> > 1 = IOBAR
> > 2 = GMADR
> > 3 = GTTADR
> >
> > gen4+:
> > 0+1 = GTTMMADR
> > 2+3 = GMADR
> > 4 = IOBAR
> >
> > Maybe we should just have an explicit list of bars like that in a
> > comment?
> >
> > I'd also suggest sucking this bitmask calculation into a small helper
> > so you can reuse it for the release.
> 
> tbh I just hacked this up for testing. Given how almost no other drm
> driver does this, I'm wondering whether we should or not.
> 
> Also the only reason why I didn't just use the pci_request_regions
> helper is to avoid the vga ioport range, since that's managed by
> vgaarbiter.

VGA io range isn't part of any bar. Or do you mean just the io decode
enable bit in the pci command register? That should be just a matter
or pci_enable_device() vs. pci_enable_device_mem() I think. So nothing
to do with which bars we've requested IIRC.

> 
> So I think if we go for this for real we should:
> - register the vga ioport range in the vgaarbiter
> - have a pci_request_iomem_regions helper that grabs all mem bars
> - roll that out to all drm pci drivers
> 
> Or something like that. The other complication is when we resize the
> iobar. So not really sure what to do here.

We resize it?

-- 
Ville Syrjälä
Intel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
@ 2020-10-09 10:41         ` Ville Syrjälä
  0 siblings, 0 replies; 214+ messages in thread
From: Ville Syrjälä @ 2020-10-09 10:41 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Linux MM, Linux PCI, LKML, DRI Development, Jason Gunthorpe,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 09, 2020 at 12:01:39PM +0200, Daniel Vetter wrote:
> On Fri, Oct 9, 2020 at 11:47 AM Ville Syrjälä
> <ville.syrjala@linux.intel.com> wrote:
> >
> > On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> > > When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> > > do nothing for i915. Because i915 doesn't request any regions, like
> > > pretty much all drm pci drivers. I guess this is some very old
> > > remnants from the userspace modesetting days, when we wanted to
> > > co-exist with the fbdev driver. Which usually requested these
> > > resources.
> > >
> > > But makes me wonder why the pci subsystem doesn't just request
> > > resource automatically when we map a bar and a pci driver is bound?
> > >
> > > Knowledge about which pci bars we need kludged together from
> > > intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> > > fake agp driver.
> > >
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-arm-kernel@lists.infradead.org
> > > Cc: linux-samsung-soc@vger.kernel.org
> > > Cc: linux-media@vger.kernel.org
> > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > Cc: linux-pci@vger.kernel.org
> > > ---
> > >  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
> > >  1 file changed, 23 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > > index 54e201fdeba4..ce39049d8919 100644
> > > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > > @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> > >       struct pci_dev *pdev = i915->drm.pdev;
> > >       int mmio_bar;
> > >       int mmio_size;
> > > +     int bar_selection;
> >
> > Signed bitmasks always make me uneasy. But looks like
> > that's what it is in the pci api. So meh.
> 
> Yeah it's surprising.
> 
> > > +     int ret;
> > >
> > >       mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > > +     bar_selection = BIT (2) | BIT(mmio_bar);
> >                            ^
> > spurious space
> >
> > That's also not correct for gen2 I think.
> >
> > gen2:
> > 0 = GMADR
> > 1 = MMADR
> > 2 = IOBAR
> >
> > gen3:
> > 0 = MMADR
> > 1 = IOBAR
> > 2 = GMADR
> > 3 = GTTADR
> >
> > gen4+:
> > 0+1 = GTTMMADR
> > 2+3 = GMADR
> > 4 = IOBAR
> >
> > Maybe we should just have an explicit list of bars like that in a
> > comment?
> >
> > I'd also suggest sucking this bitmask calculation into a small helper
> > so you can reuse it for the release.
> 
> tbh I just hacked this up for testing. Given how almost no other drm
> driver does this, I'm wondering whether we should or not.
> 
> Also the only reason why I didn't just use the pci_request_regions
> helper is to avoid the vga ioport range, since that's managed by
> vgaarbiter.

VGA io range isn't part of any bar. Or do you mean just the io decode
enable bit in the pci command register? That should be just a matter
or pci_enable_device() vs. pci_enable_device_mem() I think. So nothing
to do with which bars we've requested IIRC.

> 
> So I think if we go for this for real we should:
> - register the vga ioport range in the vgaarbiter
> - have a pci_request_iomem_regions helper that grabs all mem bars
> - roll that out to all drm pci drivers
> 
> Or something like that. The other complication is when we resize the
> iobar. So not really sure what to do here.

We resize it?

-- 
Ville Syrjälä
Intel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 15/17] sysfs: Support zapping of binary attr mmaps
  2020-10-09  7:59   ` Daniel Vetter
  (?)
  (?)
@ 2020-10-09 10:58     ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 214+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-09 10:58 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Jason Gunthorpe, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	linux-pci, Rafael J. Wysocki, Christian Brauner, David S. Miller,
	Michael Ellerman, Sourabh Jain, Mauro Carvalho Chehab,
	Nayna Jain

On Fri, Oct 09, 2020 at 09:59:32AM +0200, Daniel Vetter wrote:
> We want to be able to revoke pci mmaps so that the same access rules
> applies as for /dev/kmem. Revoke support for devmem was added in
> 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the
> region").
> 
> The simplest way to achieve this is by having the same filp->f_mapping
> for all mappings, so that unmap_mapping_range can find them all, no
> matter through which file they've been created. Since this must be set
> at open time we need sysfs support for this.
> 
> Add an optional mapping parameter bin_attr, which is only consulted
> when there's also an mmap callback, since without mmap support
> allowing to adjust the ->f_mapping makes no sense.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Christian Brauner <christian.brauner@ubuntu.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: Nayna Jain <nayna@linux.ibm.com>
> ---
>  fs/sysfs/file.c       | 11 +++++++++++
>  include/linux/sysfs.h |  2 ++
>  2 files changed, 13 insertions(+)

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 15/17] sysfs: Support zapping of binary attr mmaps
@ 2020-10-09 10:58     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 214+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-09 10:58 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Jason Gunthorpe, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Bjorn Helgaas,
	linux-pci, Rafael J. Wysocki, Christian Brauner, David S. Miller,
	Michael Ellerman, Sourabh Jain, Mauro Carvalho Chehab,
	Nayna Jain

On Fri, Oct 09, 2020 at 09:59:32AM +0200, Daniel Vetter wrote:
> We want to be able to revoke pci mmaps so that the same access rules
> applies as for /dev/kmem. Revoke support for devmem was added in
> 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the
> region").
> 
> The simplest way to achieve this is by having the same filp->f_mapping
> for all mappings, so that unmap_mapping_range can find them all, no
> matter through which file they've been created. Since this must be set
> at open time we need sysfs support for this.
> 
> Add an optional mapping parameter bin_attr, which is only consulted
> when there's also an mmap callback, since without mmap support
> allowing to adjust the ->f_mapping makes no sense.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: J�r�me Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Christian Brauner <christian.brauner@ubuntu.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: Nayna Jain <nayna@linux.ibm.com>
> ---
>  fs/sysfs/file.c       | 11 +++++++++++
>  include/linux/sysfs.h |  2 ++
>  2 files changed, 13 insertions(+)

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 15/17] sysfs: Support zapping of binary attr mmaps
@ 2020-10-09 10:58     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 214+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-09 10:58 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jan Kara, kvm, Rafael J. Wysocki, linux-pci, DRI Development,
	linux-mm, Daniel Vetter, Christian Brauner, linux-s390,
	linux-samsung-soc, Mauro Carvalho Chehab, Michael Ellerman,
	Nayna Jain, Jason Gunthorpe, linux-media, Kees Cook,
	John Hubbard, Jérôme Glisse, Bjorn Helgaas,
	Dan Williams, linux-arm-kernel, LKML, Sourabh Jain,
	Andrew Morton, David S. Miller

On Fri, Oct 09, 2020 at 09:59:32AM +0200, Daniel Vetter wrote:
> We want to be able to revoke pci mmaps so that the same access rules
> applies as for /dev/kmem. Revoke support for devmem was added in
> 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the
> region").
> 
> The simplest way to achieve this is by having the same filp->f_mapping
> for all mappings, so that unmap_mapping_range can find them all, no
> matter through which file they've been created. Since this must be set
> at open time we need sysfs support for this.
> 
> Add an optional mapping parameter bin_attr, which is only consulted
> when there's also an mmap callback, since without mmap support
> allowing to adjust the ->f_mapping makes no sense.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Christian Brauner <christian.brauner@ubuntu.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: Nayna Jain <nayna@linux.ibm.com>
> ---
>  fs/sysfs/file.c       | 11 +++++++++++
>  include/linux/sysfs.h |  2 ++
>  2 files changed, 13 insertions(+)

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 15/17] sysfs: Support zapping of binary attr mmaps
@ 2020-10-09 10:58     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 214+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-09 10:58 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jan Kara, kvm, Rafael J. Wysocki, linux-pci, DRI Development,
	linux-mm, Daniel Vetter, Christian Brauner, linux-s390,
	linux-samsung-soc, Mauro Carvalho Chehab, Michael Ellerman,
	Nayna Jain, Jason Gunthorpe, linux-media, Kees Cook,
	John Hubbard, Jérôme Glisse, Bjorn Helgaas,
	Dan Williams, linux-arm-kernel, LKML, Sourabh Jain,
	Andrew Morton, David S. Miller

On Fri, Oct 09, 2020 at 09:59:32AM +0200, Daniel Vetter wrote:
> We want to be able to revoke pci mmaps so that the same access rules
> applies as for /dev/kmem. Revoke support for devmem was added in
> 3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the
> region").
> 
> The simplest way to achieve this is by having the same filp->f_mapping
> for all mappings, so that unmap_mapping_range can find them all, no
> matter through which file they've been created. Since this must be set
> at open time we need sysfs support for this.
> 
> Add an optional mapping parameter bin_attr, which is only consulted
> when there's also an mmap callback, since without mmap support
> allowing to adjust the ->f_mapping makes no sense.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Christian Brauner <christian.brauner@ubuntu.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: Nayna Jain <nayna@linux.ibm.com>
> ---
>  fs/sysfs/file.c       | 11 +++++++++++
>  include/linux/sysfs.h |  2 ++
>  2 files changed, 13 insertions(+)

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-09  7:59   ` Daniel Vetter
  (?)
  (?)
@ 2020-10-09 10:59     ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 214+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-09 10:59 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Jason Gunthorpe, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Arnd Bergmann,
	David Hildenbrand, Rafael J. Wysocki

On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> We want all iomem mmaps to consistently revoke ptes when the kernel
> takes over and CONFIG_IO_STRICT_DEVMEM is enabled. This includes the
> pci bar mmaps available through procfs and sysfs, which currently do
> not revoke mappings.
> 
> To prepare for this, move the code from the /dev/kmem driver to
> kernel/resource.c.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
@ 2020-10-09 10:59     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 214+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-09 10:59 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Jason Gunthorpe, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Arnd Bergmann,
	David Hildenbrand, Rafael J. Wysocki

On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> We want all iomem mmaps to consistently revoke ptes when the kernel
> takes over and CONFIG_IO_STRICT_DEVMEM is enabled. This includes the
> pci bar mmaps available through procfs and sysfs, which currently do
> not revoke mappings.
> 
> To prepare for this, move the code from the /dev/kmem driver to
> kernel/resource.c.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: J�r�me Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
@ 2020-10-09 10:59     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 214+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-09 10:59 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, John Hubbard, David Hildenbrand, LKML,
	DRI Development, Rafael J. Wysocki, linux-mm,
	Jérôme Glisse, Arnd Bergmann, Daniel Vetter,
	Dan Williams, Andrew Morton, linux-arm-kernel, linux-media

On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> We want all iomem mmaps to consistently revoke ptes when the kernel
> takes over and CONFIG_IO_STRICT_DEVMEM is enabled. This includes the
> pci bar mmaps available through procfs and sysfs, which currently do
> not revoke mappings.
> 
> To prepare for this, move the code from the /dev/kmem driver to
> kernel/resource.c.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
@ 2020-10-09 10:59     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 214+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-09 10:59 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, John Hubbard, David Hildenbrand, LKML,
	DRI Development, Rafael J. Wysocki, linux-mm,
	Jérôme Glisse, Arnd Bergmann, Daniel Vetter,
	Dan Williams, Andrew Morton, linux-arm-kernel, linux-media

On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> We want all iomem mmaps to consistently revoke ptes when the kernel
> takes over and CONFIG_IO_STRICT_DEVMEM is enabled. This includes the
> pci bar mmaps available through procfs and sysfs, which currently do
> not revoke mappings.
> 
> To prepare for this, move the code from the /dev/kmem driver to
> kernel/resource.c.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 10:34     ` Mauro Carvalho Chehab
  (?)
@ 2020-10-09 12:21       ` Jason Gunthorpe
  -1 siblings, 0 replies; 214+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 12:21 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Daniel Vetter, DRI Development, LKML, kvm, linux-mm,
	linux-arm-kernel, linux-samsung-soc, linux-media, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> Hi,
> 
> Em Fri,  9 Oct 2020 09:59:26 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> 
> > Way back it was a reasonable assumptions that iomem mappings never
> > change the pfn range they point at. But this has changed:
> > 
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> > ptes with unmap_mapping_range when buffers get moved
> > 
> > - contiguous dma allocations have moved from dedicated carvetouts to
> > cma regions. This means if we miss the unmap the pfn might contain
> > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > 
> > - even /dev/mem now invalidates mappings when the kernel requests that
> > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > ("/dev/mem: Revoke mappings when a driver claims the region")
> > 
> > Accessing pfns obtained from ptes without holding all the locks is
> > therefore no longer a good idea.
> > 
> > Unfortunately there's some users where this is not fixable (like v4l
> > userptr of iomem mappings) or involves a pile of work (vfio type1
> > iommu). For now annotate these as unsafe and splat appropriately.
> > 
> > This patch adds an unsafe_follow_pfn, which later patches will then
> > roll out to all appropriate places.
> 
> NACK, as this breaks an existing userspace API on media.

It doesn't break it. You get a big warning the thing is broken and it
keeps working.

We can't leave such a huge security hole open - it impacts other
subsystems, distros need to be able to run in a secure mode.

> While I agree that using the userptr on media is something that
> new drivers may not support, as DMABUF is a better way of
> handling it, changing this for existing ones is a big no, 
> as it may break usersapace.

media community needs to work to fix this, not pretend it is OK to
keep going as-is.

Dealing with security issues is the one case where an uABI break might
be acceptable.

If you want to NAK it then you need to come up with the work to do
something here correctly that will support the old drivers without the
kernel taint.

Unfortunately making things uncomfortable for the subsystem is the big
hammer the core kernel needs to use to actually get this security work
done by those responsible.

Jason

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 12:21       ` Jason Gunthorpe
  0 siblings, 0 replies; 214+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 12:21 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Daniel Vetter, LKML, DRI Development, linux-mm,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Linus Torvalds, Andrew Morton, linux-arm-kernel,
	linux-media

On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> Hi,
> 
> Em Fri,  9 Oct 2020 09:59:26 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> 
> > Way back it was a reasonable assumptions that iomem mappings never
> > change the pfn range they point at. But this has changed:
> > 
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> > ptes with unmap_mapping_range when buffers get moved
> > 
> > - contiguous dma allocations have moved from dedicated carvetouts to
> > cma regions. This means if we miss the unmap the pfn might contain
> > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > 
> > - even /dev/mem now invalidates mappings when the kernel requests that
> > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > ("/dev/mem: Revoke mappings when a driver claims the region")
> > 
> > Accessing pfns obtained from ptes without holding all the locks is
> > therefore no longer a good idea.
> > 
> > Unfortunately there's some users where this is not fixable (like v4l
> > userptr of iomem mappings) or involves a pile of work (vfio type1
> > iommu). For now annotate these as unsafe and splat appropriately.
> > 
> > This patch adds an unsafe_follow_pfn, which later patches will then
> > roll out to all appropriate places.
> 
> NACK, as this breaks an existing userspace API on media.

It doesn't break it. You get a big warning the thing is broken and it
keeps working.

We can't leave such a huge security hole open - it impacts other
subsystems, distros need to be able to run in a secure mode.

> While I agree that using the userptr on media is something that
> new drivers may not support, as DMABUF is a better way of
> handling it, changing this for existing ones is a big no, 
> as it may break usersapace.

media community needs to work to fix this, not pretend it is OK to
keep going as-is.

Dealing with security issues is the one case where an uABI break might
be acceptable.

If you want to NAK it then you need to come up with the work to do
something here correctly that will support the old drivers without the
kernel taint.

Unfortunately making things uncomfortable for the subsystem is the big
hammer the core kernel needs to use to actually get this security work
done by those responsible.

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 12:21       ` Jason Gunthorpe
  0 siblings, 0 replies; 214+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 12:21 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Daniel Vetter, LKML, DRI Development, linux-mm,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Linus Torvalds, Andrew Morton, linux-arm-kernel,
	linux-media

On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> Hi,
> 
> Em Fri,  9 Oct 2020 09:59:26 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> 
> > Way back it was a reasonable assumptions that iomem mappings never
> > change the pfn range they point at. But this has changed:
> > 
> > - gpu drivers dynamically manage their memory nowadays, invalidating
> > ptes with unmap_mapping_range when buffers get moved
> > 
> > - contiguous dma allocations have moved from dedicated carvetouts to
> > cma regions. This means if we miss the unmap the pfn might contain
> > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > 
> > - even /dev/mem now invalidates mappings when the kernel requests that
> > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > ("/dev/mem: Revoke mappings when a driver claims the region")
> > 
> > Accessing pfns obtained from ptes without holding all the locks is
> > therefore no longer a good idea.
> > 
> > Unfortunately there's some users where this is not fixable (like v4l
> > userptr of iomem mappings) or involves a pile of work (vfio type1
> > iommu). For now annotate these as unsafe and splat appropriately.
> > 
> > This patch adds an unsafe_follow_pfn, which later patches will then
> > roll out to all appropriate places.
> 
> NACK, as this breaks an existing userspace API on media.

It doesn't break it. You get a big warning the thing is broken and it
keeps working.

We can't leave such a huge security hole open - it impacts other
subsystems, distros need to be able to run in a secure mode.

> While I agree that using the userptr on media is something that
> new drivers may not support, as DMABUF is a better way of
> handling it, changing this for existing ones is a big no, 
> as it may break usersapace.

media community needs to work to fix this, not pretend it is OK to
keep going as-is.

Dealing with security issues is the one case where an uABI break might
be acceptable.

If you want to NAK it then you need to come up with the work to do
something here correctly that will support the old drivers without the
kernel taint.

Unfortunately making things uncomfortable for the subsystem is the big
hammer the core kernel needs to use to actually get this security work
done by those responsible.

Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-09  7:59   ` Daniel Vetter
  (?)
@ 2020-10-09 12:31     ` Jason Gunthorpe
  -1 siblings, 0 replies; 214+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 12:31 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Kees Cook, Dan Williams, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Arnd Bergmann,
	Greg Kroah-Hartman, David Hildenbrand, Rafael J. Wysocki

On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:

> +struct address_space *iomem_get_mapping(void)
> +{
> +	return iomem_inode->i_mapping;

This should pair an acquire with the release below

> +	/*
> +	 * Publish /dev/mem initialized.
> +	 * Pairs with smp_load_acquire() in revoke_iomem().
> +	 */
> +	smp_store_release(&iomem_inode, inode);

However, this seems abnormal, initcalls rarely do this kind of stuff
with global data..

The kernel crashes if this fs_initcall is raced with
iomem_get_mapping() due to the unconditional dereference, so I think
it can be safely switched to a simple assignment.

Jason

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
@ 2020-10-09 12:31     ` Jason Gunthorpe
  0 siblings, 0 replies; 214+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 12:31 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	David Hildenbrand, John Hubbard, Rafael J. Wysocki, LKML,
	DRI Development, linux-mm, Jérôme Glisse,
	Arnd Bergmann, Greg Kroah-Hartman, Daniel Vetter, Dan Williams,
	Andrew Morton, linux-arm-kernel, linux-media

On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:

> +struct address_space *iomem_get_mapping(void)
> +{
> +	return iomem_inode->i_mapping;

This should pair an acquire with the release below

> +	/*
> +	 * Publish /dev/mem initialized.
> +	 * Pairs with smp_load_acquire() in revoke_iomem().
> +	 */
> +	smp_store_release(&iomem_inode, inode);

However, this seems abnormal, initcalls rarely do this kind of stuff
with global data..

The kernel crashes if this fs_initcall is raced with
iomem_get_mapping() due to the unconditional dereference, so I think
it can be safely switched to a simple assignment.

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
@ 2020-10-09 12:31     ` Jason Gunthorpe
  0 siblings, 0 replies; 214+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 12:31 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	David Hildenbrand, John Hubbard, Rafael J. Wysocki, LKML,
	DRI Development, linux-mm, Jérôme Glisse,
	Arnd Bergmann, Greg Kroah-Hartman, Daniel Vetter, Dan Williams,
	Andrew Morton, linux-arm-kernel, linux-media

On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:

> +struct address_space *iomem_get_mapping(void)
> +{
> +	return iomem_inode->i_mapping;

This should pair an acquire with the release below

> +	/*
> +	 * Publish /dev/mem initialized.
> +	 * Pairs with smp_load_acquire() in revoke_iomem().
> +	 */
> +	smp_store_release(&iomem_inode, inode);

However, this seems abnormal, initcalls rarely do this kind of stuff
with global data..

The kernel crashes if this fs_initcall is raced with
iomem_get_mapping() due to the unconditional dereference, so I think
it can be safely switched to a simple assignment.

Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 12:21       ` Jason Gunthorpe
  (?)
@ 2020-10-09 12:37         ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 12:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Vetter, DRI Development, LKML, kvm, linux-mm,
	linux-arm-kernel, linux-samsung-soc, linux-media, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Em Fri, 9 Oct 2020 09:21:11 -0300
Jason Gunthorpe <jgg@ziepe.ca> escreveu:

> On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> > Hi,
> > 
> > Em Fri,  9 Oct 2020 09:59:26 +0200
> > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> >   
> > > Way back it was a reasonable assumptions that iomem mappings never
> > > change the pfn range they point at. But this has changed:
> > > 
> > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > ptes with unmap_mapping_range when buffers get moved
> > > 
> > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > cma regions. This means if we miss the unmap the pfn might contain
> > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > 
> > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > 
> > > Accessing pfns obtained from ptes without holding all the locks is
> > > therefore no longer a good idea.
> > > 
> > > Unfortunately there's some users where this is not fixable (like v4l
> > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > iommu). For now annotate these as unsafe and splat appropriately.
> > > 
> > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > roll out to all appropriate places.  
> > 
> > NACK, as this breaks an existing userspace API on media.  
> 
> It doesn't break it. You get a big warning the thing is broken and it
> keeps working.
> 
> We can't leave such a huge security hole open - it impacts other
> subsystems, distros need to be able to run in a secure mode.

Well, if distros disable it, then apps will break.

> > While I agree that using the userptr on media is something that
> > new drivers may not support, as DMABUF is a better way of
> > handling it, changing this for existing ones is a big no, 
> > as it may break usersapace.  
> 
> media community needs to work to fix this, not pretend it is OK to
> keep going as-is.

> Dealing with security issues is the one case where an uABI break might
> be acceptable.
> 
> If you want to NAK it then you need to come up with the work to do
> something here correctly that will support the old drivers without the
> kernel taint.
> 
> Unfortunately making things uncomfortable for the subsystem is the big
> hammer the core kernel needs to use to actually get this security work
> done by those responsible.


I'm not pretending that this is ok. Just pointing that the approach
taken is NOT OK.

I'm not a mm/ expert, but, from what I understood from Daniel's patch
description is that this is unsafe *only if*  __GFP_MOVABLE is used.

Well, no drivers inside the media subsystem uses such flag, although
they may rely on some infrastructure that could be using it behind
the bars.

If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE 
flag that it would be denying the core mm code to set __GFP_MOVABLE.

Please let address the issue on this way, instead of broken an
userspace API that it is there since 1991.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 12:37         ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 12:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Daniel Vetter, LKML, DRI Development, linux-mm,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Linus Torvalds, Andrew Morton, linux-arm-kernel,
	linux-media

Em Fri, 9 Oct 2020 09:21:11 -0300
Jason Gunthorpe <jgg@ziepe.ca> escreveu:

> On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> > Hi,
> > 
> > Em Fri,  9 Oct 2020 09:59:26 +0200
> > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> >   
> > > Way back it was a reasonable assumptions that iomem mappings never
> > > change the pfn range they point at. But this has changed:
> > > 
> > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > ptes with unmap_mapping_range when buffers get moved
> > > 
> > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > cma regions. This means if we miss the unmap the pfn might contain
> > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > 
> > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > 
> > > Accessing pfns obtained from ptes without holding all the locks is
> > > therefore no longer a good idea.
> > > 
> > > Unfortunately there's some users where this is not fixable (like v4l
> > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > iommu). For now annotate these as unsafe and splat appropriately.
> > > 
> > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > roll out to all appropriate places.  
> > 
> > NACK, as this breaks an existing userspace API on media.  
> 
> It doesn't break it. You get a big warning the thing is broken and it
> keeps working.
> 
> We can't leave such a huge security hole open - it impacts other
> subsystems, distros need to be able to run in a secure mode.

Well, if distros disable it, then apps will break.

> > While I agree that using the userptr on media is something that
> > new drivers may not support, as DMABUF is a better way of
> > handling it, changing this for existing ones is a big no, 
> > as it may break usersapace.  
> 
> media community needs to work to fix this, not pretend it is OK to
> keep going as-is.

> Dealing with security issues is the one case where an uABI break might
> be acceptable.
> 
> If you want to NAK it then you need to come up with the work to do
> something here correctly that will support the old drivers without the
> kernel taint.
> 
> Unfortunately making things uncomfortable for the subsystem is the big
> hammer the core kernel needs to use to actually get this security work
> done by those responsible.


I'm not pretending that this is ok. Just pointing that the approach
taken is NOT OK.

I'm not a mm/ expert, but, from what I understood from Daniel's patch
description is that this is unsafe *only if*  __GFP_MOVABLE is used.

Well, no drivers inside the media subsystem uses such flag, although
they may rely on some infrastructure that could be using it behind
the bars.

If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE 
flag that it would be denying the core mm code to set __GFP_MOVABLE.

Please let address the issue on this way, instead of broken an
userspace API that it is there since 1991.

Thanks,
Mauro

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 12:37         ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 12:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Daniel Vetter, LKML, DRI Development, linux-mm,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Linus Torvalds, Andrew Morton, linux-arm-kernel,
	linux-media

Em Fri, 9 Oct 2020 09:21:11 -0300
Jason Gunthorpe <jgg@ziepe.ca> escreveu:

> On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> > Hi,
> > 
> > Em Fri,  9 Oct 2020 09:59:26 +0200
> > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> >   
> > > Way back it was a reasonable assumptions that iomem mappings never
> > > change the pfn range they point at. But this has changed:
> > > 
> > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > ptes with unmap_mapping_range when buffers get moved
> > > 
> > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > cma regions. This means if we miss the unmap the pfn might contain
> > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > 
> > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > 
> > > Accessing pfns obtained from ptes without holding all the locks is
> > > therefore no longer a good idea.
> > > 
> > > Unfortunately there's some users where this is not fixable (like v4l
> > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > iommu). For now annotate these as unsafe and splat appropriately.
> > > 
> > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > roll out to all appropriate places.  
> > 
> > NACK, as this breaks an existing userspace API on media.  
> 
> It doesn't break it. You get a big warning the thing is broken and it
> keeps working.
> 
> We can't leave such a huge security hole open - it impacts other
> subsystems, distros need to be able to run in a secure mode.

Well, if distros disable it, then apps will break.

> > While I agree that using the userptr on media is something that
> > new drivers may not support, as DMABUF is a better way of
> > handling it, changing this for existing ones is a big no, 
> > as it may break usersapace.  
> 
> media community needs to work to fix this, not pretend it is OK to
> keep going as-is.

> Dealing with security issues is the one case where an uABI break might
> be acceptable.
> 
> If you want to NAK it then you need to come up with the work to do
> something here correctly that will support the old drivers without the
> kernel taint.
> 
> Unfortunately making things uncomfortable for the subsystem is the big
> hammer the core kernel needs to use to actually get this security work
> done by those responsible.


I'm not pretending that this is ok. Just pointing that the approach
taken is NOT OK.

I'm not a mm/ expert, but, from what I understood from Daniel's patch
description is that this is unsafe *only if*  __GFP_MOVABLE is used.

Well, no drivers inside the media subsystem uses such flag, although
they may rely on some infrastructure that could be using it behind
the bars.

If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE 
flag that it would be denying the core mm code to set __GFP_MOVABLE.

Please let address the issue on this way, instead of broken an
userspace API that it is there since 1991.

Thanks,
Mauro
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 12:37         ` Mauro Carvalho Chehab
  (?)
@ 2020-10-09 12:39           ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 12:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Vetter, DRI Development, LKML, kvm, linux-mm,
	linux-arm-kernel, linux-samsung-soc, linux-media, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Em Fri, 9 Oct 2020 14:37:23 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:

> Em Fri, 9 Oct 2020 09:21:11 -0300
> Jason Gunthorpe <jgg@ziepe.ca> escreveu:
> 
> > On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:  
> > > Hi,
> > > 
> > > Em Fri,  9 Oct 2020 09:59:26 +0200
> > > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> > >     
> > > > Way back it was a reasonable assumptions that iomem mappings never
> > > > change the pfn range they point at. But this has changed:
> > > > 
> > > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > > ptes with unmap_mapping_range when buffers get moved
> > > > 
> > > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > > cma regions. This means if we miss the unmap the pfn might contain
> > > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > > 
> > > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > > 
> > > > Accessing pfns obtained from ptes without holding all the locks is
> > > > therefore no longer a good idea.
> > > > 
> > > > Unfortunately there's some users where this is not fixable (like v4l
> > > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > > iommu). For now annotate these as unsafe and splat appropriately.
> > > > 
> > > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > > roll out to all appropriate places.    
> > > 
> > > NACK, as this breaks an existing userspace API on media.    
> > 
> > It doesn't break it. You get a big warning the thing is broken and it
> > keeps working.
> > 
> > We can't leave such a huge security hole open - it impacts other
> > subsystems, distros need to be able to run in a secure mode.  
> 
> Well, if distros disable it, then apps will break.
> 
> > > While I agree that using the userptr on media is something that
> > > new drivers may not support, as DMABUF is a better way of
> > > handling it, changing this for existing ones is a big no, 
> > > as it may break usersapace.    
> > 
> > media community needs to work to fix this, not pretend it is OK to
> > keep going as-is.  
> 
> > Dealing with security issues is the one case where an uABI break might
> > be acceptable.
> > 
> > If you want to NAK it then you need to come up with the work to do
> > something here correctly that will support the old drivers without the
> > kernel taint.
> > 
> > Unfortunately making things uncomfortable for the subsystem is the big
> > hammer the core kernel needs to use to actually get this security work
> > done by those responsible.  
> 
> 
> I'm not pretending that this is ok. Just pointing that the approach
> taken is NOT OK.
> 
> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> 
> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.
> 
> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE 
> flag that it would be denying the core mm code to set __GFP_MOVABLE.
> 
> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

In time: I meant to say 1998.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 12:39           ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 12:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Daniel Vetter, LKML, DRI Development, linux-mm,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Linus Torvalds, Andrew Morton, linux-arm-kernel,
	linux-media

Em Fri, 9 Oct 2020 14:37:23 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:

> Em Fri, 9 Oct 2020 09:21:11 -0300
> Jason Gunthorpe <jgg@ziepe.ca> escreveu:
> 
> > On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:  
> > > Hi,
> > > 
> > > Em Fri,  9 Oct 2020 09:59:26 +0200
> > > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> > >     
> > > > Way back it was a reasonable assumptions that iomem mappings never
> > > > change the pfn range they point at. But this has changed:
> > > > 
> > > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > > ptes with unmap_mapping_range when buffers get moved
> > > > 
> > > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > > cma regions. This means if we miss the unmap the pfn might contain
> > > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > > 
> > > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > > 
> > > > Accessing pfns obtained from ptes without holding all the locks is
> > > > therefore no longer a good idea.
> > > > 
> > > > Unfortunately there's some users where this is not fixable (like v4l
> > > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > > iommu). For now annotate these as unsafe and splat appropriately.
> > > > 
> > > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > > roll out to all appropriate places.    
> > > 
> > > NACK, as this breaks an existing userspace API on media.    
> > 
> > It doesn't break it. You get a big warning the thing is broken and it
> > keeps working.
> > 
> > We can't leave such a huge security hole open - it impacts other
> > subsystems, distros need to be able to run in a secure mode.  
> 
> Well, if distros disable it, then apps will break.
> 
> > > While I agree that using the userptr on media is something that
> > > new drivers may not support, as DMABUF is a better way of
> > > handling it, changing this for existing ones is a big no, 
> > > as it may break usersapace.    
> > 
> > media community needs to work to fix this, not pretend it is OK to
> > keep going as-is.  
> 
> > Dealing with security issues is the one case where an uABI break might
> > be acceptable.
> > 
> > If you want to NAK it then you need to come up with the work to do
> > something here correctly that will support the old drivers without the
> > kernel taint.
> > 
> > Unfortunately making things uncomfortable for the subsystem is the big
> > hammer the core kernel needs to use to actually get this security work
> > done by those responsible.  
> 
> 
> I'm not pretending that this is ok. Just pointing that the approach
> taken is NOT OK.
> 
> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> 
> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.
> 
> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE 
> flag that it would be denying the core mm code to set __GFP_MOVABLE.
> 
> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

In time: I meant to say 1998.

Thanks,
Mauro

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 12:39           ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-09 12:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Daniel Vetter, LKML, DRI Development, linux-mm,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Linus Torvalds, Andrew Morton, linux-arm-kernel,
	linux-media

Em Fri, 9 Oct 2020 14:37:23 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:

> Em Fri, 9 Oct 2020 09:21:11 -0300
> Jason Gunthorpe <jgg@ziepe.ca> escreveu:
> 
> > On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:  
> > > Hi,
> > > 
> > > Em Fri,  9 Oct 2020 09:59:26 +0200
> > > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> > >     
> > > > Way back it was a reasonable assumptions that iomem mappings never
> > > > change the pfn range they point at. But this has changed:
> > > > 
> > > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > > ptes with unmap_mapping_range when buffers get moved
> > > > 
> > > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > > cma regions. This means if we miss the unmap the pfn might contain
> > > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > > 
> > > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > > 
> > > > Accessing pfns obtained from ptes without holding all the locks is
> > > > therefore no longer a good idea.
> > > > 
> > > > Unfortunately there's some users where this is not fixable (like v4l
> > > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > > iommu). For now annotate these as unsafe and splat appropriately.
> > > > 
> > > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > > roll out to all appropriate places.    
> > > 
> > > NACK, as this breaks an existing userspace API on media.    
> > 
> > It doesn't break it. You get a big warning the thing is broken and it
> > keeps working.
> > 
> > We can't leave such a huge security hole open - it impacts other
> > subsystems, distros need to be able to run in a secure mode.  
> 
> Well, if distros disable it, then apps will break.
> 
> > > While I agree that using the userptr on media is something that
> > > new drivers may not support, as DMABUF is a better way of
> > > handling it, changing this for existing ones is a big no, 
> > > as it may break usersapace.    
> > 
> > media community needs to work to fix this, not pretend it is OK to
> > keep going as-is.  
> 
> > Dealing with security issues is the one case where an uABI break might
> > be acceptable.
> > 
> > If you want to NAK it then you need to come up with the work to do
> > something here correctly that will support the old drivers without the
> > kernel taint.
> > 
> > Unfortunately making things uncomfortable for the subsystem is the big
> > hammer the core kernel needs to use to actually get this security work
> > done by those responsible.  
> 
> 
> I'm not pretending that this is ok. Just pointing that the approach
> taken is NOT OK.
> 
> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> 
> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.
> 
> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE 
> flag that it would be denying the core mm code to set __GFP_MOVABLE.
> 
> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

In time: I meant to say 1998.

Thanks,
Mauro
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 12:37         ` Mauro Carvalho Chehab
  (?)
@ 2020-10-09 12:48           ` Jason Gunthorpe
  -1 siblings, 0 replies; 214+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 12:48 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Daniel Vetter, DRI Development, LKML, kvm, linux-mm,
	linux-arm-kernel, linux-samsung-soc, linux-media, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:

> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.

No, it is unconditionally unsafe. The CMA movable mappings are
specific VMAs that will have bad issues here, but there are other
types too.

The only way to do something at a VMA level is to have a list of OK
VMAs, eg because they were creatd via a special mmap helper from the
media subsystem.

> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.

It doesn't matter, nothing prevents the user from calling media APIs
on mmaps it gets from other subsystems.

> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> flag that it would be denying the core mm code to set __GFP_MOVABLE.

We can't tell from the VMA these kinds of details..

It has to go the other direction, evey mmap that might be used as a
userptr here has to be found and the VMA specially created to allow
its use. At least that is a kernel only change, but will need people
with the HW to do this work.

> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

It has happened before :( It took 4 years for RDMA to undo the uAPI
breakage caused by a security fix for something that was a 15 years
old bug. 

Jason

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 12:48           ` Jason Gunthorpe
  0 siblings, 0 replies; 214+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 12:48 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Daniel Vetter, LKML, DRI Development, linux-mm,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Linus Torvalds, Andrew Morton, linux-arm-kernel,
	linux-media

On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:

> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.

No, it is unconditionally unsafe. The CMA movable mappings are
specific VMAs that will have bad issues here, but there are other
types too.

The only way to do something at a VMA level is to have a list of OK
VMAs, eg because they were creatd via a special mmap helper from the
media subsystem.

> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.

It doesn't matter, nothing prevents the user from calling media APIs
on mmaps it gets from other subsystems.

> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> flag that it would be denying the core mm code to set __GFP_MOVABLE.

We can't tell from the VMA these kinds of details..

It has to go the other direction, evey mmap that might be used as a
userptr here has to be found and the VMA specially created to allow
its use. At least that is a kernel only change, but will need people
with the HW to do this work.

> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

It has happened before :( It took 4 years for RDMA to undo the uAPI
breakage caused by a security fix for something that was a 15 years
old bug. 

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 12:48           ` Jason Gunthorpe
  0 siblings, 0 replies; 214+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 12:48 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Daniel Vetter, LKML, DRI Development, linux-mm,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Linus Torvalds, Andrew Morton, linux-arm-kernel,
	linux-media

On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:

> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.

No, it is unconditionally unsafe. The CMA movable mappings are
specific VMAs that will have bad issues here, but there are other
types too.

The only way to do something at a VMA level is to have a list of OK
VMAs, eg because they were creatd via a special mmap helper from the
media subsystem.

> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.

It doesn't matter, nothing prevents the user from calling media APIs
on mmaps it gets from other subsystems.

> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> flag that it would be denying the core mm code to set __GFP_MOVABLE.

We can't tell from the VMA these kinds of details..

It has to go the other direction, evey mmap that might be used as a
userptr here has to be found and the VMA specially created to allow
its use. At least that is a kernel only change, but will need people
with the HW to do this work.

> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

It has happened before :( It took 4 years for RDMA to undo the uAPI
breakage caused by a security fix for something that was a 15 years
old bug. 

Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
  2020-10-09 10:41         ` Ville Syrjälä
  (?)
@ 2020-10-09 14:18           ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 14:18 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: DRI Development, LKML, linux-s390, linux-samsung-soc, Jan Kara,
	Kees Cook, KVM list, Jason Gunthorpe, Linux PCI, Linux MM,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 12:42 PM Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Fri, Oct 09, 2020 at 12:01:39PM +0200, Daniel Vetter wrote:
> > On Fri, Oct 9, 2020 at 11:47 AM Ville Syrjälä
> > <ville.syrjala@linux.intel.com> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> > > > When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> > > > do nothing for i915. Because i915 doesn't request any regions, like
> > > > pretty much all drm pci drivers. I guess this is some very old
> > > > remnants from the userspace modesetting days, when we wanted to
> > > > co-exist with the fbdev driver. Which usually requested these
> > > > resources.
> > > >
> > > > But makes me wonder why the pci subsystem doesn't just request
> > > > resource automatically when we map a bar and a pci driver is bound?
> > > >
> > > > Knowledge about which pci bars we need kludged together from
> > > > intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> > > > fake agp driver.
> > > >
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > > Cc: Kees Cook <keescook@chromium.org>
> > > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > > Cc: Jan Kara <jack@suse.cz>
> > > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > > Cc: linux-mm@kvack.org
> > > > Cc: linux-arm-kernel@lists.infradead.org
> > > > Cc: linux-samsung-soc@vger.kernel.org
> > > > Cc: linux-media@vger.kernel.org
> > > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > > Cc: linux-pci@vger.kernel.org
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
> > > >  1 file changed, 23 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > > > index 54e201fdeba4..ce39049d8919 100644
> > > > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > > > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > > > @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> > > >       struct pci_dev *pdev = i915->drm.pdev;
> > > >       int mmio_bar;
> > > >       int mmio_size;
> > > > +     int bar_selection;
> > >
> > > Signed bitmasks always make me uneasy. But looks like
> > > that's what it is in the pci api. So meh.
> >
> > Yeah it's surprising.
> >
> > > > +     int ret;
> > > >
> > > >       mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > > > +     bar_selection = BIT (2) | BIT(mmio_bar);
> > >                            ^
> > > spurious space
> > >
> > > That's also not correct for gen2 I think.
> > >
> > > gen2:
> > > 0 = GMADR
> > > 1 = MMADR
> > > 2 = IOBAR
> > >
> > > gen3:
> > > 0 = MMADR
> > > 1 = IOBAR
> > > 2 = GMADR
> > > 3 = GTTADR
> > >
> > > gen4+:
> > > 0+1 = GTTMMADR
> > > 2+3 = GMADR
> > > 4 = IOBAR
> > >
> > > Maybe we should just have an explicit list of bars like that in a
> > > comment?
> > >
> > > I'd also suggest sucking this bitmask calculation into a small helper
> > > so you can reuse it for the release.
> >
> > tbh I just hacked this up for testing. Given how almost no other drm
> > driver does this, I'm wondering whether we should or not.
> >
> > Also the only reason why I didn't just use the pci_request_regions
> > helper is to avoid the vga ioport range, since that's managed by
> > vgaarbiter.
>
> VGA io range isn't part of any bar. Or do you mean just the io decode
> enable bit in the pci command register? That should be just a matter
> or pci_enable_device() vs. pci_enable_device_mem() I think. So nothing
> to do with which bars we've requested IIRC.
>
> >
> > So I think if we go for this for real we should:
> > - register the vga ioport range in the vgaarbiter
> > - have a pci_request_iomem_regions helper that grabs all mem bars
> > - roll that out to all drm pci drivers
> >
> > Or something like that. The other complication is when we resize the
> > iobar. So not really sure what to do here.
>
> We resize it?

By default I thought firmware puts everything (well, squeezes) into
the lower 32bit. Maybe they stopped doing that. So when we want the
full bar (for discrete at least) we need to resize it and put it
somewhere in the 64bit range above end of system memory.

So I guess correct phrasing is "we will resize it" :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
@ 2020-10-09 14:18           ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 14:18 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Linux MM, Linux PCI, LKML, DRI Development, Jason Gunthorpe,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 12:42 PM Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Fri, Oct 09, 2020 at 12:01:39PM +0200, Daniel Vetter wrote:
> > On Fri, Oct 9, 2020 at 11:47 AM Ville Syrjälä
> > <ville.syrjala@linux.intel.com> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> > > > When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> > > > do nothing for i915. Because i915 doesn't request any regions, like
> > > > pretty much all drm pci drivers. I guess this is some very old
> > > > remnants from the userspace modesetting days, when we wanted to
> > > > co-exist with the fbdev driver. Which usually requested these
> > > > resources.
> > > >
> > > > But makes me wonder why the pci subsystem doesn't just request
> > > > resource automatically when we map a bar and a pci driver is bound?
> > > >
> > > > Knowledge about which pci bars we need kludged together from
> > > > intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> > > > fake agp driver.
> > > >
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > > Cc: Kees Cook <keescook@chromium.org>
> > > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > > Cc: Jan Kara <jack@suse.cz>
> > > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > > Cc: linux-mm@kvack.org
> > > > Cc: linux-arm-kernel@lists.infradead.org
> > > > Cc: linux-samsung-soc@vger.kernel.org
> > > > Cc: linux-media@vger.kernel.org
> > > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > > Cc: linux-pci@vger.kernel.org
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
> > > >  1 file changed, 23 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > > > index 54e201fdeba4..ce39049d8919 100644
> > > > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > > > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > > > @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> > > >       struct pci_dev *pdev = i915->drm.pdev;
> > > >       int mmio_bar;
> > > >       int mmio_size;
> > > > +     int bar_selection;
> > >
> > > Signed bitmasks always make me uneasy. But looks like
> > > that's what it is in the pci api. So meh.
> >
> > Yeah it's surprising.
> >
> > > > +     int ret;
> > > >
> > > >       mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > > > +     bar_selection = BIT (2) | BIT(mmio_bar);
> > >                            ^
> > > spurious space
> > >
> > > That's also not correct for gen2 I think.
> > >
> > > gen2:
> > > 0 = GMADR
> > > 1 = MMADR
> > > 2 = IOBAR
> > >
> > > gen3:
> > > 0 = MMADR
> > > 1 = IOBAR
> > > 2 = GMADR
> > > 3 = GTTADR
> > >
> > > gen4+:
> > > 0+1 = GTTMMADR
> > > 2+3 = GMADR
> > > 4 = IOBAR
> > >
> > > Maybe we should just have an explicit list of bars like that in a
> > > comment?
> > >
> > > I'd also suggest sucking this bitmask calculation into a small helper
> > > so you can reuse it for the release.
> >
> > tbh I just hacked this up for testing. Given how almost no other drm
> > driver does this, I'm wondering whether we should or not.
> >
> > Also the only reason why I didn't just use the pci_request_regions
> > helper is to avoid the vga ioport range, since that's managed by
> > vgaarbiter.
>
> VGA io range isn't part of any bar. Or do you mean just the io decode
> enable bit in the pci command register? That should be just a matter
> or pci_enable_device() vs. pci_enable_device_mem() I think. So nothing
> to do with which bars we've requested IIRC.
>
> >
> > So I think if we go for this for real we should:
> > - register the vga ioport range in the vgaarbiter
> > - have a pci_request_iomem_regions helper that grabs all mem bars
> > - roll that out to all drm pci drivers
> >
> > Or something like that. The other complication is when we resize the
> > iobar. So not really sure what to do here.
>
> We resize it?

By default I thought firmware puts everything (well, squeezes) into
the lower 32bit. Maybe they stopped doing that. So when we want the
full bar (for discrete at least) we need to resize it and put it
somewhere in the 64bit range above end of system memory.

So I guess correct phrasing is "we will resize it" :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 17/17] drm/i915: Properly request PCI BARs
@ 2020-10-09 14:18           ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 14:18 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Linux MM, Linux PCI, LKML, DRI Development, Jason Gunthorpe,
	Jérôme Glisse, John Hubbard, Bjorn Helgaas,
	Daniel Vetter, Dan Williams, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 12:42 PM Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>
> On Fri, Oct 09, 2020 at 12:01:39PM +0200, Daniel Vetter wrote:
> > On Fri, Oct 9, 2020 at 11:47 AM Ville Syrjälä
> > <ville.syrjala@linux.intel.com> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 09:59:34AM +0200, Daniel Vetter wrote:
> > > > When trying to test my CONFIG_IO_STRICT_DEVMEM changes I realized they
> > > > do nothing for i915. Because i915 doesn't request any regions, like
> > > > pretty much all drm pci drivers. I guess this is some very old
> > > > remnants from the userspace modesetting days, when we wanted to
> > > > co-exist with the fbdev driver. Which usually requested these
> > > > resources.
> > > >
> > > > But makes me wonder why the pci subsystem doesn't just request
> > > > resource automatically when we map a bar and a pci driver is bound?
> > > >
> > > > Knowledge about which pci bars we need kludged together from
> > > > intel_uncore.c and intel_gtt.c from i915 and intel-gtt.c over in the
> > > > fake agp driver.
> > > >
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > > Cc: Kees Cook <keescook@chromium.org>
> > > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > > Cc: Jan Kara <jack@suse.cz>
> > > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > > Cc: linux-mm@kvack.org
> > > > Cc: linux-arm-kernel@lists.infradead.org
> > > > Cc: linux-samsung-soc@vger.kernel.org
> > > > Cc: linux-media@vger.kernel.org
> > > > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > > > Cc: linux-pci@vger.kernel.org
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_uncore.c | 25 +++++++++++++++++++++++--
> > > >  1 file changed, 23 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > > > index 54e201fdeba4..ce39049d8919 100644
> > > > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > > > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > > > @@ -1692,10 +1692,13 @@ static int uncore_mmio_setup(struct intel_uncore *uncore)
> > > >       struct pci_dev *pdev = i915->drm.pdev;
> > > >       int mmio_bar;
> > > >       int mmio_size;
> > > > +     int bar_selection;
> > >
> > > Signed bitmasks always make me uneasy. But looks like
> > > that's what it is in the pci api. So meh.
> >
> > Yeah it's surprising.
> >
> > > > +     int ret;
> > > >
> > > >       mmio_bar = IS_GEN(i915, 2) ? 1 : 0;
> > > > +     bar_selection = BIT (2) | BIT(mmio_bar);
> > >                            ^
> > > spurious space
> > >
> > > That's also not correct for gen2 I think.
> > >
> > > gen2:
> > > 0 = GMADR
> > > 1 = MMADR
> > > 2 = IOBAR
> > >
> > > gen3:
> > > 0 = MMADR
> > > 1 = IOBAR
> > > 2 = GMADR
> > > 3 = GTTADR
> > >
> > > gen4+:
> > > 0+1 = GTTMMADR
> > > 2+3 = GMADR
> > > 4 = IOBAR
> > >
> > > Maybe we should just have an explicit list of bars like that in a
> > > comment?
> > >
> > > I'd also suggest sucking this bitmask calculation into a small helper
> > > so you can reuse it for the release.
> >
> > tbh I just hacked this up for testing. Given how almost no other drm
> > driver does this, I'm wondering whether we should or not.
> >
> > Also the only reason why I didn't just use the pci_request_regions
> > helper is to avoid the vga ioport range, since that's managed by
> > vgaarbiter.
>
> VGA io range isn't part of any bar. Or do you mean just the io decode
> enable bit in the pci command register? That should be just a matter
> or pci_enable_device() vs. pci_enable_device_mem() I think. So nothing
> to do with which bars we've requested IIRC.
>
> >
> > So I think if we go for this for real we should:
> > - register the vga ioport range in the vgaarbiter
> > - have a pci_request_iomem_regions helper that grabs all mem bars
> > - roll that out to all drm pci drivers
> >
> > Or something like that. The other complication is when we resize the
> > iobar. So not really sure what to do here.
>
> We resize it?

By default I thought firmware puts everything (well, squeezes) into
the lower 32bit. Maybe they stopped doing that. So when we want the
full bar (for discrete at least) we need to resize it and put it
somewhere in the 64bit range above end of system memory.

So I guess correct phrasing is "we will resize it" :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-09 12:31     ` Jason Gunthorpe
  (?)
@ 2020-10-09 14:24       ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 14:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Arnd Bergmann, Greg Kroah-Hartman, David Hildenbrand,
	Rafael J. Wysocki

On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
>
> > +struct address_space *iomem_get_mapping(void)
> > +{
> > +     return iomem_inode->i_mapping;
>
> This should pair an acquire with the release below
>
> > +     /*
> > +      * Publish /dev/mem initialized.
> > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > +      */
> > +     smp_store_release(&iomem_inode, inode);
>
> However, this seems abnormal, initcalls rarely do this kind of stuff
> with global data..
>
> The kernel crashes if this fs_initcall is raced with
> iomem_get_mapping() due to the unconditional dereference, so I think
> it can be safely switched to a simple assignment.

Ah yes I checked this all, but forgot to correctly annotate the
iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
Add missing memory barriers for devmem_inode").

The reasons for the annotations is that iomem requests can happen
fairly early, way before fs_initcalls happen. That means revoke_iomem
needs to check for that and bail out if we race - nothing bad can
happen since userspace isn't running at this point anyway. And
apparently it needs to be a full acquire fence since we don't just
write a value, but need a barrier for the struct stuff.

Now iomem_get_mapping otoh can only be called after userspace is up &
running, so way after all the fs_initcalls are guaranteed to have
fininshed. Hence we don't really need anything there. But I expect the
kernel race checker thing to complain, plus that then gives me a good
spot to explain why we can't race and don't have to check for a NULL
iomem_inode.

I'll add that in v3.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
@ 2020-10-09 14:24       ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 14:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	David Hildenbrand, John Hubbard, Rafael J. Wysocki, LKML,
	DRI Development, Linux MM, Jérôme Glisse,
	Arnd Bergmann, Greg Kroah-Hartman, Daniel Vetter, Dan Williams,
	Andrew Morton, Linux ARM, open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
>
> > +struct address_space *iomem_get_mapping(void)
> > +{
> > +     return iomem_inode->i_mapping;
>
> This should pair an acquire with the release below
>
> > +     /*
> > +      * Publish /dev/mem initialized.
> > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > +      */
> > +     smp_store_release(&iomem_inode, inode);
>
> However, this seems abnormal, initcalls rarely do this kind of stuff
> with global data..
>
> The kernel crashes if this fs_initcall is raced with
> iomem_get_mapping() due to the unconditional dereference, so I think
> it can be safely switched to a simple assignment.

Ah yes I checked this all, but forgot to correctly annotate the
iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
Add missing memory barriers for devmem_inode").

The reasons for the annotations is that iomem requests can happen
fairly early, way before fs_initcalls happen. That means revoke_iomem
needs to check for that and bail out if we race - nothing bad can
happen since userspace isn't running at this point anyway. And
apparently it needs to be a full acquire fence since we don't just
write a value, but need a barrier for the struct stuff.

Now iomem_get_mapping otoh can only be called after userspace is up &
running, so way after all the fs_initcalls are guaranteed to have
fininshed. Hence we don't really need anything there. But I expect the
kernel race checker thing to complain, plus that then gives me a good
spot to explain why we can't race and don't have to check for a NULL
iomem_inode.

I'll add that in v3.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
@ 2020-10-09 14:24       ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 14:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	David Hildenbrand, John Hubbard, Rafael J. Wysocki, LKML,
	DRI Development, Linux MM, Jérôme Glisse,
	Arnd Bergmann, Greg Kroah-Hartman, Daniel Vetter, Dan Williams,
	Andrew Morton, Linux ARM, open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
>
> > +struct address_space *iomem_get_mapping(void)
> > +{
> > +     return iomem_inode->i_mapping;
>
> This should pair an acquire with the release below
>
> > +     /*
> > +      * Publish /dev/mem initialized.
> > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > +      */
> > +     smp_store_release(&iomem_inode, inode);
>
> However, this seems abnormal, initcalls rarely do this kind of stuff
> with global data..
>
> The kernel crashes if this fs_initcall is raced with
> iomem_get_mapping() due to the unconditional dereference, so I think
> it can be safely switched to a simple assignment.

Ah yes I checked this all, but forgot to correctly annotate the
iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
Add missing memory barriers for devmem_inode").

The reasons for the annotations is that iomem requests can happen
fairly early, way before fs_initcalls happen. That means revoke_iomem
needs to check for that and bail out if we race - nothing bad can
happen since userspace isn't running at this point anyway. And
apparently it needs to be a full acquire fence since we don't just
write a value, but need a barrier for the struct stuff.

Now iomem_get_mapping otoh can only be called after userspace is up &
running, so way after all the fs_initcalls are guaranteed to have
fininshed. Hence we don't really need anything there. But I expect the
kernel race checker thing to complain, plus that then gives me a good
spot to explain why we can't race and don't have to check for a NULL
iomem_inode.

I'll add that in v3.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-09 14:24       ` Daniel Vetter
  (?)
@ 2020-10-09 14:32         ` Jason Gunthorpe
  -1 siblings, 0 replies; 214+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 14:32 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, KVM list, Linux MM, Linux ARM,
	linux-samsung-soc, open list:DMA BUFFER SHARING FRAMEWORK,
	linux-s390, Daniel Vetter, Kees Cook, Dan Williams,
	Andrew Morton, John Hubbard, Jérôme Glisse, Jan Kara,
	Arnd Bergmann, Greg Kroah-Hartman, David Hildenbrand,
	Rafael J. Wysocki

On Fri, Oct 09, 2020 at 04:24:45PM +0200, Daniel Vetter wrote:
> On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> >
> > > +struct address_space *iomem_get_mapping(void)
> > > +{
> > > +     return iomem_inode->i_mapping;
> >
> > This should pair an acquire with the release below
> >
> > > +     /*
> > > +      * Publish /dev/mem initialized.
> > > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > > +      */
> > > +     smp_store_release(&iomem_inode, inode);
> >
> > However, this seems abnormal, initcalls rarely do this kind of stuff
> > with global data..
> >
> > The kernel crashes if this fs_initcall is raced with
> > iomem_get_mapping() due to the unconditional dereference, so I think
> > it can be safely switched to a simple assignment.
> 
> Ah yes I checked this all, but forgot to correctly annotate the
> iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
> Add missing memory barriers for devmem_inode").

Oh yikes, so revoke_iomem can run concurrently during early boot,
tricky.

> The reasons for the annotations is that iomem requests can happen
> fairly early, way before fs_initcalls happen. That means revoke_iomem
> needs to check for that and bail out if we race - nothing bad can
> happen since userspace isn't running at this point anyway. And
> apparently it needs to be a full acquire fence since we don't just
> write a value, but need a barrier for the struct stuff.

Yes, if that is what is happening it release/acquire is needed.

Jason

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
@ 2020-10-09 14:32         ` Jason Gunthorpe
  0 siblings, 0 replies; 214+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 14:32 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	David Hildenbrand, John Hubbard, Rafael J. Wysocki, LKML,
	DRI Development, Linux MM, Jérôme Glisse,
	Arnd Bergmann, Greg Kroah-Hartman, Daniel Vetter, Dan Williams,
	Andrew Morton, Linux ARM, open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 09, 2020 at 04:24:45PM +0200, Daniel Vetter wrote:
> On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> >
> > > +struct address_space *iomem_get_mapping(void)
> > > +{
> > > +     return iomem_inode->i_mapping;
> >
> > This should pair an acquire with the release below
> >
> > > +     /*
> > > +      * Publish /dev/mem initialized.
> > > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > > +      */
> > > +     smp_store_release(&iomem_inode, inode);
> >
> > However, this seems abnormal, initcalls rarely do this kind of stuff
> > with global data..
> >
> > The kernel crashes if this fs_initcall is raced with
> > iomem_get_mapping() due to the unconditional dereference, so I think
> > it can be safely switched to a simple assignment.
> 
> Ah yes I checked this all, but forgot to correctly annotate the
> iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
> Add missing memory barriers for devmem_inode").

Oh yikes, so revoke_iomem can run concurrently during early boot,
tricky.

> The reasons for the annotations is that iomem requests can happen
> fairly early, way before fs_initcalls happen. That means revoke_iomem
> needs to check for that and bail out if we race - nothing bad can
> happen since userspace isn't running at this point anyway. And
> apparently it needs to be a full acquire fence since we don't just
> write a value, but need a barrier for the struct stuff.

Yes, if that is what is happening it release/acquire is needed.

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
@ 2020-10-09 14:32         ` Jason Gunthorpe
  0 siblings, 0 replies; 214+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 14:32 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	David Hildenbrand, John Hubbard, Rafael J. Wysocki, LKML,
	DRI Development, Linux MM, Jérôme Glisse,
	Arnd Bergmann, Greg Kroah-Hartman, Daniel Vetter, Dan Williams,
	Andrew Morton, Linux ARM, open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 09, 2020 at 04:24:45PM +0200, Daniel Vetter wrote:
> On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> >
> > > +struct address_space *iomem_get_mapping(void)
> > > +{
> > > +     return iomem_inode->i_mapping;
> >
> > This should pair an acquire with the release below
> >
> > > +     /*
> > > +      * Publish /dev/mem initialized.
> > > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > > +      */
> > > +     smp_store_release(&iomem_inode, inode);
> >
> > However, this seems abnormal, initcalls rarely do this kind of stuff
> > with global data..
> >
> > The kernel crashes if this fs_initcall is raced with
> > iomem_get_mapping() due to the unconditional dereference, so I think
> > it can be safely switched to a simple assignment.
> 
> Ah yes I checked this all, but forgot to correctly annotate the
> iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
> Add missing memory barriers for devmem_inode").

Oh yikes, so revoke_iomem can run concurrently during early boot,
tricky.

> The reasons for the annotations is that iomem requests can happen
> fairly early, way before fs_initcalls happen. That means revoke_iomem
> needs to check for that and bail out if we race - nothing bad can
> happen since userspace isn't running at this point anyway. And
> apparently it needs to be a full acquire fence since we don't just
> write a value, but need a barrier for the struct stuff.

Yes, if that is what is happening it release/acquire is needed.

Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem
  2020-10-09 10:14     ` Mauro Carvalho Chehab
  (?)
  (?)
@ 2020-10-09 16:57       ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 16:57 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Daniel Vetter, DRI Development, LKML, kvm, linux-mm,
	linux-arm-kernel, linux-samsung-soc, linux-media, linux-s390,
	John Hubbard, Daniel Vetter, Jason Gunthorpe, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Tomasz Figa, Andrew Morton,
	Jérôme Glisse, Jan Kara, Dan Williams

On Fri, Oct 09, 2020 at 12:14:17PM +0200, Mauro Carvalho Chehab wrote:
> Em Fri,  9 Oct 2020 09:59:23 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> 
> > It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
> > symbol from all over the tree (well just one place, somehow omap media
> > driver still had this in its Kconfig, despite not using it).
> > 
> > Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Pawel Osciak <pawel@osciak.com>
> > Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> > Cc: Kyungmin Park <kyungmin.park@samsung.com>
> > Cc: Tomasz Figa <tfiga@chromium.org>
> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  drivers/media/common/videobuf2/Kconfig        |  1 -
> >  drivers/media/common/videobuf2/Makefile       |  1 +
> >  .../media/common/videobuf2}/frame_vector.c    |  2 +
> >  drivers/media/platform/omap/Kconfig           |  1 -
> >  include/linux/mm.h                            | 42 -------------------
> >  include/media/videobuf2-core.h                | 42 +++++++++++++++++++
> >  mm/Kconfig                                    |  3 --
> >  mm/Makefile                                   |  1 -
> >  8 files changed, 45 insertions(+), 48 deletions(-)
> >  rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)
> > 
> > diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
> > index edbc99ebba87..d2223a12c95f 100644
> > --- a/drivers/media/common/videobuf2/Kconfig
> > +++ b/drivers/media/common/videobuf2/Kconfig
> > @@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
> >  
> >  config VIDEOBUF2_MEMOPS
> >  	tristate
> > -	select FRAME_VECTOR
> >  
> >  config VIDEOBUF2_DMA_CONTIG
> >  	tristate
> > diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
> > index 77bebe8b202f..54306f8d096c 100644
> > --- a/drivers/media/common/videobuf2/Makefile
> > +++ b/drivers/media/common/videobuf2/Makefile
> > @@ -1,5 +1,6 @@
> >  # SPDX-License-Identifier: GPL-2.0
> >  videobuf2-common-objs := videobuf2-core.o
> > +videobuf2-common-objs += frame_vector.o
> >  
> >  ifeq ($(CONFIG_TRACEPOINTS),y)
> >    videobuf2-common-objs += vb2-trace.o
> > diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> > similarity index 99%
> > rename from mm/frame_vector.c
> > rename to drivers/media/common/videobuf2/frame_vector.c
> > index d44779e56313..2b0b97761d15 100644
> > --- a/mm/frame_vector.c
> > +++ b/drivers/media/common/videobuf2/frame_vector.c
> > @@ -8,6 +8,8 @@
> >  #include <linux/pagemap.h>
> >  #include <linux/sched.h>
> >  
> > +#include <media/videobuf2-core.h>
> > +
> 
> See my comment below...
> 
> >  /**
> >   * get_vaddr_frames() - map virtual addresses to pfns
> >   * @start:	starting user address
> > diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
> > index f73b5893220d..de16de46c0f4 100644
> > --- a/drivers/media/platform/omap/Kconfig
> > +++ b/drivers/media/platform/omap/Kconfig
> > @@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
> >  	depends on VIDEO_V4L2
> >  	select VIDEOBUF2_DMA_CONTIG
> >  	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
> > -	select FRAME_VECTOR
> >  	help
> >  	  V4L2 Display driver support for OMAP2/3 based boards.
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 16b799a0522c..acd60fbf1a5a 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
> >  int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
> >  			struct task_struct *task, bool bypass_rlim);
> >  
> > -/* Container for pinned pfns / pages */
> > -struct frame_vector {
> > -	unsigned int nr_allocated;	/* Number of frames we have space for */
> > -	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> > -	bool got_ref;		/* Did we pin pages by getting page ref? */
> > -	bool is_pfns;		/* Does array contain pages or pfns? */
> > -	void *ptrs[];		/* Array of pinned pfns / pages. Use
> > -				 * pfns_vector_pages() or pfns_vector_pfns()
> > -				 * for access */
> > -};
> > -
> > -struct frame_vector *frame_vector_create(unsigned int nr_frames);
> > -void frame_vector_destroy(struct frame_vector *vec);
> > -int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> > -		     unsigned int gup_flags, struct frame_vector *vec);
> > -void put_vaddr_frames(struct frame_vector *vec);
> > -int frame_vector_to_pages(struct frame_vector *vec);
> > -void frame_vector_to_pfns(struct frame_vector *vec);
> > -
> > -static inline unsigned int frame_vector_count(struct frame_vector *vec)
> > -{
> > -	return vec->nr_frames;
> > -}
> > -
> > -static inline struct page **frame_vector_pages(struct frame_vector *vec)
> > -{
> > -	if (vec->is_pfns) {
> > -		int err = frame_vector_to_pages(vec);
> > -
> > -		if (err)
> > -			return ERR_PTR(err);
> > -	}
> > -	return (struct page **)(vec->ptrs);
> > -}
> > -
> > -static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> > -{
> > -	if (!vec->is_pfns)
> > -		frame_vector_to_pfns(vec);
> > -	return (unsigned long *)(vec->ptrs);
> > -}
> > -
> >  struct kvec;
> >  int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
> >  			struct page **pages);
> > diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> > index bbb3f26fbde9..a2e75ca0334f 100644
> > --- a/include/media/videobuf2-core.h
> > +++ b/include/media/videobuf2-core.h
> > @@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
> >   */
> >  unsigned int vb2_request_buffer_cnt(struct media_request *req);
> >  
> > +/* Container for pinned pfns / pages in frame_vector.c */
> > +struct frame_vector {
> > +	unsigned int nr_allocated;	/* Number of frames we have space for */
> > +	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> > +	bool got_ref;		/* Did we pin pages by getting page ref? */
> > +	bool is_pfns;		/* Does array contain pages or pfns? */
> > +	void *ptrs[];		/* Array of pinned pfns / pages. Use
> > +				 * pfns_vector_pages() or pfns_vector_pfns()
> > +				 * for access */
> > +};
> > +
> > +struct frame_vector *frame_vector_create(unsigned int nr_frames);
> > +void frame_vector_destroy(struct frame_vector *vec);
> > +int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> > +		     unsigned int gup_flags, struct frame_vector *vec);
> > +void put_vaddr_frames(struct frame_vector *vec);
> > +int frame_vector_to_pages(struct frame_vector *vec);
> > +void frame_vector_to_pfns(struct frame_vector *vec);
> > +
> > +static inline unsigned int frame_vector_count(struct frame_vector *vec)
> > +{
> > +	return vec->nr_frames;
> > +}
> > +
> > +static inline struct page **frame_vector_pages(struct frame_vector *vec)
> > +{
> > +	if (vec->is_pfns) {
> > +		int err = frame_vector_to_pages(vec);
> > +
> > +		if (err)
> > +			return ERR_PTR(err);
> > +	}
> > +	return (struct page **)(vec->ptrs);
> > +}
> > +
> > +static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> > +{
> > +	if (!vec->is_pfns)
> > +		frame_vector_to_pfns(vec);
> > +	return (unsigned long *)(vec->ptrs);
> > +}
> > +
> >  #endif /* _MEDIA_VIDEOBUF2_CORE_H */
> 
> Please place those into a include/media/frame_vector.h file, instead of
> merging it directly at vb2 core header.
> 
> Then include the new header at videobuf2-core.h and at frame_vector.c.

Makes sense, I'll do that for v3.

> With such changes:
> 
> Acked-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Thanks for taking a look.
-Daniel

> 
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 6c974888f86f..da6c943fe9f1 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -815,9 +815,6 @@ config DEVICE_PRIVATE
> >  	  memory; i.e., memory that is only accessible from the device (or
> >  	  group of devices). You likely also want to select HMM_MIRROR.
> >  
> > -config FRAME_VECTOR
> > -	bool
> > -
> >  config ARCH_USES_HIGH_VMA_FLAGS
> >  	bool
> >  config ARCH_HAS_PKEYS
> > diff --git a/mm/Makefile b/mm/Makefile
> > index d5649f1c12c0..a025fd6c6afd 100644
> > --- a/mm/Makefile
> > +++ b/mm/Makefile
> > @@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
> >  obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
> >  obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
> >  obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
> > -obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
> >  obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
> >  obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
> >  obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
> 
> 
> 
> Thanks,
> Mauro

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem
@ 2020-10-09 16:57       ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 16:57 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Daniel Vetter, DRI Development, LKML, kvm, linux-mm,
	linux-arm-kernel, linux-samsung-soc, linux-media, linux-s390,
	John Hubbard, Daniel Vetter, Jason Gunthorpe, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Tomasz Figa, Andrew Morton,
	Jérôme Glisse, Jan Kara, Dan Williams

On Fri, Oct 09, 2020 at 12:14:17PM +0200, Mauro Carvalho Chehab wrote:
> Em Fri,  9 Oct 2020 09:59:23 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> 
> > It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
> > symbol from all over the tree (well just one place, somehow omap media
> > driver still had this in its Kconfig, despite not using it).
> > 
> > Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Pawel Osciak <pawel@osciak.com>
> > Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> > Cc: Kyungmin Park <kyungmin.park@samsung.com>
> > Cc: Tomasz Figa <tfiga@chromium.org>
> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: J�r�me Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  drivers/media/common/videobuf2/Kconfig        |  1 -
> >  drivers/media/common/videobuf2/Makefile       |  1 +
> >  .../media/common/videobuf2}/frame_vector.c    |  2 +
> >  drivers/media/platform/omap/Kconfig           |  1 -
> >  include/linux/mm.h                            | 42 -------------------
> >  include/media/videobuf2-core.h                | 42 +++++++++++++++++++
> >  mm/Kconfig                                    |  3 --
> >  mm/Makefile                                   |  1 -
> >  8 files changed, 45 insertions(+), 48 deletions(-)
> >  rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)
> > 
> > diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
> > index edbc99ebba87..d2223a12c95f 100644
> > --- a/drivers/media/common/videobuf2/Kconfig
> > +++ b/drivers/media/common/videobuf2/Kconfig
> > @@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
> >  
> >  config VIDEOBUF2_MEMOPS
> >  	tristate
> > -	select FRAME_VECTOR
> >  
> >  config VIDEOBUF2_DMA_CONTIG
> >  	tristate
> > diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
> > index 77bebe8b202f..54306f8d096c 100644
> > --- a/drivers/media/common/videobuf2/Makefile
> > +++ b/drivers/media/common/videobuf2/Makefile
> > @@ -1,5 +1,6 @@
> >  # SPDX-License-Identifier: GPL-2.0
> >  videobuf2-common-objs := videobuf2-core.o
> > +videobuf2-common-objs += frame_vector.o
> >  
> >  ifeq ($(CONFIG_TRACEPOINTS),y)
> >    videobuf2-common-objs += vb2-trace.o
> > diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> > similarity index 99%
> > rename from mm/frame_vector.c
> > rename to drivers/media/common/videobuf2/frame_vector.c
> > index d44779e56313..2b0b97761d15 100644
> > --- a/mm/frame_vector.c
> > +++ b/drivers/media/common/videobuf2/frame_vector.c
> > @@ -8,6 +8,8 @@
> >  #include <linux/pagemap.h>
> >  #include <linux/sched.h>
> >  
> > +#include <media/videobuf2-core.h>
> > +
> 
> See my comment below...
> 
> >  /**
> >   * get_vaddr_frames() - map virtual addresses to pfns
> >   * @start:	starting user address
> > diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
> > index f73b5893220d..de16de46c0f4 100644
> > --- a/drivers/media/platform/omap/Kconfig
> > +++ b/drivers/media/platform/omap/Kconfig
> > @@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
> >  	depends on VIDEO_V4L2
> >  	select VIDEOBUF2_DMA_CONTIG
> >  	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
> > -	select FRAME_VECTOR
> >  	help
> >  	  V4L2 Display driver support for OMAP2/3 based boards.
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 16b799a0522c..acd60fbf1a5a 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
> >  int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
> >  			struct task_struct *task, bool bypass_rlim);
> >  
> > -/* Container for pinned pfns / pages */
> > -struct frame_vector {
> > -	unsigned int nr_allocated;	/* Number of frames we have space for */
> > -	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> > -	bool got_ref;		/* Did we pin pages by getting page ref? */
> > -	bool is_pfns;		/* Does array contain pages or pfns? */
> > -	void *ptrs[];		/* Array of pinned pfns / pages. Use
> > -				 * pfns_vector_pages() or pfns_vector_pfns()
> > -				 * for access */
> > -};
> > -
> > -struct frame_vector *frame_vector_create(unsigned int nr_frames);
> > -void frame_vector_destroy(struct frame_vector *vec);
> > -int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> > -		     unsigned int gup_flags, struct frame_vector *vec);
> > -void put_vaddr_frames(struct frame_vector *vec);
> > -int frame_vector_to_pages(struct frame_vector *vec);
> > -void frame_vector_to_pfns(struct frame_vector *vec);
> > -
> > -static inline unsigned int frame_vector_count(struct frame_vector *vec)
> > -{
> > -	return vec->nr_frames;
> > -}
> > -
> > -static inline struct page **frame_vector_pages(struct frame_vector *vec)
> > -{
> > -	if (vec->is_pfns) {
> > -		int err = frame_vector_to_pages(vec);
> > -
> > -		if (err)
> > -			return ERR_PTR(err);
> > -	}
> > -	return (struct page **)(vec->ptrs);
> > -}
> > -
> > -static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> > -{
> > -	if (!vec->is_pfns)
> > -		frame_vector_to_pfns(vec);
> > -	return (unsigned long *)(vec->ptrs);
> > -}
> > -
> >  struct kvec;
> >  int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
> >  			struct page **pages);
> > diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> > index bbb3f26fbde9..a2e75ca0334f 100644
> > --- a/include/media/videobuf2-core.h
> > +++ b/include/media/videobuf2-core.h
> > @@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
> >   */
> >  unsigned int vb2_request_buffer_cnt(struct media_request *req);
> >  
> > +/* Container for pinned pfns / pages in frame_vector.c */
> > +struct frame_vector {
> > +	unsigned int nr_allocated;	/* Number of frames we have space for */
> > +	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> > +	bool got_ref;		/* Did we pin pages by getting page ref? */
> > +	bool is_pfns;		/* Does array contain pages or pfns? */
> > +	void *ptrs[];		/* Array of pinned pfns / pages. Use
> > +				 * pfns_vector_pages() or pfns_vector_pfns()
> > +				 * for access */
> > +};
> > +
> > +struct frame_vector *frame_vector_create(unsigned int nr_frames);
> > +void frame_vector_destroy(struct frame_vector *vec);
> > +int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> > +		     unsigned int gup_flags, struct frame_vector *vec);
> > +void put_vaddr_frames(struct frame_vector *vec);
> > +int frame_vector_to_pages(struct frame_vector *vec);
> > +void frame_vector_to_pfns(struct frame_vector *vec);
> > +
> > +static inline unsigned int frame_vector_count(struct frame_vector *vec)
> > +{
> > +	return vec->nr_frames;
> > +}
> > +
> > +static inline struct page **frame_vector_pages(struct frame_vector *vec)
> > +{
> > +	if (vec->is_pfns) {
> > +		int err = frame_vector_to_pages(vec);
> > +
> > +		if (err)
> > +			return ERR_PTR(err);
> > +	}
> > +	return (struct page **)(vec->ptrs);
> > +}
> > +
> > +static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> > +{
> > +	if (!vec->is_pfns)
> > +		frame_vector_to_pfns(vec);
> > +	return (unsigned long *)(vec->ptrs);
> > +}
> > +
> >  #endif /* _MEDIA_VIDEOBUF2_CORE_H */
> 
> Please place those into a include/media/frame_vector.h file, instead of
> merging it directly at vb2 core header.
> 
> Then include the new header at videobuf2-core.h and at frame_vector.c.

Makes sense, I'll do that for v3.

> With such changes:
> 
> Acked-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Thanks for taking a look.
-Daniel

> 
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 6c974888f86f..da6c943fe9f1 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -815,9 +815,6 @@ config DEVICE_PRIVATE
> >  	  memory; i.e., memory that is only accessible from the device (or
> >  	  group of devices). You likely also want to select HMM_MIRROR.
> >  
> > -config FRAME_VECTOR
> > -	bool
> > -
> >  config ARCH_USES_HIGH_VMA_FLAGS
> >  	bool
> >  config ARCH_HAS_PKEYS
> > diff --git a/mm/Makefile b/mm/Makefile
> > index d5649f1c12c0..a025fd6c6afd 100644
> > --- a/mm/Makefile
> > +++ b/mm/Makefile
> > @@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
> >  obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
> >  obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
> >  obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
> > -obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
> >  obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
> >  obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
> >  obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
> 
> 
> 
> Thanks,
> Mauro

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem
@ 2020-10-09 16:57       ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 16:57 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jérôme Glisse, linux-s390, linux-samsung-soc, Jan Kara,
	Pawel Osciak, kvm, Jason Gunthorpe, Daniel Vetter, LKML,
	DRI Development, Tomasz Figa, linux-mm, Kyungmin Park,
	John Hubbard, Daniel Vetter, Andrew Morton, Marek Szyprowski,
	Dan Williams, linux-arm-kernel, linux-media

On Fri, Oct 09, 2020 at 12:14:17PM +0200, Mauro Carvalho Chehab wrote:
> Em Fri,  9 Oct 2020 09:59:23 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> 
> > It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
> > symbol from all over the tree (well just one place, somehow omap media
> > driver still had this in its Kconfig, despite not using it).
> > 
> > Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Pawel Osciak <pawel@osciak.com>
> > Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> > Cc: Kyungmin Park <kyungmin.park@samsung.com>
> > Cc: Tomasz Figa <tfiga@chromium.org>
> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  drivers/media/common/videobuf2/Kconfig        |  1 -
> >  drivers/media/common/videobuf2/Makefile       |  1 +
> >  .../media/common/videobuf2}/frame_vector.c    |  2 +
> >  drivers/media/platform/omap/Kconfig           |  1 -
> >  include/linux/mm.h                            | 42 -------------------
> >  include/media/videobuf2-core.h                | 42 +++++++++++++++++++
> >  mm/Kconfig                                    |  3 --
> >  mm/Makefile                                   |  1 -
> >  8 files changed, 45 insertions(+), 48 deletions(-)
> >  rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)
> > 
> > diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
> > index edbc99ebba87..d2223a12c95f 100644
> > --- a/drivers/media/common/videobuf2/Kconfig
> > +++ b/drivers/media/common/videobuf2/Kconfig
> > @@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
> >  
> >  config VIDEOBUF2_MEMOPS
> >  	tristate
> > -	select FRAME_VECTOR
> >  
> >  config VIDEOBUF2_DMA_CONTIG
> >  	tristate
> > diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
> > index 77bebe8b202f..54306f8d096c 100644
> > --- a/drivers/media/common/videobuf2/Makefile
> > +++ b/drivers/media/common/videobuf2/Makefile
> > @@ -1,5 +1,6 @@
> >  # SPDX-License-Identifier: GPL-2.0
> >  videobuf2-common-objs := videobuf2-core.o
> > +videobuf2-common-objs += frame_vector.o
> >  
> >  ifeq ($(CONFIG_TRACEPOINTS),y)
> >    videobuf2-common-objs += vb2-trace.o
> > diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> > similarity index 99%
> > rename from mm/frame_vector.c
> > rename to drivers/media/common/videobuf2/frame_vector.c
> > index d44779e56313..2b0b97761d15 100644
> > --- a/mm/frame_vector.c
> > +++ b/drivers/media/common/videobuf2/frame_vector.c
> > @@ -8,6 +8,8 @@
> >  #include <linux/pagemap.h>
> >  #include <linux/sched.h>
> >  
> > +#include <media/videobuf2-core.h>
> > +
> 
> See my comment below...
> 
> >  /**
> >   * get_vaddr_frames() - map virtual addresses to pfns
> >   * @start:	starting user address
> > diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
> > index f73b5893220d..de16de46c0f4 100644
> > --- a/drivers/media/platform/omap/Kconfig
> > +++ b/drivers/media/platform/omap/Kconfig
> > @@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
> >  	depends on VIDEO_V4L2
> >  	select VIDEOBUF2_DMA_CONTIG
> >  	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
> > -	select FRAME_VECTOR
> >  	help
> >  	  V4L2 Display driver support for OMAP2/3 based boards.
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 16b799a0522c..acd60fbf1a5a 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
> >  int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
> >  			struct task_struct *task, bool bypass_rlim);
> >  
> > -/* Container for pinned pfns / pages */
> > -struct frame_vector {
> > -	unsigned int nr_allocated;	/* Number of frames we have space for */
> > -	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> > -	bool got_ref;		/* Did we pin pages by getting page ref? */
> > -	bool is_pfns;		/* Does array contain pages or pfns? */
> > -	void *ptrs[];		/* Array of pinned pfns / pages. Use
> > -				 * pfns_vector_pages() or pfns_vector_pfns()
> > -				 * for access */
> > -};
> > -
> > -struct frame_vector *frame_vector_create(unsigned int nr_frames);
> > -void frame_vector_destroy(struct frame_vector *vec);
> > -int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> > -		     unsigned int gup_flags, struct frame_vector *vec);
> > -void put_vaddr_frames(struct frame_vector *vec);
> > -int frame_vector_to_pages(struct frame_vector *vec);
> > -void frame_vector_to_pfns(struct frame_vector *vec);
> > -
> > -static inline unsigned int frame_vector_count(struct frame_vector *vec)
> > -{
> > -	return vec->nr_frames;
> > -}
> > -
> > -static inline struct page **frame_vector_pages(struct frame_vector *vec)
> > -{
> > -	if (vec->is_pfns) {
> > -		int err = frame_vector_to_pages(vec);
> > -
> > -		if (err)
> > -			return ERR_PTR(err);
> > -	}
> > -	return (struct page **)(vec->ptrs);
> > -}
> > -
> > -static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> > -{
> > -	if (!vec->is_pfns)
> > -		frame_vector_to_pfns(vec);
> > -	return (unsigned long *)(vec->ptrs);
> > -}
> > -
> >  struct kvec;
> >  int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
> >  			struct page **pages);
> > diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> > index bbb3f26fbde9..a2e75ca0334f 100644
> > --- a/include/media/videobuf2-core.h
> > +++ b/include/media/videobuf2-core.h
> > @@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
> >   */
> >  unsigned int vb2_request_buffer_cnt(struct media_request *req);
> >  
> > +/* Container for pinned pfns / pages in frame_vector.c */
> > +struct frame_vector {
> > +	unsigned int nr_allocated;	/* Number of frames we have space for */
> > +	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> > +	bool got_ref;		/* Did we pin pages by getting page ref? */
> > +	bool is_pfns;		/* Does array contain pages or pfns? */
> > +	void *ptrs[];		/* Array of pinned pfns / pages. Use
> > +				 * pfns_vector_pages() or pfns_vector_pfns()
> > +				 * for access */
> > +};
> > +
> > +struct frame_vector *frame_vector_create(unsigned int nr_frames);
> > +void frame_vector_destroy(struct frame_vector *vec);
> > +int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> > +		     unsigned int gup_flags, struct frame_vector *vec);
> > +void put_vaddr_frames(struct frame_vector *vec);
> > +int frame_vector_to_pages(struct frame_vector *vec);
> > +void frame_vector_to_pfns(struct frame_vector *vec);
> > +
> > +static inline unsigned int frame_vector_count(struct frame_vector *vec)
> > +{
> > +	return vec->nr_frames;
> > +}
> > +
> > +static inline struct page **frame_vector_pages(struct frame_vector *vec)
> > +{
> > +	if (vec->is_pfns) {
> > +		int err = frame_vector_to_pages(vec);
> > +
> > +		if (err)
> > +			return ERR_PTR(err);
> > +	}
> > +	return (struct page **)(vec->ptrs);
> > +}
> > +
> > +static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> > +{
> > +	if (!vec->is_pfns)
> > +		frame_vector_to_pfns(vec);
> > +	return (unsigned long *)(vec->ptrs);
> > +}
> > +
> >  #endif /* _MEDIA_VIDEOBUF2_CORE_H */
> 
> Please place those into a include/media/frame_vector.h file, instead of
> merging it directly at vb2 core header.
> 
> Then include the new header at videobuf2-core.h and at frame_vector.c.

Makes sense, I'll do that for v3.

> With such changes:
> 
> Acked-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Thanks for taking a look.
-Daniel

> 
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 6c974888f86f..da6c943fe9f1 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -815,9 +815,6 @@ config DEVICE_PRIVATE
> >  	  memory; i.e., memory that is only accessible from the device (or
> >  	  group of devices). You likely also want to select HMM_MIRROR.
> >  
> > -config FRAME_VECTOR
> > -	bool
> > -
> >  config ARCH_USES_HIGH_VMA_FLAGS
> >  	bool
> >  config ARCH_HAS_PKEYS
> > diff --git a/mm/Makefile b/mm/Makefile
> > index d5649f1c12c0..a025fd6c6afd 100644
> > --- a/mm/Makefile
> > +++ b/mm/Makefile
> > @@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
> >  obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
> >  obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
> >  obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
> > -obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
> >  obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
> >  obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
> >  obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
> 
> 
> 
> Thanks,
> Mauro

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 06/17] media: videobuf2: Move frame_vector into media subsystem
@ 2020-10-09 16:57       ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 16:57 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jérôme Glisse, linux-s390, linux-samsung-soc, Jan Kara,
	Pawel Osciak, kvm, Jason Gunthorpe, Daniel Vetter, LKML,
	DRI Development, Tomasz Figa, linux-mm, Kyungmin Park,
	John Hubbard, Daniel Vetter, Andrew Morton, Marek Szyprowski,
	Dan Williams, linux-arm-kernel, linux-media

On Fri, Oct 09, 2020 at 12:14:17PM +0200, Mauro Carvalho Chehab wrote:
> Em Fri,  9 Oct 2020 09:59:23 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> 
> > It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
> > symbol from all over the tree (well just one place, somehow omap media
> > driver still had this in its Kconfig, despite not using it).
> > 
> > Reviewed-by: John Hubbard <jhubbard@nvidia.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Pawel Osciak <pawel@osciak.com>
> > Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> > Cc: Kyungmin Park <kyungmin.park@samsung.com>
> > Cc: Tomasz Figa <tfiga@chromium.org>
> > Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  drivers/media/common/videobuf2/Kconfig        |  1 -
> >  drivers/media/common/videobuf2/Makefile       |  1 +
> >  .../media/common/videobuf2}/frame_vector.c    |  2 +
> >  drivers/media/platform/omap/Kconfig           |  1 -
> >  include/linux/mm.h                            | 42 -------------------
> >  include/media/videobuf2-core.h                | 42 +++++++++++++++++++
> >  mm/Kconfig                                    |  3 --
> >  mm/Makefile                                   |  1 -
> >  8 files changed, 45 insertions(+), 48 deletions(-)
> >  rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)
> > 
> > diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
> > index edbc99ebba87..d2223a12c95f 100644
> > --- a/drivers/media/common/videobuf2/Kconfig
> > +++ b/drivers/media/common/videobuf2/Kconfig
> > @@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2
> >  
> >  config VIDEOBUF2_MEMOPS
> >  	tristate
> > -	select FRAME_VECTOR
> >  
> >  config VIDEOBUF2_DMA_CONTIG
> >  	tristate
> > diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
> > index 77bebe8b202f..54306f8d096c 100644
> > --- a/drivers/media/common/videobuf2/Makefile
> > +++ b/drivers/media/common/videobuf2/Makefile
> > @@ -1,5 +1,6 @@
> >  # SPDX-License-Identifier: GPL-2.0
> >  videobuf2-common-objs := videobuf2-core.o
> > +videobuf2-common-objs += frame_vector.o
> >  
> >  ifeq ($(CONFIG_TRACEPOINTS),y)
> >    videobuf2-common-objs += vb2-trace.o
> > diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> > similarity index 99%
> > rename from mm/frame_vector.c
> > rename to drivers/media/common/videobuf2/frame_vector.c
> > index d44779e56313..2b0b97761d15 100644
> > --- a/mm/frame_vector.c
> > +++ b/drivers/media/common/videobuf2/frame_vector.c
> > @@ -8,6 +8,8 @@
> >  #include <linux/pagemap.h>
> >  #include <linux/sched.h>
> >  
> > +#include <media/videobuf2-core.h>
> > +
> 
> See my comment below...
> 
> >  /**
> >   * get_vaddr_frames() - map virtual addresses to pfns
> >   * @start:	starting user address
> > diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
> > index f73b5893220d..de16de46c0f4 100644
> > --- a/drivers/media/platform/omap/Kconfig
> > +++ b/drivers/media/platform/omap/Kconfig
> > @@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
> >  	depends on VIDEO_V4L2
> >  	select VIDEOBUF2_DMA_CONTIG
> >  	select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
> > -	select FRAME_VECTOR
> >  	help
> >  	  V4L2 Display driver support for OMAP2/3 based boards.
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 16b799a0522c..acd60fbf1a5a 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
> >  int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
> >  			struct task_struct *task, bool bypass_rlim);
> >  
> > -/* Container for pinned pfns / pages */
> > -struct frame_vector {
> > -	unsigned int nr_allocated;	/* Number of frames we have space for */
> > -	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> > -	bool got_ref;		/* Did we pin pages by getting page ref? */
> > -	bool is_pfns;		/* Does array contain pages or pfns? */
> > -	void *ptrs[];		/* Array of pinned pfns / pages. Use
> > -				 * pfns_vector_pages() or pfns_vector_pfns()
> > -				 * for access */
> > -};
> > -
> > -struct frame_vector *frame_vector_create(unsigned int nr_frames);
> > -void frame_vector_destroy(struct frame_vector *vec);
> > -int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> > -		     unsigned int gup_flags, struct frame_vector *vec);
> > -void put_vaddr_frames(struct frame_vector *vec);
> > -int frame_vector_to_pages(struct frame_vector *vec);
> > -void frame_vector_to_pfns(struct frame_vector *vec);
> > -
> > -static inline unsigned int frame_vector_count(struct frame_vector *vec)
> > -{
> > -	return vec->nr_frames;
> > -}
> > -
> > -static inline struct page **frame_vector_pages(struct frame_vector *vec)
> > -{
> > -	if (vec->is_pfns) {
> > -		int err = frame_vector_to_pages(vec);
> > -
> > -		if (err)
> > -			return ERR_PTR(err);
> > -	}
> > -	return (struct page **)(vec->ptrs);
> > -}
> > -
> > -static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> > -{
> > -	if (!vec->is_pfns)
> > -		frame_vector_to_pfns(vec);
> > -	return (unsigned long *)(vec->ptrs);
> > -}
> > -
> >  struct kvec;
> >  int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
> >  			struct page **pages);
> > diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> > index bbb3f26fbde9..a2e75ca0334f 100644
> > --- a/include/media/videobuf2-core.h
> > +++ b/include/media/videobuf2-core.h
> > @@ -1254,4 +1254,46 @@ bool vb2_request_object_is_buffer(struct media_request_object *obj);
> >   */
> >  unsigned int vb2_request_buffer_cnt(struct media_request *req);
> >  
> > +/* Container for pinned pfns / pages in frame_vector.c */
> > +struct frame_vector {
> > +	unsigned int nr_allocated;	/* Number of frames we have space for */
> > +	unsigned int nr_frames;	/* Number of frames stored in ptrs array */
> > +	bool got_ref;		/* Did we pin pages by getting page ref? */
> > +	bool is_pfns;		/* Does array contain pages or pfns? */
> > +	void *ptrs[];		/* Array of pinned pfns / pages. Use
> > +				 * pfns_vector_pages() or pfns_vector_pfns()
> > +				 * for access */
> > +};
> > +
> > +struct frame_vector *frame_vector_create(unsigned int nr_frames);
> > +void frame_vector_destroy(struct frame_vector *vec);
> > +int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
> > +		     unsigned int gup_flags, struct frame_vector *vec);
> > +void put_vaddr_frames(struct frame_vector *vec);
> > +int frame_vector_to_pages(struct frame_vector *vec);
> > +void frame_vector_to_pfns(struct frame_vector *vec);
> > +
> > +static inline unsigned int frame_vector_count(struct frame_vector *vec)
> > +{
> > +	return vec->nr_frames;
> > +}
> > +
> > +static inline struct page **frame_vector_pages(struct frame_vector *vec)
> > +{
> > +	if (vec->is_pfns) {
> > +		int err = frame_vector_to_pages(vec);
> > +
> > +		if (err)
> > +			return ERR_PTR(err);
> > +	}
> > +	return (struct page **)(vec->ptrs);
> > +}
> > +
> > +static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
> > +{
> > +	if (!vec->is_pfns)
> > +		frame_vector_to_pfns(vec);
> > +	return (unsigned long *)(vec->ptrs);
> > +}
> > +
> >  #endif /* _MEDIA_VIDEOBUF2_CORE_H */
> 
> Please place those into a include/media/frame_vector.h file, instead of
> merging it directly at vb2 core header.
> 
> Then include the new header at videobuf2-core.h and at frame_vector.c.

Makes sense, I'll do that for v3.

> With such changes:
> 
> Acked-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Thanks for taking a look.
-Daniel

> 
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 6c974888f86f..da6c943fe9f1 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -815,9 +815,6 @@ config DEVICE_PRIVATE
> >  	  memory; i.e., memory that is only accessible from the device (or
> >  	  group of devices). You likely also want to select HMM_MIRROR.
> >  
> > -config FRAME_VECTOR
> > -	bool
> > -
> >  config ARCH_USES_HIGH_VMA_FLAGS
> >  	bool
> >  config ARCH_HAS_PKEYS
> > diff --git a/mm/Makefile b/mm/Makefile
> > index d5649f1c12c0..a025fd6c6afd 100644
> > --- a/mm/Makefile
> > +++ b/mm/Makefile
> > @@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
> >  obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
> >  obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
> >  obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
> > -obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
> >  obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
> >  obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
> >  obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
> 
> 
> 
> Thanks,
> Mauro

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 12:48           ` Jason Gunthorpe
  (?)
@ 2020-10-09 17:52             ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 17:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mauro Carvalho Chehab, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
>
> > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
>
> No, it is unconditionally unsafe. The CMA movable mappings are
> specific VMAs that will have bad issues here, but there are other
> types too.
>
> The only way to do something at a VMA level is to have a list of OK
> VMAs, eg because they were creatd via a special mmap helper from the
> media subsystem.
>
> > Well, no drivers inside the media subsystem uses such flag, although
> > they may rely on some infrastructure that could be using it behind
> > the bars.
>
> It doesn't matter, nothing prevents the user from calling media APIs
> on mmaps it gets from other subsystems.

I think a good first step would be to disable userptr of non struct
page backed storage going forward for any new hw support. Even on
existing drivers. dma-buf sharing has been around for long enough now
that this shouldn't be a problem. Unfortunately right now this doesn't
seem to exist, so the entire problem keeps getting perpetuated.

> > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > flag that it would be denying the core mm code to set __GFP_MOVABLE.
>
> We can't tell from the VMA these kinds of details..
>
> It has to go the other direction, evey mmap that might be used as a
> userptr here has to be found and the VMA specially created to allow
> its use. At least that is a kernel only change, but will need people
> with the HW to do this work.

I think the only reasonable way to keep this working is:
- add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
- add dma-buf export support to fbdev and v4l
- roll this out everywhere we still need it.

Realistically this just isn't going to happen. And anything else just
reimplements half of dma-buf, which is kinda pointless (you need
minimally refcounting and some way to get at a promise of a permanent
sg list for dma. Plus probably the vmap for kernel cpu access.

> > Please let address the issue on this way, instead of broken an
> > userspace API that it is there since 1991.
>
> It has happened before :( It took 4 years for RDMA to undo the uAPI
> breakage caused by a security fix for something that was a 15 years
> old bug.

Yeah we have a bunch of these on the drm side too. Some of them are
really just "you have to upgrade userspace", and there's no real fix
for the security nightmare without that.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 17:52             ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 17:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Mauro Carvalho Chehab, John Hubbard, LKML, DRI Development,
	Linux MM, Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
>
> > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
>
> No, it is unconditionally unsafe. The CMA movable mappings are
> specific VMAs that will have bad issues here, but there are other
> types too.
>
> The only way to do something at a VMA level is to have a list of OK
> VMAs, eg because they were creatd via a special mmap helper from the
> media subsystem.
>
> > Well, no drivers inside the media subsystem uses such flag, although
> > they may rely on some infrastructure that could be using it behind
> > the bars.
>
> It doesn't matter, nothing prevents the user from calling media APIs
> on mmaps it gets from other subsystems.

I think a good first step would be to disable userptr of non struct
page backed storage going forward for any new hw support. Even on
existing drivers. dma-buf sharing has been around for long enough now
that this shouldn't be a problem. Unfortunately right now this doesn't
seem to exist, so the entire problem keeps getting perpetuated.

> > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > flag that it would be denying the core mm code to set __GFP_MOVABLE.
>
> We can't tell from the VMA these kinds of details..
>
> It has to go the other direction, evey mmap that might be used as a
> userptr here has to be found and the VMA specially created to allow
> its use. At least that is a kernel only change, but will need people
> with the HW to do this work.

I think the only reasonable way to keep this working is:
- add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
- add dma-buf export support to fbdev and v4l
- roll this out everywhere we still need it.

Realistically this just isn't going to happen. And anything else just
reimplements half of dma-buf, which is kinda pointless (you need
minimally refcounting and some way to get at a promise of a permanent
sg list for dma. Plus probably the vmap for kernel cpu access.

> > Please let address the issue on this way, instead of broken an
> > userspace API that it is there since 1991.
>
> It has happened before :( It took 4 years for RDMA to undo the uAPI
> breakage caused by a security fix for something that was a 15 years
> old bug.

Yeah we have a bunch of these on the drm side too. Some of them are
really just "you have to upgrade userspace", and there's no real fix
for the security nightmare without that.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 17:52             ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 17:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Mauro Carvalho Chehab, John Hubbard, LKML, DRI Development,
	Linux MM, Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
>
> > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
>
> No, it is unconditionally unsafe. The CMA movable mappings are
> specific VMAs that will have bad issues here, but there are other
> types too.
>
> The only way to do something at a VMA level is to have a list of OK
> VMAs, eg because they were creatd via a special mmap helper from the
> media subsystem.
>
> > Well, no drivers inside the media subsystem uses such flag, although
> > they may rely on some infrastructure that could be using it behind
> > the bars.
>
> It doesn't matter, nothing prevents the user from calling media APIs
> on mmaps it gets from other subsystems.

I think a good first step would be to disable userptr of non struct
page backed storage going forward for any new hw support. Even on
existing drivers. dma-buf sharing has been around for long enough now
that this shouldn't be a problem. Unfortunately right now this doesn't
seem to exist, so the entire problem keeps getting perpetuated.

> > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > flag that it would be denying the core mm code to set __GFP_MOVABLE.
>
> We can't tell from the VMA these kinds of details..
>
> It has to go the other direction, evey mmap that might be used as a
> userptr here has to be found and the VMA specially created to allow
> its use. At least that is a kernel only change, but will need people
> with the HW to do this work.

I think the only reasonable way to keep this working is:
- add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
- add dma-buf export support to fbdev and v4l
- roll this out everywhere we still need it.

Realistically this just isn't going to happen. And anything else just
reimplements half of dma-buf, which is kinda pointless (you need
minimally refcounting and some way to get at a promise of a permanent
sg list for dma. Plus probably the vmap for kernel cpu access.

> > Please let address the issue on this way, instead of broken an
> > userspace API that it is there since 1991.
>
> It has happened before :( It took 4 years for RDMA to undo the uAPI
> breakage caused by a security fix for something that was a 15 years
> old bug.

Yeah we have a bunch of these on the drm side too. Some of them are
really just "you have to upgrade userspace", and there's no real fix
for the security nightmare without that.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 17:52             ` Daniel Vetter
  (?)
@ 2020-10-09 18:01               ` Jason Gunthorpe
  -1 siblings, 0 replies; 214+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 18:01 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Mauro Carvalho Chehab, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote:

> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l
> - roll this out everywhere we still need it.

It seems to me there is a technical way forward to restore user
compat, so it is really no different than RDMA/DRM pain we both
suffered before.

Thus no justification to NAK it. If media wants things to keep working
they have to do the technical path like you outline above.

> Realistically this just isn't going to happen. 

If your series goes ahead it will get solved. Someone will take on the
huge project to either add DMA buf to the drivers people still care
about, or do the work above to transparently handle in kernel.

If we allow things to keep working without consequence then nobody
will do it.

The only reason we did the 4 years of work in RDMA was because Linus
went in and broke the uABI for a security fix. It was hundreds of
patches to fix it, so I don't have much sympathy for "it is too hard"
here.

Jason

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 18:01               ` Jason Gunthorpe
  0 siblings, 0 replies; 214+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 18:01 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Mauro Carvalho Chehab, John Hubbard, LKML, DRI Development,
	Linux MM, Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote:

> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l
> - roll this out everywhere we still need it.

It seems to me there is a technical way forward to restore user
compat, so it is really no different than RDMA/DRM pain we both
suffered before.

Thus no justification to NAK it. If media wants things to keep working
they have to do the technical path like you outline above.

> Realistically this just isn't going to happen. 

If your series goes ahead it will get solved. Someone will take on the
huge project to either add DMA buf to the drivers people still care
about, or do the work above to transparently handle in kernel.

If we allow things to keep working without consequence then nobody
will do it.

The only reason we did the 4 years of work in RDMA was because Linus
went in and broke the uABI for a security fix. It was hundreds of
patches to fix it, so I don't have much sympathy for "it is too hard"
here.

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 18:01               ` Jason Gunthorpe
  0 siblings, 0 replies; 214+ messages in thread
From: Jason Gunthorpe @ 2020-10-09 18:01 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Mauro Carvalho Chehab, John Hubbard, LKML, DRI Development,
	Linux MM, Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote:

> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l
> - roll this out everywhere we still need it.

It seems to me there is a technical way forward to restore user
compat, so it is really no different than RDMA/DRM pain we both
suffered before.

Thus no justification to NAK it. If media wants things to keep working
they have to do the technical path like you outline above.

> Realistically this just isn't going to happen. 

If your series goes ahead it will get solved. Someone will take on the
huge project to either add DMA buf to the drivers people still care
about, or do the work above to transparently handle in kernel.

If we allow things to keep working without consequence then nobody
will do it.

The only reason we did the 4 years of work in RDMA was because Linus
went in and broke the uABI for a security fix. It was hundreds of
patches to fix it, so I don't have much sympathy for "it is too hard"
here.

Jason
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
  2020-10-09 14:32         ` Jason Gunthorpe
  (?)
@ 2020-10-09 18:28           ` Dan Williams
  -1 siblings, 0 replies; 214+ messages in thread
From: Dan Williams @ 2020-10-09 18:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Daniel Vetter, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Arnd Bergmann,
	Greg Kroah-Hartman, David Hildenbrand, Rafael J. Wysocki

On Fri, Oct 9, 2020 at 7:32 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 04:24:45PM +0200, Daniel Vetter wrote:
> > On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> > >
> > > > +struct address_space *iomem_get_mapping(void)
> > > > +{
> > > > +     return iomem_inode->i_mapping;
> > >
> > > This should pair an acquire with the release below
> > >
> > > > +     /*
> > > > +      * Publish /dev/mem initialized.
> > > > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > > > +      */
> > > > +     smp_store_release(&iomem_inode, inode);
> > >
> > > However, this seems abnormal, initcalls rarely do this kind of stuff
> > > with global data..
> > >
> > > The kernel crashes if this fs_initcall is raced with
> > > iomem_get_mapping() due to the unconditional dereference, so I think
> > > it can be safely switched to a simple assignment.
> >
> > Ah yes I checked this all, but forgot to correctly annotate the
> > iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
> > Add missing memory barriers for devmem_inode").
>
> Oh yikes, so revoke_iomem can run concurrently during early boot,
> tricky.

It runs early because request_mem_region() can run before fs_initcall.
Rather than add an unnecessary lock just arrange for the revoke to be
skipped before the inode is initialized. The expectation is that any
early resource reservations will block future userspace mapping
attempts.

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
@ 2020-10-09 18:28           ` Dan Williams
  0 siblings, 0 replies; 214+ messages in thread
From: Dan Williams @ 2020-10-09 18:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Greg Kroah-Hartman, Daniel Vetter, David Hildenbrand, LKML,
	DRI Development, Rafael J. Wysocki, Linux MM,
	Jérôme Glisse, Arnd Bergmann, John Hubbard,
	Daniel Vetter, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 7:32 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 04:24:45PM +0200, Daniel Vetter wrote:
> > On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> > >
> > > > +struct address_space *iomem_get_mapping(void)
> > > > +{
> > > > +     return iomem_inode->i_mapping;
> > >
> > > This should pair an acquire with the release below
> > >
> > > > +     /*
> > > > +      * Publish /dev/mem initialized.
> > > > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > > > +      */
> > > > +     smp_store_release(&iomem_inode, inode);
> > >
> > > However, this seems abnormal, initcalls rarely do this kind of stuff
> > > with global data..
> > >
> > > The kernel crashes if this fs_initcall is raced with
> > > iomem_get_mapping() due to the unconditional dereference, so I think
> > > it can be safely switched to a simple assignment.
> >
> > Ah yes I checked this all, but forgot to correctly annotate the
> > iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
> > Add missing memory barriers for devmem_inode").
>
> Oh yikes, so revoke_iomem can run concurrently during early boot,
> tricky.

It runs early because request_mem_region() can run before fs_initcall.
Rather than add an unnecessary lock just arrange for the revoke to be
skipped before the inode is initialized. The expectation is that any
early resource reservations will block future userspace mapping
attempts.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 14/17] resource: Move devmem revoke code to resource framework
@ 2020-10-09 18:28           ` Dan Williams
  0 siblings, 0 replies; 214+ messages in thread
From: Dan Williams @ 2020-10-09 18:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Greg Kroah-Hartman, Daniel Vetter, David Hildenbrand, LKML,
	DRI Development, Rafael J. Wysocki, Linux MM,
	Jérôme Glisse, Arnd Bergmann, John Hubbard,
	Daniel Vetter, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 7:32 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 04:24:45PM +0200, Daniel Vetter wrote:
> > On Fri, Oct 9, 2020 at 2:31 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 09:59:31AM +0200, Daniel Vetter wrote:
> > >
> > > > +struct address_space *iomem_get_mapping(void)
> > > > +{
> > > > +     return iomem_inode->i_mapping;
> > >
> > > This should pair an acquire with the release below
> > >
> > > > +     /*
> > > > +      * Publish /dev/mem initialized.
> > > > +      * Pairs with smp_load_acquire() in revoke_iomem().
> > > > +      */
> > > > +     smp_store_release(&iomem_inode, inode);
> > >
> > > However, this seems abnormal, initcalls rarely do this kind of stuff
> > > with global data..
> > >
> > > The kernel crashes if this fs_initcall is raced with
> > > iomem_get_mapping() due to the unconditional dereference, so I think
> > > it can be safely switched to a simple assignment.
> >
> > Ah yes I checked this all, but forgot to correctly annotate the
> > iomem_get_mapping access. For reference, see b34e7e298d7a ("/dev/mem:
> > Add missing memory barriers for devmem_inode").
>
> Oh yikes, so revoke_iomem can run concurrently during early boot,
> tricky.

It runs early because request_mem_region() can run before fs_initcall.
Rather than add an unnecessary lock just arrange for the revoke to be
skipped before the inode is initialized. The expectation is that any
early resource reservations will block future userspace mapping
attempts.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 18:01               ` Jason Gunthorpe
  (?)
@ 2020-10-09 19:31                 ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 19:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mauro Carvalho Chehab, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

On Fri, Oct 9, 2020 at 8:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote:
>
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> > - add dma-buf export support to fbdev and v4l
> > - roll this out everywhere we still need it.
>
> It seems to me there is a technical way forward to restore user
> compat, so it is really no different than RDMA/DRM pain we both
> suffered before.
>
> Thus no justification to NAK it. If media wants things to keep working
> they have to do the technical path like you outline above.
>
> > Realistically this just isn't going to happen.
>
> If your series goes ahead it will get solved. Someone will take on the
> huge project to either add DMA buf to the drivers people still care
> about, or do the work above to transparently handle in kernel.
>
> If we allow things to keep working without consequence then nobody
> will do it.
>
> The only reason we did the 4 years of work in RDMA was because Linus
> went in and broke the uABI for a security fix. It was hundreds of
> patches to fix it, so I don't have much sympathy for "it is too hard"
> here.

Oh fully agreeing with you here, I just wanted to lay out that a)
there is a solid plan to fix it and b) it's way too much work for me
to just type it as a part of a "learn me some core mm semantics"
project :-)

I was hoping that we could get away with a special marker for
problematic vma, and filter those out. But after all the digging I've
noticed that on anything remotely modern, there's just nothing left.
Device memory management has become massively more dynamic in the past
10 years.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 19:31                 ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 19:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Mauro Carvalho Chehab, John Hubbard, LKML, DRI Development,
	Linux MM, Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 8:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote:
>
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> > - add dma-buf export support to fbdev and v4l
> > - roll this out everywhere we still need it.
>
> It seems to me there is a technical way forward to restore user
> compat, so it is really no different than RDMA/DRM pain we both
> suffered before.
>
> Thus no justification to NAK it. If media wants things to keep working
> they have to do the technical path like you outline above.
>
> > Realistically this just isn't going to happen.
>
> If your series goes ahead it will get solved. Someone will take on the
> huge project to either add DMA buf to the drivers people still care
> about, or do the work above to transparently handle in kernel.
>
> If we allow things to keep working without consequence then nobody
> will do it.
>
> The only reason we did the 4 years of work in RDMA was because Linus
> went in and broke the uABI for a security fix. It was hundreds of
> patches to fix it, so I don't have much sympathy for "it is too hard"
> here.

Oh fully agreeing with you here, I just wanted to lay out that a)
there is a solid plan to fix it and b) it's way too much work for me
to just type it as a part of a "learn me some core mm semantics"
project :-)

I was hoping that we could get away with a special marker for
problematic vma, and filter those out. But after all the digging I've
noticed that on anything remotely modern, there's just nothing left.
Device memory management has become massively more dynamic in the past
10 years.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-09 19:31                 ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-09 19:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Mauro Carvalho Chehab, John Hubbard, LKML, DRI Development,
	Linux MM, Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Fri, Oct 9, 2020 at 8:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote:
>
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> > - add dma-buf export support to fbdev and v4l
> > - roll this out everywhere we still need it.
>
> It seems to me there is a technical way forward to restore user
> compat, so it is really no different than RDMA/DRM pain we both
> suffered before.
>
> Thus no justification to NAK it. If media wants things to keep working
> they have to do the technical path like you outline above.
>
> > Realistically this just isn't going to happen.
>
> If your series goes ahead it will get solved. Someone will take on the
> huge project to either add DMA buf to the drivers people still care
> about, or do the work above to transparently handle in kernel.
>
> If we allow things to keep working without consequence then nobody
> will do it.
>
> The only reason we did the 4 years of work in RDMA was because Linus
> went in and broke the uABI for a security fix. It was hundreds of
> patches to fix it, so I don't have much sympathy for "it is too hard"
> here.

Oh fully agreeing with you here, I just wanted to lay out that a)
there is a solid plan to fix it and b) it's way too much work for me
to just type it as a part of a "learn me some core mm semantics"
project :-)

I was hoping that we could get away with a special marker for
problematic vma, and filter those out. But after all the digging I've
noticed that on anything remotely modern, there's just nothing left.
Device memory management has become massively more dynamic in the past
10 years.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 17:52             ` Daniel Vetter
  (?)
@ 2020-10-10  9:21               ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-10  9:21 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jason Gunthorpe, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Em Fri, 9 Oct 2020 19:52:05 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >  
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.  
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.

I didn't check the mm dirty details, but I strongly suspect that the mm
code has a way to prevent moving a mmapped page while it is still in usage.

If not, then the entire movable pages concept sounds broken to me, and
has to be fixed at mm subsystem.

> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.

I'm not sure if I'm following you on that. The media API can work with
different ways of sending buffers to userspace:

	- read();

	- using the overlay mode. This interface is deprecated in
	  practice, being replaced by DMABUF. Only a few older hardware
	  supports it, and it depends on an special X11 helper driver
	  for it to work.

	- via DMABUF:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html

	- via mmap, using a mmap helper:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/mmap.html

	- via mmap, using userspace-provided pointers:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html

The existing open-source programs usually chose one or more of the above
modes. if the Kernel driver returns -EINVAL when an mmapped streaming I/O
mode is not supported, userspace has to select a different method.

Most userspace open source programs have fallback support: if one
mmap I/O method fails, it selects another one, although this is not
a mandatory requirement. I found (and fixed) a few ones that were
relying exclusively on userptr support, but I didn't make a 
comprehensive check.

Also there are a number of relevant closed-source apps that we have no 
idea about what methods they use, like Skype, and other similar
videoconferencing programs. Breaking support for those, specially at
a time where people are relying on it in order to participate on
conferences and doing internal meetings is a **very bad** idea.

So, whatever solution is taken, it should not be dumping warning
messages at the system and tainting the Kernel, but, instead, checking
if the userspace request is valid or not. If it is invalid, return the
proper error code via the right V4L2 ioctl, in order to let userspace
choose a different method. I the request is valid, refcount the pages 
for them to not be moved while they're still in usage.

-

Let me provide some background about how things work at the media
subsytem. If you want to know more, the userspace-provided memory 
mapped pointers work is described here:

	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html#userp

Basically, userspace calls either one of those ioctls:

	VIDIOC_CREATE_BUFS:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-create-bufs.html

Which is translated into a videobuf2 call to: vb2_ioctl_create_bufs()

	VIDIOC_REQBUFS
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-reqbufs.html#vidioc-reqbufs

Which is translated into a videobuf2 call to: vb2_ioctl_reqbufs()

Both internally calls vb2_verify_memory_type(), which is responsible
for checking if the provided pointers are OK for the usage and/or do
all necessary page allocations, and taking care of any special 
requirements. This could easily have some additional checks to
verify if the requested VMA address has pages that are movable or
not, ensuring that ensure that the VMA is OK, and locking them in
order to prevent the mm code to move such pages while they are in
usage by the media subsystem.

Now, as I said before, I don't know the dirty details about how
to lock those pages at the mm subsystem in order to avoid it
to move the used pages. Yet, when vb2_create_framevec()
is called, the check if VMA is OK should already be happened
at vb2_verify_memory_type().

-

It should be noticed that the dirty hack added by patch 09/17
and 10/17 affects *all* types of memory allocations at V4L2, 
as this kAPI is called by the 3 different memory models
supported at the media subsystem:

	drivers/media/common/videobuf2/videobuf2-vmalloc.c
	drivers/media/common/videobuf2/videobuf2-dma-contig.c
	drivers/media/common/videobuf2/videobuf2-dma-sg.c

In other words, with this code:

	int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
		unsigned long *pfn)
	{
	#ifdef CONFIG_STRICT_FOLLOW_PFN
		pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
		return -EINVAL;
	#else
		WARN_ONCE(1, "unsafe follow_pfn usage\n");
		add_taint(TAINT_USER, LOCKDEP_STILL_OK);

		return follow_pfn(vma, address, pfn);
	#endif

you're unconditionally breaking the media userspace API support not
only for embedded systems that could be using userptr instead of
DMA_BUF, but also for *all* video devices, including USB cameras.

This is **NOT** an acceptable solution. 

So, I stand my NACK to this approach.

> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.  
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.  
> 
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.

Well, the media uAPI does support DMABUF (both import and export):

	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-expbuf.html#vidioc-expbuf

And I fully agree that newer programs should use DMABUF when sharing
buffers with DRM subsystem, but that's not my main concern.

What I do want is to not break userspace support nor to taint the Kernel
due to a valid uAPI call.

A valid uAPI call should check if the parameters passed though it are
valid. If they are, it should handle. Otherwise, it should return
-EINVAL, without tainting the Kernel or printing warning messages.

The approach took by patches 09/17 and 10/17 doesn't do that.
Instead, they just unconditionally breaks the media uAPI.

What should be done, instead, is to drop patch 10/17, and work on
a way for the code inside vb2_create_framevec() to ensure that, if
USERPTR is used, the memory pages will be properly locked while the
driver is using, returning -EINVAL only if there's no way to proceed,
without tainting the Kernel.

> 
> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.  
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.  
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);

Not sure how an userspace buffer could be mapped to be using it,
specially since the buffer may not even be imported/exported
from the DRM subsystem, but it could simply be allocated
via glibc calloc()/malloc().

> - add dma-buf export support to fbdev and v4l

V4L has support for it already.

> - roll this out everywhere we still need it.

That's where things are hard. This is not like DRM, where the APIs
are called via some open source libraries that are also managed
by DRM upstream developers.

In the case of the media subsystem, while we added a libv4l sometime
ago, not all userspace apps use it, as a large part of them used
to exist before the addition of the libraries. Also, we're currently 
trying to deprecate libv4l, at least for embedded systems, in favor
of libcamera.

On media, there are lots of closed source apps that uses the media API
directly. Even talking about open source ones, there are lots of
different apps, including not only media-specific apps, but also
generic ones, like web browsers, which don't use the libraries we
wrote.

An userspace API breakage would take *huge* efforts and will take
lots of time to have it merged everywhere. It will cause lots of
troubles everywhere.

> Realistically this just isn't going to happen. 

Why not? Any app developer could already use DMA-BUF if required,
as the upstream support was added several years ago.

> And anything else just
> reimplements half of dma-buf, 

It is just the opposite: those uAPIs came years before dma-buf.
In other words, it was dma-buf that re-implemented it ;-)

Now, I agree with you that dma-buf is a way cooler than the past
alternatives.

-

It sounds to me that you're considering on only one use case of 
USERPTR: to pass a buffer created from DRM. As far as I'm aware,
only embedded userspace applications actually do that.

Yet, there are a number of other applications that do something like
the userptr_capture() function on this code:

	https://git.linuxtv.org/v4l-utils.git/tree/contrib/test/v4l2grab.c

E. g. using glibc alloc functions like calloc() to allocate memory, 
passing the user-allocated data to the Kernel via something like this:

	struct v4l2_requestbuffers req;
	struct v4l2_buffer buf;
	int n_buffers = 2;

	req.count  = 2;
	req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	req.memory = V4L2_MEMORY_USERPTR;
	if (ioctl(fd, VIDIOC_REQBUFS, &req))
		return errno;

	for (i = 0; i < req.count; ++i) {
		buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
		buf.memory = V4L2_MEMORY_USERPTR;
		buf.index = i;
		buf.m.userptr = (unsigned long)buffers[i].start;
		buf.length = buffers[i].length;
		if (ioctl(fd, VIDIOC_QBUF, &buf))
			return errno;
	}
	if (ioctl(fd, VIDIOC_STREAMON, &req.type))
		return errno;

	/* some capture loop */

	ioctl(fd, VIDIOC_STREAMOFF, &req.type);

I can't possibly see *any* security issues with the above code.

As I said before, VIDIOC_REQBUFS should be checking if the userspace
buffers are OK and ensure that their refcounts will be incremented,
in order to avoid mm to move the pages used there, freeing the 
refconts when VIDIOC_STREAMOFF - or close(fd) - is called.

> which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.

Yeah, refcounting needs to happen. 

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10  9:21               ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-10  9:21 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Jason Gunthorpe, John Hubbard, LKML, DRI Development, Linux MM,
	Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

Em Fri, 9 Oct 2020 19:52:05 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >  
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.  
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.

I didn't check the mm dirty details, but I strongly suspect that the mm
code has a way to prevent moving a mmapped page while it is still in usage.

If not, then the entire movable pages concept sounds broken to me, and
has to be fixed at mm subsystem.

> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.

I'm not sure if I'm following you on that. The media API can work with
different ways of sending buffers to userspace:

	- read();

	- using the overlay mode. This interface is deprecated in
	  practice, being replaced by DMABUF. Only a few older hardware
	  supports it, and it depends on an special X11 helper driver
	  for it to work.

	- via DMABUF:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html

	- via mmap, using a mmap helper:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/mmap.html

	- via mmap, using userspace-provided pointers:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html

The existing open-source programs usually chose one or more of the above
modes. if the Kernel driver returns -EINVAL when an mmapped streaming I/O
mode is not supported, userspace has to select a different method.

Most userspace open source programs have fallback support: if one
mmap I/O method fails, it selects another one, although this is not
a mandatory requirement. I found (and fixed) a few ones that were
relying exclusively on userptr support, but I didn't make a 
comprehensive check.

Also there are a number of relevant closed-source apps that we have no 
idea about what methods they use, like Skype, and other similar
videoconferencing programs. Breaking support for those, specially at
a time where people are relying on it in order to participate on
conferences and doing internal meetings is a **very bad** idea.

So, whatever solution is taken, it should not be dumping warning
messages at the system and tainting the Kernel, but, instead, checking
if the userspace request is valid or not. If it is invalid, return the
proper error code via the right V4L2 ioctl, in order to let userspace
choose a different method. I the request is valid, refcount the pages 
for them to not be moved while they're still in usage.

-

Let me provide some background about how things work at the media
subsytem. If you want to know more, the userspace-provided memory 
mapped pointers work is described here:

	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html#userp

Basically, userspace calls either one of those ioctls:

	VIDIOC_CREATE_BUFS:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-create-bufs.html

Which is translated into a videobuf2 call to: vb2_ioctl_create_bufs()

	VIDIOC_REQBUFS
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-reqbufs.html#vidioc-reqbufs

Which is translated into a videobuf2 call to: vb2_ioctl_reqbufs()

Both internally calls vb2_verify_memory_type(), which is responsible
for checking if the provided pointers are OK for the usage and/or do
all necessary page allocations, and taking care of any special 
requirements. This could easily have some additional checks to
verify if the requested VMA address has pages that are movable or
not, ensuring that ensure that the VMA is OK, and locking them in
order to prevent the mm code to move such pages while they are in
usage by the media subsystem.

Now, as I said before, I don't know the dirty details about how
to lock those pages at the mm subsystem in order to avoid it
to move the used pages. Yet, when vb2_create_framevec()
is called, the check if VMA is OK should already be happened
at vb2_verify_memory_type().

-

It should be noticed that the dirty hack added by patch 09/17
and 10/17 affects *all* types of memory allocations at V4L2, 
as this kAPI is called by the 3 different memory models
supported at the media subsystem:

	drivers/media/common/videobuf2/videobuf2-vmalloc.c
	drivers/media/common/videobuf2/videobuf2-dma-contig.c
	drivers/media/common/videobuf2/videobuf2-dma-sg.c

In other words, with this code:

	int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
		unsigned long *pfn)
	{
	#ifdef CONFIG_STRICT_FOLLOW_PFN
		pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
		return -EINVAL;
	#else
		WARN_ONCE(1, "unsafe follow_pfn usage\n");
		add_taint(TAINT_USER, LOCKDEP_STILL_OK);

		return follow_pfn(vma, address, pfn);
	#endif

you're unconditionally breaking the media userspace API support not
only for embedded systems that could be using userptr instead of
DMA_BUF, but also for *all* video devices, including USB cameras.

This is **NOT** an acceptable solution. 

So, I stand my NACK to this approach.

> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.  
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.  
> 
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.

Well, the media uAPI does support DMABUF (both import and export):

	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-expbuf.html#vidioc-expbuf

And I fully agree that newer programs should use DMABUF when sharing
buffers with DRM subsystem, but that's not my main concern.

What I do want is to not break userspace support nor to taint the Kernel
due to a valid uAPI call.

A valid uAPI call should check if the parameters passed though it are
valid. If they are, it should handle. Otherwise, it should return
-EINVAL, without tainting the Kernel or printing warning messages.

The approach took by patches 09/17 and 10/17 doesn't do that.
Instead, they just unconditionally breaks the media uAPI.

What should be done, instead, is to drop patch 10/17, and work on
a way for the code inside vb2_create_framevec() to ensure that, if
USERPTR is used, the memory pages will be properly locked while the
driver is using, returning -EINVAL only if there's no way to proceed,
without tainting the Kernel.

> 
> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.  
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.  
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);

Not sure how an userspace buffer could be mapped to be using it,
specially since the buffer may not even be imported/exported
from the DRM subsystem, but it could simply be allocated
via glibc calloc()/malloc().

> - add dma-buf export support to fbdev and v4l

V4L has support for it already.

> - roll this out everywhere we still need it.

That's where things are hard. This is not like DRM, where the APIs
are called via some open source libraries that are also managed
by DRM upstream developers.

In the case of the media subsystem, while we added a libv4l sometime
ago, not all userspace apps use it, as a large part of them used
to exist before the addition of the libraries. Also, we're currently 
trying to deprecate libv4l, at least for embedded systems, in favor
of libcamera.

On media, there are lots of closed source apps that uses the media API
directly. Even talking about open source ones, there are lots of
different apps, including not only media-specific apps, but also
generic ones, like web browsers, which don't use the libraries we
wrote.

An userspace API breakage would take *huge* efforts and will take
lots of time to have it merged everywhere. It will cause lots of
troubles everywhere.

> Realistically this just isn't going to happen. 

Why not? Any app developer could already use DMA-BUF if required,
as the upstream support was added several years ago.

> And anything else just
> reimplements half of dma-buf, 

It is just the opposite: those uAPIs came years before dma-buf.
In other words, it was dma-buf that re-implemented it ;-)

Now, I agree with you that dma-buf is a way cooler than the past
alternatives.

-

It sounds to me that you're considering on only one use case of 
USERPTR: to pass a buffer created from DRM. As far as I'm aware,
only embedded userspace applications actually do that.

Yet, there are a number of other applications that do something like
the userptr_capture() function on this code:

	https://git.linuxtv.org/v4l-utils.git/tree/contrib/test/v4l2grab.c

E. g. using glibc alloc functions like calloc() to allocate memory, 
passing the user-allocated data to the Kernel via something like this:

	struct v4l2_requestbuffers req;
	struct v4l2_buffer buf;
	int n_buffers = 2;

	req.count  = 2;
	req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	req.memory = V4L2_MEMORY_USERPTR;
	if (ioctl(fd, VIDIOC_REQBUFS, &req))
		return errno;

	for (i = 0; i < req.count; ++i) {
		buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
		buf.memory = V4L2_MEMORY_USERPTR;
		buf.index = i;
		buf.m.userptr = (unsigned long)buffers[i].start;
		buf.length = buffers[i].length;
		if (ioctl(fd, VIDIOC_QBUF, &buf))
			return errno;
	}
	if (ioctl(fd, VIDIOC_STREAMON, &req.type))
		return errno;

	/* some capture loop */

	ioctl(fd, VIDIOC_STREAMOFF, &req.type);

I can't possibly see *any* security issues with the above code.

As I said before, VIDIOC_REQBUFS should be checking if the userspace
buffers are OK and ensure that their refcounts will be incremented,
in order to avoid mm to move the pages used there, freeing the 
refconts when VIDIOC_STREAMOFF - or close(fd) - is called.

> which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.

Yeah, refcounting needs to happen. 

Thanks,
Mauro

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10  9:21               ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-10  9:21 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Jason Gunthorpe, John Hubbard, LKML, DRI Development, Linux MM,
	Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

Em Fri, 9 Oct 2020 19:52:05 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >  
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.  
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.

I didn't check the mm dirty details, but I strongly suspect that the mm
code has a way to prevent moving a mmapped page while it is still in usage.

If not, then the entire movable pages concept sounds broken to me, and
has to be fixed at mm subsystem.

> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.

I'm not sure if I'm following you on that. The media API can work with
different ways of sending buffers to userspace:

	- read();

	- using the overlay mode. This interface is deprecated in
	  practice, being replaced by DMABUF. Only a few older hardware
	  supports it, and it depends on an special X11 helper driver
	  for it to work.

	- via DMABUF:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html

	- via mmap, using a mmap helper:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/mmap.html

	- via mmap, using userspace-provided pointers:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html

The existing open-source programs usually chose one or more of the above
modes. if the Kernel driver returns -EINVAL when an mmapped streaming I/O
mode is not supported, userspace has to select a different method.

Most userspace open source programs have fallback support: if one
mmap I/O method fails, it selects another one, although this is not
a mandatory requirement. I found (and fixed) a few ones that were
relying exclusively on userptr support, but I didn't make a 
comprehensive check.

Also there are a number of relevant closed-source apps that we have no 
idea about what methods they use, like Skype, and other similar
videoconferencing programs. Breaking support for those, specially at
a time where people are relying on it in order to participate on
conferences and doing internal meetings is a **very bad** idea.

So, whatever solution is taken, it should not be dumping warning
messages at the system and tainting the Kernel, but, instead, checking
if the userspace request is valid or not. If it is invalid, return the
proper error code via the right V4L2 ioctl, in order to let userspace
choose a different method. I the request is valid, refcount the pages 
for them to not be moved while they're still in usage.

-

Let me provide some background about how things work at the media
subsytem. If you want to know more, the userspace-provided memory 
mapped pointers work is described here:

	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html#userp

Basically, userspace calls either one of those ioctls:

	VIDIOC_CREATE_BUFS:
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-create-bufs.html

Which is translated into a videobuf2 call to: vb2_ioctl_create_bufs()

	VIDIOC_REQBUFS
		https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-reqbufs.html#vidioc-reqbufs

Which is translated into a videobuf2 call to: vb2_ioctl_reqbufs()

Both internally calls vb2_verify_memory_type(), which is responsible
for checking if the provided pointers are OK for the usage and/or do
all necessary page allocations, and taking care of any special 
requirements. This could easily have some additional checks to
verify if the requested VMA address has pages that are movable or
not, ensuring that ensure that the VMA is OK, and locking them in
order to prevent the mm code to move such pages while they are in
usage by the media subsystem.

Now, as I said before, I don't know the dirty details about how
to lock those pages at the mm subsystem in order to avoid it
to move the used pages. Yet, when vb2_create_framevec()
is called, the check if VMA is OK should already be happened
at vb2_verify_memory_type().

-

It should be noticed that the dirty hack added by patch 09/17
and 10/17 affects *all* types of memory allocations at V4L2, 
as this kAPI is called by the 3 different memory models
supported at the media subsystem:

	drivers/media/common/videobuf2/videobuf2-vmalloc.c
	drivers/media/common/videobuf2/videobuf2-dma-contig.c
	drivers/media/common/videobuf2/videobuf2-dma-sg.c

In other words, with this code:

	int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
		unsigned long *pfn)
	{
	#ifdef CONFIG_STRICT_FOLLOW_PFN
		pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
		return -EINVAL;
	#else
		WARN_ONCE(1, "unsafe follow_pfn usage\n");
		add_taint(TAINT_USER, LOCKDEP_STILL_OK);

		return follow_pfn(vma, address, pfn);
	#endif

you're unconditionally breaking the media userspace API support not
only for embedded systems that could be using userptr instead of
DMA_BUF, but also for *all* video devices, including USB cameras.

This is **NOT** an acceptable solution. 

So, I stand my NACK to this approach.

> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.  
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.  
> 
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.

Well, the media uAPI does support DMABUF (both import and export):

	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
	https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-expbuf.html#vidioc-expbuf

And I fully agree that newer programs should use DMABUF when sharing
buffers with DRM subsystem, but that's not my main concern.

What I do want is to not break userspace support nor to taint the Kernel
due to a valid uAPI call.

A valid uAPI call should check if the parameters passed though it are
valid. If they are, it should handle. Otherwise, it should return
-EINVAL, without tainting the Kernel or printing warning messages.

The approach took by patches 09/17 and 10/17 doesn't do that.
Instead, they just unconditionally breaks the media uAPI.

What should be done, instead, is to drop patch 10/17, and work on
a way for the code inside vb2_create_framevec() to ensure that, if
USERPTR is used, the memory pages will be properly locked while the
driver is using, returning -EINVAL only if there's no way to proceed,
without tainting the Kernel.

> 
> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.  
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.  
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);

Not sure how an userspace buffer could be mapped to be using it,
specially since the buffer may not even be imported/exported
from the DRM subsystem, but it could simply be allocated
via glibc calloc()/malloc().

> - add dma-buf export support to fbdev and v4l

V4L has support for it already.

> - roll this out everywhere we still need it.

That's where things are hard. This is not like DRM, where the APIs
are called via some open source libraries that are also managed
by DRM upstream developers.

In the case of the media subsystem, while we added a libv4l sometime
ago, not all userspace apps use it, as a large part of them used
to exist before the addition of the libraries. Also, we're currently 
trying to deprecate libv4l, at least for embedded systems, in favor
of libcamera.

On media, there are lots of closed source apps that uses the media API
directly. Even talking about open source ones, there are lots of
different apps, including not only media-specific apps, but also
generic ones, like web browsers, which don't use the libraries we
wrote.

An userspace API breakage would take *huge* efforts and will take
lots of time to have it merged everywhere. It will cause lots of
troubles everywhere.

> Realistically this just isn't going to happen. 

Why not? Any app developer could already use DMA-BUF if required,
as the upstream support was added several years ago.

> And anything else just
> reimplements half of dma-buf, 

It is just the opposite: those uAPIs came years before dma-buf.
In other words, it was dma-buf that re-implemented it ;-)

Now, I agree with you that dma-buf is a way cooler than the past
alternatives.

-

It sounds to me that you're considering on only one use case of 
USERPTR: to pass a buffer created from DRM. As far as I'm aware,
only embedded userspace applications actually do that.

Yet, there are a number of other applications that do something like
the userptr_capture() function on this code:

	https://git.linuxtv.org/v4l-utils.git/tree/contrib/test/v4l2grab.c

E. g. using glibc alloc functions like calloc() to allocate memory, 
passing the user-allocated data to the Kernel via something like this:

	struct v4l2_requestbuffers req;
	struct v4l2_buffer buf;
	int n_buffers = 2;

	req.count  = 2;
	req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	req.memory = V4L2_MEMORY_USERPTR;
	if (ioctl(fd, VIDIOC_REQBUFS, &req))
		return errno;

	for (i = 0; i < req.count; ++i) {
		buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
		buf.memory = V4L2_MEMORY_USERPTR;
		buf.index = i;
		buf.m.userptr = (unsigned long)buffers[i].start;
		buf.length = buffers[i].length;
		if (ioctl(fd, VIDIOC_QBUF, &buf))
			return errno;
	}
	if (ioctl(fd, VIDIOC_STREAMON, &req.type))
		return errno;

	/* some capture loop */

	ioctl(fd, VIDIOC_STREAMOFF, &req.type);

I can't possibly see *any* security issues with the above code.

As I said before, VIDIOC_REQBUFS should be checking if the userspace
buffers are OK and ensure that their refcounts will be incremented,
in order to avoid mm to move the pages used there, freeing the 
refconts when VIDIOC_STREAMOFF - or close(fd) - is called.

> which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.

Yeah, refcounting needs to happen. 

Thanks,
Mauro
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 10/17] media/videbuf1|2: Mark follow_pfn usage as unsafe
  2020-10-09  7:59   ` Daniel Vetter
  (?)
@ 2020-10-10  9:24     ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-10  9:24 UTC (permalink / raw)
  To: Daniel Vetter, Linus Torvalds
  Cc: DRI Development, LKML, kvm, linux-mm, linux-arm-kernel,
	linux-samsung-soc, linux-media, linux-s390, Daniel Vetter,
	Jason Gunthorpe, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Tomasz Figa, Laurent Dufour,
	Vlastimil Babka, Daniel Jordan, Michel Lespinasse

Em Fri,  9 Oct 2020 09:59:27 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> The media model assumes that buffers are all preallocated, so that
> when a media pipeline is running we never miss a deadline because the
> buffers aren't allocated or available.
> 
> This means we cannot fix the v4l follow_pfn usage through
> mmu_notifier, without breaking how this all works. The only real fix
> is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
> tell everyone to cut over to dma-buf memory sharing for zerocopy.
> 
> userptr for normal memory will keep working as-is.

I won't repeat here the discussions for patch 09/17, but
just to be clear about this one:

NACK.

We need a better alternative to avoid breaking existing
media applications.

> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Pawel Osciak <pawel@osciak.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Tomasz Figa <tfiga@chromium.org>
> Cc: Laurent Dufour <ldufour@linux.ibm.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Michel Lespinasse <walken@google.com>
> ---
>  drivers/media/common/videobuf2/frame_vector.c | 2 +-
>  drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> index 2b0b97761d15..a1b85fe9e7c1 100644
> --- a/drivers/media/common/videobuf2/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -69,7 +69,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
>  			break;
>  
>  		while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
> -			err = follow_pfn(vma, start, &nums[ret]);
> +			err = unsafe_follow_pfn(vma, start, &nums[ret]);
>  			if (err) {
>  				if (ret == 0)
>  					ret = err;
> diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
> index 52312ce2ba05..821c4a76ab96 100644
> --- a/drivers/media/v4l2-core/videobuf-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
> @@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
>  	user_address = untagged_baddr;
>  
>  	while (pages_done < (mem->size >> PAGE_SHIFT)) {
> -		ret = follow_pfn(vma, user_address, &this_pfn);
> +		ret = unsafe_follow_pfn(vma, user_address, &this_pfn);
>  		if (ret)
>  			break;
>  



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 10/17] media/videbuf1|2: Mark follow_pfn usage as unsafe
@ 2020-10-10  9:24     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-10  9:24 UTC (permalink / raw)
  To: Daniel Vetter, Linus Torvalds
  Cc: Jan Kara, kvm, DRI Development, linux-mm, Daniel Vetter,
	Michel Lespinasse, Marek Szyprowski, linux-s390,
	linux-samsung-soc, Daniel Jordan, Jason Gunthorpe,
	linux-arm-kernel, linux-media, Kees Cook, Pawel Osciak,
	John Hubbard, Jérôme Glisse, Dan Williams,
	Laurent Dufour, Vlastimil Babka, LKML, Tomasz Figa,
	Kyungmin Park, Andrew Morton

Em Fri,  9 Oct 2020 09:59:27 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> The media model assumes that buffers are all preallocated, so that
> when a media pipeline is running we never miss a deadline because the
> buffers aren't allocated or available.
> 
> This means we cannot fix the v4l follow_pfn usage through
> mmu_notifier, without breaking how this all works. The only real fix
> is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
> tell everyone to cut over to dma-buf memory sharing for zerocopy.
> 
> userptr for normal memory will keep working as-is.

I won't repeat here the discussions for patch 09/17, but
just to be clear about this one:

NACK.

We need a better alternative to avoid breaking existing
media applications.

> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Pawel Osciak <pawel@osciak.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Tomasz Figa <tfiga@chromium.org>
> Cc: Laurent Dufour <ldufour@linux.ibm.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Michel Lespinasse <walken@google.com>
> ---
>  drivers/media/common/videobuf2/frame_vector.c | 2 +-
>  drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> index 2b0b97761d15..a1b85fe9e7c1 100644
> --- a/drivers/media/common/videobuf2/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -69,7 +69,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
>  			break;
>  
>  		while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
> -			err = follow_pfn(vma, start, &nums[ret]);
> +			err = unsafe_follow_pfn(vma, start, &nums[ret]);
>  			if (err) {
>  				if (ret == 0)
>  					ret = err;
> diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
> index 52312ce2ba05..821c4a76ab96 100644
> --- a/drivers/media/v4l2-core/videobuf-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
> @@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
>  	user_address = untagged_baddr;
>  
>  	while (pages_done < (mem->size >> PAGE_SHIFT)) {
> -		ret = follow_pfn(vma, user_address, &this_pfn);
> +		ret = unsafe_follow_pfn(vma, user_address, &this_pfn);
>  		if (ret)
>  			break;
>  



Thanks,
Mauro

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 10/17] media/videbuf1|2: Mark follow_pfn usage as unsafe
@ 2020-10-10  9:24     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-10  9:24 UTC (permalink / raw)
  To: Daniel Vetter, Linus Torvalds
  Cc: Jan Kara, kvm, DRI Development, linux-mm, Daniel Vetter,
	Michel Lespinasse, Marek Szyprowski, linux-s390,
	linux-samsung-soc, Daniel Jordan, Jason Gunthorpe,
	linux-arm-kernel, linux-media, Kees Cook, Pawel Osciak,
	John Hubbard, Jérôme Glisse, Dan Williams,
	Laurent Dufour, Vlastimil Babka, LKML, Tomasz Figa,
	Kyungmin Park, Andrew Morton

Em Fri,  9 Oct 2020 09:59:27 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> The media model assumes that buffers are all preallocated, so that
> when a media pipeline is running we never miss a deadline because the
> buffers aren't allocated or available.
> 
> This means we cannot fix the v4l follow_pfn usage through
> mmu_notifier, without breaking how this all works. The only real fix
> is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
> tell everyone to cut over to dma-buf memory sharing for zerocopy.
> 
> userptr for normal memory will keep working as-is.

I won't repeat here the discussions for patch 09/17, but
just to be clear about this one:

NACK.

We need a better alternative to avoid breaking existing
media applications.

> 
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Pawel Osciak <pawel@osciak.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: Tomasz Figa <tfiga@chromium.org>
> Cc: Laurent Dufour <ldufour@linux.ibm.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Michel Lespinasse <walken@google.com>
> ---
>  drivers/media/common/videobuf2/frame_vector.c | 2 +-
>  drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> index 2b0b97761d15..a1b85fe9e7c1 100644
> --- a/drivers/media/common/videobuf2/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -69,7 +69,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
>  			break;
>  
>  		while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
> -			err = follow_pfn(vma, start, &nums[ret]);
> +			err = unsafe_follow_pfn(vma, start, &nums[ret]);
>  			if (err) {
>  				if (ret == 0)
>  					ret = err;
> diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
> index 52312ce2ba05..821c4a76ab96 100644
> --- a/drivers/media/v4l2-core/videobuf-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
> @@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
>  	user_address = untagged_baddr;
>  
>  	while (pages_done < (mem->size >> PAGE_SHIFT)) {
> -		ret = follow_pfn(vma, user_address, &this_pfn);
> +		ret = unsafe_follow_pfn(vma, user_address, &this_pfn);
>  		if (ret)
>  			break;
>  



Thanks,
Mauro
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-10  9:21               ` Mauro Carvalho Chehab
  (?)
@ 2020-10-10 10:53                 ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-10 10:53 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jason Gunthorpe, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Hi Mauro,

You might want to read the patches more carefully, because what you're
demanding is what my patches do. Short summary:

- if STRICT_FOLLOW_PFN is set:
a) normal memory is handled as-is (i.e. your example works) through
the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
approach you're demanding
b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
is the only use-case for this) it is correctly rejected with -EINVAL

- if you do have blobby userspace which requires the zero-copy using
userptr to work, and doesn't have any of the fallbacks implemented
that you describe, this would be a regression. That's why
STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
in this usage, Marek also confirmed that the removal of the vma copy
code a few years ago essentially broke even the weak assumptions that
made the code work 10+ years ago when it was merged.

so tdlr; Everything you described will keep working even with the new
flag set, and everything you demand must be implemented _is_
implemented in this patch series.

Also please keep in mind that we are _not_ talking about the general
userptr support that was merge ~20 years ago. This patch series here
is _only_ about the zerocpy userptr support merged with 50ac952d2263
("[media] videobuf2-dma-sg: Support io userptr operations on io
memory") in 2013.

Why this hack was merged in 2013 when we merged dma-buf almost 2 years
before that I have no idea about. Imo that patch simply should never
have landed, and instead dma-buf support prioritized.

Cheers, Daniel


On Sat, Oct 10, 2020 at 11:21 AM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Fri, 9 Oct 2020 19:52:05 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
>
> > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> > >
> > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> > >
> > > No, it is unconditionally unsafe. The CMA movable mappings are
> > > specific VMAs that will have bad issues here, but there are other
> > > types too.
>
> I didn't check the mm dirty details, but I strongly suspect that the mm
> code has a way to prevent moving a mmapped page while it is still in usage.
>
> If not, then the entire movable pages concept sounds broken to me, and
> has to be fixed at mm subsystem.
>
> > >
> > > The only way to do something at a VMA level is to have a list of OK
> > > VMAs, eg because they were creatd via a special mmap helper from the
> > > media subsystem.
>
> I'm not sure if I'm following you on that. The media API can work with
> different ways of sending buffers to userspace:
>
>         - read();
>
>         - using the overlay mode. This interface is deprecated in
>           practice, being replaced by DMABUF. Only a few older hardware
>           supports it, and it depends on an special X11 helper driver
>           for it to work.
>
>         - via DMABUF:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
>
>         - via mmap, using a mmap helper:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/mmap.html
>
>         - via mmap, using userspace-provided pointers:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html
>
> The existing open-source programs usually chose one or more of the above
> modes. if the Kernel driver returns -EINVAL when an mmapped streaming I/O
> mode is not supported, userspace has to select a different method.
>
> Most userspace open source programs have fallback support: if one
> mmap I/O method fails, it selects another one, although this is not
> a mandatory requirement. I found (and fixed) a few ones that were
> relying exclusively on userptr support, but I didn't make a
> comprehensive check.
>
> Also there are a number of relevant closed-source apps that we have no
> idea about what methods they use, like Skype, and other similar
> videoconferencing programs. Breaking support for those, specially at
> a time where people are relying on it in order to participate on
> conferences and doing internal meetings is a **very bad** idea.
>
> So, whatever solution is taken, it should not be dumping warning
> messages at the system and tainting the Kernel, but, instead, checking
> if the userspace request is valid or not. If it is invalid, return the
> proper error code via the right V4L2 ioctl, in order to let userspace
> choose a different method. I the request is valid, refcount the pages
> for them to not be moved while they're still in usage.
>
> -
>
> Let me provide some background about how things work at the media
> subsytem. If you want to know more, the userspace-provided memory
> mapped pointers work is described here:
>
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html#userp
>
> Basically, userspace calls either one of those ioctls:
>
>         VIDIOC_CREATE_BUFS:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-create-bufs.html
>
> Which is translated into a videobuf2 call to: vb2_ioctl_create_bufs()
>
>         VIDIOC_REQBUFS
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-reqbufs.html#vidioc-reqbufs
>
> Which is translated into a videobuf2 call to: vb2_ioctl_reqbufs()
>
> Both internally calls vb2_verify_memory_type(), which is responsible
> for checking if the provided pointers are OK for the usage and/or do
> all necessary page allocations, and taking care of any special
> requirements. This could easily have some additional checks to
> verify if the requested VMA address has pages that are movable or
> not, ensuring that ensure that the VMA is OK, and locking them in
> order to prevent the mm code to move such pages while they are in
> usage by the media subsystem.
>
> Now, as I said before, I don't know the dirty details about how
> to lock those pages at the mm subsystem in order to avoid it
> to move the used pages. Yet, when vb2_create_framevec()
> is called, the check if VMA is OK should already be happened
> at vb2_verify_memory_type().
>
> -
>
> It should be noticed that the dirty hack added by patch 09/17
> and 10/17 affects *all* types of memory allocations at V4L2,
> as this kAPI is called by the 3 different memory models
> supported at the media subsystem:
>
>         drivers/media/common/videobuf2/videobuf2-vmalloc.c
>         drivers/media/common/videobuf2/videobuf2-dma-contig.c
>         drivers/media/common/videobuf2/videobuf2-dma-sg.c
>
> In other words, with this code:
>
>         int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
>                 unsigned long *pfn)
>         {
>         #ifdef CONFIG_STRICT_FOLLOW_PFN
>                 pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
>                 return -EINVAL;
>         #else
>                 WARN_ONCE(1, "unsafe follow_pfn usage\n");
>                 add_taint(TAINT_USER, LOCKDEP_STILL_OK);
>
>                 return follow_pfn(vma, address, pfn);
>         #endif
>
> you're unconditionally breaking the media userspace API support not
> only for embedded systems that could be using userptr instead of
> DMA_BUF, but also for *all* video devices, including USB cameras.
>
> This is **NOT** an acceptable solution.
>
> So, I stand my NACK to this approach.
>
> > > > Well, no drivers inside the media subsystem uses such flag, although
> > > > they may rely on some infrastructure that could be using it behind
> > > > the bars.
> > >
> > > It doesn't matter, nothing prevents the user from calling media APIs
> > > on mmaps it gets from other subsystems.
> >
> > I think a good first step would be to disable userptr of non struct
> > page backed storage going forward for any new hw support. Even on
> > existing drivers. dma-buf sharing has been around for long enough now
> > that this shouldn't be a problem. Unfortunately right now this doesn't
> > seem to exist, so the entire problem keeps getting perpetuated.
>
> Well, the media uAPI does support DMABUF (both import and export):
>
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-expbuf.html#vidioc-expbuf
>
> And I fully agree that newer programs should use DMABUF when sharing
> buffers with DRM subsystem, but that's not my main concern.
>
> What I do want is to not break userspace support nor to taint the Kernel
> due to a valid uAPI call.
>
> A valid uAPI call should check if the parameters passed though it are
> valid. If they are, it should handle. Otherwise, it should return
> -EINVAL, without tainting the Kernel or printing warning messages.
>
> The approach took by patches 09/17 and 10/17 doesn't do that.
> Instead, they just unconditionally breaks the media uAPI.
>
> What should be done, instead, is to drop patch 10/17, and work on
> a way for the code inside vb2_create_framevec() to ensure that, if
> USERPTR is used, the memory pages will be properly locked while the
> driver is using, returning -EINVAL only if there's no way to proceed,
> without tainting the Kernel.
>
> >
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
>
> Not sure how an userspace buffer could be mapped to be using it,
> specially since the buffer may not even be imported/exported
> from the DRM subsystem, but it could simply be allocated
> via glibc calloc()/malloc().
>
> > - add dma-buf export support to fbdev and v4l
>
> V4L has support for it already.
>
> > - roll this out everywhere we still need it.
>
> That's where things are hard. This is not like DRM, where the APIs
> are called via some open source libraries that are also managed
> by DRM upstream developers.
>
> In the case of the media subsystem, while we added a libv4l sometime
> ago, not all userspace apps use it, as a large part of them used
> to exist before the addition of the libraries. Also, we're currently
> trying to deprecate libv4l, at least for embedded systems, in favor
> of libcamera.
>
> On media, there are lots of closed source apps that uses the media API
> directly. Even talking about open source ones, there are lots of
> different apps, including not only media-specific apps, but also
> generic ones, like web browsers, which don't use the libraries we
> wrote.
>
> An userspace API breakage would take *huge* efforts and will take
> lots of time to have it merged everywhere. It will cause lots of
> troubles everywhere.
>
> > Realistically this just isn't going to happen.
>
> Why not? Any app developer could already use DMA-BUF if required,
> as the upstream support was added several years ago.
>
> > And anything else just
> > reimplements half of dma-buf,
>
> It is just the opposite: those uAPIs came years before dma-buf.
> In other words, it was dma-buf that re-implemented it ;-)
>
> Now, I agree with you that dma-buf is a way cooler than the past
> alternatives.
>
> -
>
> It sounds to me that you're considering on only one use case of
> USERPTR: to pass a buffer created from DRM. As far as I'm aware,
> only embedded userspace applications actually do that.
>
> Yet, there are a number of other applications that do something like
> the userptr_capture() function on this code:
>
>         https://git.linuxtv.org/v4l-utils.git/tree/contrib/test/v4l2grab.c
>
> E. g. using glibc alloc functions like calloc() to allocate memory,
> passing the user-allocated data to the Kernel via something like this:
>
>         struct v4l2_requestbuffers req;
>         struct v4l2_buffer buf;
>         int n_buffers = 2;
>
>         req.count  = 2;
>         req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
>         req.memory = V4L2_MEMORY_USERPTR;
>         if (ioctl(fd, VIDIOC_REQBUFS, &req))
>                 return errno;
>
>         for (i = 0; i < req.count; ++i) {
>                 buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
>                 buf.memory = V4L2_MEMORY_USERPTR;
>                 buf.index = i;
>                 buf.m.userptr = (unsigned long)buffers[i].start;
>                 buf.length = buffers[i].length;
>                 if (ioctl(fd, VIDIOC_QBUF, &buf))
>                         return errno;
>         }
>         if (ioctl(fd, VIDIOC_STREAMON, &req.type))
>                 return errno;
>
>         /* some capture loop */
>
>         ioctl(fd, VIDIOC_STREAMOFF, &req.type);
>
> I can't possibly see *any* security issues with the above code.
>
> As I said before, VIDIOC_REQBUFS should be checking if the userspace
> buffers are OK and ensure that their refcounts will be incremented,
> in order to avoid mm to move the pages used there, freeing the
> refconts when VIDIOC_STREAMOFF - or close(fd) - is called.
>
> > which is kinda pointless (you need
> > minimally refcounting and some way to get at a promise of a permanent
> > sg list for dma. Plus probably the vmap for kernel cpu access.
>
> Yeah, refcounting needs to happen.
>
> Thanks,
> Mauro



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10 10:53                 ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-10 10:53 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Jason Gunthorpe, John Hubbard, LKML, DRI Development, Linux MM,
	Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

Hi Mauro,

You might want to read the patches more carefully, because what you're
demanding is what my patches do. Short summary:

- if STRICT_FOLLOW_PFN is set:
a) normal memory is handled as-is (i.e. your example works) through
the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
approach you're demanding
b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
is the only use-case for this) it is correctly rejected with -EINVAL

- if you do have blobby userspace which requires the zero-copy using
userptr to work, and doesn't have any of the fallbacks implemented
that you describe, this would be a regression. That's why
STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
in this usage, Marek also confirmed that the removal of the vma copy
code a few years ago essentially broke even the weak assumptions that
made the code work 10+ years ago when it was merged.

so tdlr; Everything you described will keep working even with the new
flag set, and everything you demand must be implemented _is_
implemented in this patch series.

Also please keep in mind that we are _not_ talking about the general
userptr support that was merge ~20 years ago. This patch series here
is _only_ about the zerocpy userptr support merged with 50ac952d2263
("[media] videobuf2-dma-sg: Support io userptr operations on io
memory") in 2013.

Why this hack was merged in 2013 when we merged dma-buf almost 2 years
before that I have no idea about. Imo that patch simply should never
have landed, and instead dma-buf support prioritized.

Cheers, Daniel


On Sat, Oct 10, 2020 at 11:21 AM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Fri, 9 Oct 2020 19:52:05 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
>
> > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> > >
> > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> > >
> > > No, it is unconditionally unsafe. The CMA movable mappings are
> > > specific VMAs that will have bad issues here, but there are other
> > > types too.
>
> I didn't check the mm dirty details, but I strongly suspect that the mm
> code has a way to prevent moving a mmapped page while it is still in usage.
>
> If not, then the entire movable pages concept sounds broken to me, and
> has to be fixed at mm subsystem.
>
> > >
> > > The only way to do something at a VMA level is to have a list of OK
> > > VMAs, eg because they were creatd via a special mmap helper from the
> > > media subsystem.
>
> I'm not sure if I'm following you on that. The media API can work with
> different ways of sending buffers to userspace:
>
>         - read();
>
>         - using the overlay mode. This interface is deprecated in
>           practice, being replaced by DMABUF. Only a few older hardware
>           supports it, and it depends on an special X11 helper driver
>           for it to work.
>
>         - via DMABUF:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
>
>         - via mmap, using a mmap helper:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/mmap.html
>
>         - via mmap, using userspace-provided pointers:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html
>
> The existing open-source programs usually chose one or more of the above
> modes. if the Kernel driver returns -EINVAL when an mmapped streaming I/O
> mode is not supported, userspace has to select a different method.
>
> Most userspace open source programs have fallback support: if one
> mmap I/O method fails, it selects another one, although this is not
> a mandatory requirement. I found (and fixed) a few ones that were
> relying exclusively on userptr support, but I didn't make a
> comprehensive check.
>
> Also there are a number of relevant closed-source apps that we have no
> idea about what methods they use, like Skype, and other similar
> videoconferencing programs. Breaking support for those, specially at
> a time where people are relying on it in order to participate on
> conferences and doing internal meetings is a **very bad** idea.
>
> So, whatever solution is taken, it should not be dumping warning
> messages at the system and tainting the Kernel, but, instead, checking
> if the userspace request is valid or not. If it is invalid, return the
> proper error code via the right V4L2 ioctl, in order to let userspace
> choose a different method. I the request is valid, refcount the pages
> for them to not be moved while they're still in usage.
>
> -
>
> Let me provide some background about how things work at the media
> subsytem. If you want to know more, the userspace-provided memory
> mapped pointers work is described here:
>
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html#userp
>
> Basically, userspace calls either one of those ioctls:
>
>         VIDIOC_CREATE_BUFS:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-create-bufs.html
>
> Which is translated into a videobuf2 call to: vb2_ioctl_create_bufs()
>
>         VIDIOC_REQBUFS
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-reqbufs.html#vidioc-reqbufs
>
> Which is translated into a videobuf2 call to: vb2_ioctl_reqbufs()
>
> Both internally calls vb2_verify_memory_type(), which is responsible
> for checking if the provided pointers are OK for the usage and/or do
> all necessary page allocations, and taking care of any special
> requirements. This could easily have some additional checks to
> verify if the requested VMA address has pages that are movable or
> not, ensuring that ensure that the VMA is OK, and locking them in
> order to prevent the mm code to move such pages while they are in
> usage by the media subsystem.
>
> Now, as I said before, I don't know the dirty details about how
> to lock those pages at the mm subsystem in order to avoid it
> to move the used pages. Yet, when vb2_create_framevec()
> is called, the check if VMA is OK should already be happened
> at vb2_verify_memory_type().
>
> -
>
> It should be noticed that the dirty hack added by patch 09/17
> and 10/17 affects *all* types of memory allocations at V4L2,
> as this kAPI is called by the 3 different memory models
> supported at the media subsystem:
>
>         drivers/media/common/videobuf2/videobuf2-vmalloc.c
>         drivers/media/common/videobuf2/videobuf2-dma-contig.c
>         drivers/media/common/videobuf2/videobuf2-dma-sg.c
>
> In other words, with this code:
>
>         int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
>                 unsigned long *pfn)
>         {
>         #ifdef CONFIG_STRICT_FOLLOW_PFN
>                 pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
>                 return -EINVAL;
>         #else
>                 WARN_ONCE(1, "unsafe follow_pfn usage\n");
>                 add_taint(TAINT_USER, LOCKDEP_STILL_OK);
>
>                 return follow_pfn(vma, address, pfn);
>         #endif
>
> you're unconditionally breaking the media userspace API support not
> only for embedded systems that could be using userptr instead of
> DMA_BUF, but also for *all* video devices, including USB cameras.
>
> This is **NOT** an acceptable solution.
>
> So, I stand my NACK to this approach.
>
> > > > Well, no drivers inside the media subsystem uses such flag, although
> > > > they may rely on some infrastructure that could be using it behind
> > > > the bars.
> > >
> > > It doesn't matter, nothing prevents the user from calling media APIs
> > > on mmaps it gets from other subsystems.
> >
> > I think a good first step would be to disable userptr of non struct
> > page backed storage going forward for any new hw support. Even on
> > existing drivers. dma-buf sharing has been around for long enough now
> > that this shouldn't be a problem. Unfortunately right now this doesn't
> > seem to exist, so the entire problem keeps getting perpetuated.
>
> Well, the media uAPI does support DMABUF (both import and export):
>
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-expbuf.html#vidioc-expbuf
>
> And I fully agree that newer programs should use DMABUF when sharing
> buffers with DRM subsystem, but that's not my main concern.
>
> What I do want is to not break userspace support nor to taint the Kernel
> due to a valid uAPI call.
>
> A valid uAPI call should check if the parameters passed though it are
> valid. If they are, it should handle. Otherwise, it should return
> -EINVAL, without tainting the Kernel or printing warning messages.
>
> The approach took by patches 09/17 and 10/17 doesn't do that.
> Instead, they just unconditionally breaks the media uAPI.
>
> What should be done, instead, is to drop patch 10/17, and work on
> a way for the code inside vb2_create_framevec() to ensure that, if
> USERPTR is used, the memory pages will be properly locked while the
> driver is using, returning -EINVAL only if there's no way to proceed,
> without tainting the Kernel.
>
> >
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
>
> Not sure how an userspace buffer could be mapped to be using it,
> specially since the buffer may not even be imported/exported
> from the DRM subsystem, but it could simply be allocated
> via glibc calloc()/malloc().
>
> > - add dma-buf export support to fbdev and v4l
>
> V4L has support for it already.
>
> > - roll this out everywhere we still need it.
>
> That's where things are hard. This is not like DRM, where the APIs
> are called via some open source libraries that are also managed
> by DRM upstream developers.
>
> In the case of the media subsystem, while we added a libv4l sometime
> ago, not all userspace apps use it, as a large part of them used
> to exist before the addition of the libraries. Also, we're currently
> trying to deprecate libv4l, at least for embedded systems, in favor
> of libcamera.
>
> On media, there are lots of closed source apps that uses the media API
> directly. Even talking about open source ones, there are lots of
> different apps, including not only media-specific apps, but also
> generic ones, like web browsers, which don't use the libraries we
> wrote.
>
> An userspace API breakage would take *huge* efforts and will take
> lots of time to have it merged everywhere. It will cause lots of
> troubles everywhere.
>
> > Realistically this just isn't going to happen.
>
> Why not? Any app developer could already use DMA-BUF if required,
> as the upstream support was added several years ago.
>
> > And anything else just
> > reimplements half of dma-buf,
>
> It is just the opposite: those uAPIs came years before dma-buf.
> In other words, it was dma-buf that re-implemented it ;-)
>
> Now, I agree with you that dma-buf is a way cooler than the past
> alternatives.
>
> -
>
> It sounds to me that you're considering on only one use case of
> USERPTR: to pass a buffer created from DRM. As far as I'm aware,
> only embedded userspace applications actually do that.
>
> Yet, there are a number of other applications that do something like
> the userptr_capture() function on this code:
>
>         https://git.linuxtv.org/v4l-utils.git/tree/contrib/test/v4l2grab.c
>
> E. g. using glibc alloc functions like calloc() to allocate memory,
> passing the user-allocated data to the Kernel via something like this:
>
>         struct v4l2_requestbuffers req;
>         struct v4l2_buffer buf;
>         int n_buffers = 2;
>
>         req.count  = 2;
>         req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
>         req.memory = V4L2_MEMORY_USERPTR;
>         if (ioctl(fd, VIDIOC_REQBUFS, &req))
>                 return errno;
>
>         for (i = 0; i < req.count; ++i) {
>                 buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
>                 buf.memory = V4L2_MEMORY_USERPTR;
>                 buf.index = i;
>                 buf.m.userptr = (unsigned long)buffers[i].start;
>                 buf.length = buffers[i].length;
>                 if (ioctl(fd, VIDIOC_QBUF, &buf))
>                         return errno;
>         }
>         if (ioctl(fd, VIDIOC_STREAMON, &req.type))
>                 return errno;
>
>         /* some capture loop */
>
>         ioctl(fd, VIDIOC_STREAMOFF, &req.type);
>
> I can't possibly see *any* security issues with the above code.
>
> As I said before, VIDIOC_REQBUFS should be checking if the userspace
> buffers are OK and ensure that their refcounts will be incremented,
> in order to avoid mm to move the pages used there, freeing the
> refconts when VIDIOC_STREAMOFF - or close(fd) - is called.
>
> > which is kinda pointless (you need
> > minimally refcounting and some way to get at a promise of a permanent
> > sg list for dma. Plus probably the vmap for kernel cpu access.
>
> Yeah, refcounting needs to happen.
>
> Thanks,
> Mauro



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10 10:53                 ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-10 10:53 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Jason Gunthorpe, John Hubbard, LKML, DRI Development, Linux MM,
	Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

Hi Mauro,

You might want to read the patches more carefully, because what you're
demanding is what my patches do. Short summary:

- if STRICT_FOLLOW_PFN is set:
a) normal memory is handled as-is (i.e. your example works) through
the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
approach you're demanding
b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
is the only use-case for this) it is correctly rejected with -EINVAL

- if you do have blobby userspace which requires the zero-copy using
userptr to work, and doesn't have any of the fallbacks implemented
that you describe, this would be a regression. That's why
STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
in this usage, Marek also confirmed that the removal of the vma copy
code a few years ago essentially broke even the weak assumptions that
made the code work 10+ years ago when it was merged.

so tdlr; Everything you described will keep working even with the new
flag set, and everything you demand must be implemented _is_
implemented in this patch series.

Also please keep in mind that we are _not_ talking about the general
userptr support that was merge ~20 years ago. This patch series here
is _only_ about the zerocpy userptr support merged with 50ac952d2263
("[media] videobuf2-dma-sg: Support io userptr operations on io
memory") in 2013.

Why this hack was merged in 2013 when we merged dma-buf almost 2 years
before that I have no idea about. Imo that patch simply should never
have landed, and instead dma-buf support prioritized.

Cheers, Daniel


On Sat, Oct 10, 2020 at 11:21 AM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Fri, 9 Oct 2020 19:52:05 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
>
> > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> > >
> > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> > >
> > > No, it is unconditionally unsafe. The CMA movable mappings are
> > > specific VMAs that will have bad issues here, but there are other
> > > types too.
>
> I didn't check the mm dirty details, but I strongly suspect that the mm
> code has a way to prevent moving a mmapped page while it is still in usage.
>
> If not, then the entire movable pages concept sounds broken to me, and
> has to be fixed at mm subsystem.
>
> > >
> > > The only way to do something at a VMA level is to have a list of OK
> > > VMAs, eg because they were creatd via a special mmap helper from the
> > > media subsystem.
>
> I'm not sure if I'm following you on that. The media API can work with
> different ways of sending buffers to userspace:
>
>         - read();
>
>         - using the overlay mode. This interface is deprecated in
>           practice, being replaced by DMABUF. Only a few older hardware
>           supports it, and it depends on an special X11 helper driver
>           for it to work.
>
>         - via DMABUF:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
>
>         - via mmap, using a mmap helper:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/mmap.html
>
>         - via mmap, using userspace-provided pointers:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html
>
> The existing open-source programs usually chose one or more of the above
> modes. if the Kernel driver returns -EINVAL when an mmapped streaming I/O
> mode is not supported, userspace has to select a different method.
>
> Most userspace open source programs have fallback support: if one
> mmap I/O method fails, it selects another one, although this is not
> a mandatory requirement. I found (and fixed) a few ones that were
> relying exclusively on userptr support, but I didn't make a
> comprehensive check.
>
> Also there are a number of relevant closed-source apps that we have no
> idea about what methods they use, like Skype, and other similar
> videoconferencing programs. Breaking support for those, specially at
> a time where people are relying on it in order to participate on
> conferences and doing internal meetings is a **very bad** idea.
>
> So, whatever solution is taken, it should not be dumping warning
> messages at the system and tainting the Kernel, but, instead, checking
> if the userspace request is valid or not. If it is invalid, return the
> proper error code via the right V4L2 ioctl, in order to let userspace
> choose a different method. I the request is valid, refcount the pages
> for them to not be moved while they're still in usage.
>
> -
>
> Let me provide some background about how things work at the media
> subsytem. If you want to know more, the userspace-provided memory
> mapped pointers work is described here:
>
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/userp.html#userp
>
> Basically, userspace calls either one of those ioctls:
>
>         VIDIOC_CREATE_BUFS:
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-create-bufs.html
>
> Which is translated into a videobuf2 call to: vb2_ioctl_create_bufs()
>
>         VIDIOC_REQBUFS
>                 https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-reqbufs.html#vidioc-reqbufs
>
> Which is translated into a videobuf2 call to: vb2_ioctl_reqbufs()
>
> Both internally calls vb2_verify_memory_type(), which is responsible
> for checking if the provided pointers are OK for the usage and/or do
> all necessary page allocations, and taking care of any special
> requirements. This could easily have some additional checks to
> verify if the requested VMA address has pages that are movable or
> not, ensuring that ensure that the VMA is OK, and locking them in
> order to prevent the mm code to move such pages while they are in
> usage by the media subsystem.
>
> Now, as I said before, I don't know the dirty details about how
> to lock those pages at the mm subsystem in order to avoid it
> to move the used pages. Yet, when vb2_create_framevec()
> is called, the check if VMA is OK should already be happened
> at vb2_verify_memory_type().
>
> -
>
> It should be noticed that the dirty hack added by patch 09/17
> and 10/17 affects *all* types of memory allocations at V4L2,
> as this kAPI is called by the 3 different memory models
> supported at the media subsystem:
>
>         drivers/media/common/videobuf2/videobuf2-vmalloc.c
>         drivers/media/common/videobuf2/videobuf2-dma-contig.c
>         drivers/media/common/videobuf2/videobuf2-dma-sg.c
>
> In other words, with this code:
>
>         int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
>                 unsigned long *pfn)
>         {
>         #ifdef CONFIG_STRICT_FOLLOW_PFN
>                 pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
>                 return -EINVAL;
>         #else
>                 WARN_ONCE(1, "unsafe follow_pfn usage\n");
>                 add_taint(TAINT_USER, LOCKDEP_STILL_OK);
>
>                 return follow_pfn(vma, address, pfn);
>         #endif
>
> you're unconditionally breaking the media userspace API support not
> only for embedded systems that could be using userptr instead of
> DMA_BUF, but also for *all* video devices, including USB cameras.
>
> This is **NOT** an acceptable solution.
>
> So, I stand my NACK to this approach.
>
> > > > Well, no drivers inside the media subsystem uses such flag, although
> > > > they may rely on some infrastructure that could be using it behind
> > > > the bars.
> > >
> > > It doesn't matter, nothing prevents the user from calling media APIs
> > > on mmaps it gets from other subsystems.
> >
> > I think a good first step would be to disable userptr of non struct
> > page backed storage going forward for any new hw support. Even on
> > existing drivers. dma-buf sharing has been around for long enough now
> > that this shouldn't be a problem. Unfortunately right now this doesn't
> > seem to exist, so the entire problem keeps getting perpetuated.
>
> Well, the media uAPI does support DMABUF (both import and export):
>
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/dmabuf.html
>         https://linuxtv.org/downloads/v4l-dvb-apis-new/userspace-api/v4l/vidioc-expbuf.html#vidioc-expbuf
>
> And I fully agree that newer programs should use DMABUF when sharing
> buffers with DRM subsystem, but that's not my main concern.
>
> What I do want is to not break userspace support nor to taint the Kernel
> due to a valid uAPI call.
>
> A valid uAPI call should check if the parameters passed though it are
> valid. If they are, it should handle. Otherwise, it should return
> -EINVAL, without tainting the Kernel or printing warning messages.
>
> The approach took by patches 09/17 and 10/17 doesn't do that.
> Instead, they just unconditionally breaks the media uAPI.
>
> What should be done, instead, is to drop patch 10/17, and work on
> a way for the code inside vb2_create_framevec() to ensure that, if
> USERPTR is used, the memory pages will be properly locked while the
> driver is using, returning -EINVAL only if there's no way to proceed,
> without tainting the Kernel.
>
> >
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
>
> Not sure how an userspace buffer could be mapped to be using it,
> specially since the buffer may not even be imported/exported
> from the DRM subsystem, but it could simply be allocated
> via glibc calloc()/malloc().
>
> > - add dma-buf export support to fbdev and v4l
>
> V4L has support for it already.
>
> > - roll this out everywhere we still need it.
>
> That's where things are hard. This is not like DRM, where the APIs
> are called via some open source libraries that are also managed
> by DRM upstream developers.
>
> In the case of the media subsystem, while we added a libv4l sometime
> ago, not all userspace apps use it, as a large part of them used
> to exist before the addition of the libraries. Also, we're currently
> trying to deprecate libv4l, at least for embedded systems, in favor
> of libcamera.
>
> On media, there are lots of closed source apps that uses the media API
> directly. Even talking about open source ones, there are lots of
> different apps, including not only media-specific apps, but also
> generic ones, like web browsers, which don't use the libraries we
> wrote.
>
> An userspace API breakage would take *huge* efforts and will take
> lots of time to have it merged everywhere. It will cause lots of
> troubles everywhere.
>
> > Realistically this just isn't going to happen.
>
> Why not? Any app developer could already use DMA-BUF if required,
> as the upstream support was added several years ago.
>
> > And anything else just
> > reimplements half of dma-buf,
>
> It is just the opposite: those uAPIs came years before dma-buf.
> In other words, it was dma-buf that re-implemented it ;-)
>
> Now, I agree with you that dma-buf is a way cooler than the past
> alternatives.
>
> -
>
> It sounds to me that you're considering on only one use case of
> USERPTR: to pass a buffer created from DRM. As far as I'm aware,
> only embedded userspace applications actually do that.
>
> Yet, there are a number of other applications that do something like
> the userptr_capture() function on this code:
>
>         https://git.linuxtv.org/v4l-utils.git/tree/contrib/test/v4l2grab.c
>
> E. g. using glibc alloc functions like calloc() to allocate memory,
> passing the user-allocated data to the Kernel via something like this:
>
>         struct v4l2_requestbuffers req;
>         struct v4l2_buffer buf;
>         int n_buffers = 2;
>
>         req.count  = 2;
>         req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
>         req.memory = V4L2_MEMORY_USERPTR;
>         if (ioctl(fd, VIDIOC_REQBUFS, &req))
>                 return errno;
>
>         for (i = 0; i < req.count; ++i) {
>                 buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
>                 buf.memory = V4L2_MEMORY_USERPTR;
>                 buf.index = i;
>                 buf.m.userptr = (unsigned long)buffers[i].start;
>                 buf.length = buffers[i].length;
>                 if (ioctl(fd, VIDIOC_QBUF, &buf))
>                         return errno;
>         }
>         if (ioctl(fd, VIDIOC_STREAMON, &req.type))
>                 return errno;
>
>         /* some capture loop */
>
>         ioctl(fd, VIDIOC_STREAMOFF, &req.type);
>
> I can't possibly see *any* security issues with the above code.
>
> As I said before, VIDIOC_REQBUFS should be checking if the userspace
> buffers are OK and ensure that their refcounts will be incremented,
> in order to avoid mm to move the pages used there, freeing the
> refconts when VIDIOC_STREAMOFF - or close(fd) - is called.
>
> > which is kinda pointless (you need
> > minimally refcounting and some way to get at a promise of a permanent
> > sg list for dma. Plus probably the vmap for kernel cpu access.
>
> Yeah, refcounting needs to happen.
>
> Thanks,
> Mauro



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-10 10:53                 ` Daniel Vetter
  (?)
@ 2020-10-10 11:39                   ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-10 11:39 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jason Gunthorpe, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Em Sat, 10 Oct 2020 12:53:49 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> Hi Mauro,
> 
> You might want to read the patches more carefully, because what you're
> demanding is what my patches do. Short summary:
> 
> - if STRICT_FOLLOW_PFN is set:
> a) normal memory is handled as-is (i.e. your example works) through
> the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
> approach you're demanding
> b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
> is the only use-case for this) it is correctly rejected with -EINVAL
> 
> - if you do have blobby userspace which requires the zero-copy using
> userptr to work, and doesn't have any of the fallbacks implemented
> that you describe, this would be a regression. That's why
> STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
> in this usage, Marek also confirmed that the removal of the vma copy
> code a few years ago essentially broke even the weak assumptions that
> made the code work 10+ years ago when it was merged.
> 
> so tdlr; Everything you described will keep working even with the new
> flag set, and everything you demand must be implemented _is_
> implemented in this patch series.
> 
> Also please keep in mind that we are _not_ talking about the general
> userptr support that was merge ~20 years ago. This patch series here
> is _only_ about the zerocpy userptr support merged with 50ac952d2263
> ("[media] videobuf2-dma-sg: Support io userptr operations on io
> memory") in 2013.

Ok, now it is making more sense. Please update the comments for
patch 10/17 to describe the above.

We need some time to test this though, in order to check if no
regressions were added (except the ones due to changeset 50ac952d2263).

> 
> Why this hack was merged in 2013 when we merged dma-buf almost 2 years
> before that I have no idea about. Imo that patch simply should never
> have landed, and instead dma-buf support prioritized.

If I recall correctly, we didn't have any DMABUF support
at the media subsystem, back on 2013.

It took some time for the DMA-BUF to arrive at media, as this
was not a top priority. Also, there aren't many developers that
understand the memory model well enough to implement DMA-BUF support
and touch the VB2 code, which is quite complex, as it supports
lots of different ways for I/O, plus works with vmalloc, DMA
contig and DMA scatter/gather. 

Changes there should carefully be tested against different
drivers, in order to avoid regressions on it.

> Cheers, Daniel

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10 11:39                   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-10 11:39 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Jason Gunthorpe, John Hubbard, LKML, DRI Development, Linux MM,
	Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

Em Sat, 10 Oct 2020 12:53:49 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> Hi Mauro,
> 
> You might want to read the patches more carefully, because what you're
> demanding is what my patches do. Short summary:
> 
> - if STRICT_FOLLOW_PFN is set:
> a) normal memory is handled as-is (i.e. your example works) through
> the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
> approach you're demanding
> b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
> is the only use-case for this) it is correctly rejected with -EINVAL
> 
> - if you do have blobby userspace which requires the zero-copy using
> userptr to work, and doesn't have any of the fallbacks implemented
> that you describe, this would be a regression. That's why
> STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
> in this usage, Marek also confirmed that the removal of the vma copy
> code a few years ago essentially broke even the weak assumptions that
> made the code work 10+ years ago when it was merged.
> 
> so tdlr; Everything you described will keep working even with the new
> flag set, and everything you demand must be implemented _is_
> implemented in this patch series.
> 
> Also please keep in mind that we are _not_ talking about the general
> userptr support that was merge ~20 years ago. This patch series here
> is _only_ about the zerocpy userptr support merged with 50ac952d2263
> ("[media] videobuf2-dma-sg: Support io userptr operations on io
> memory") in 2013.

Ok, now it is making more sense. Please update the comments for
patch 10/17 to describe the above.

We need some time to test this though, in order to check if no
regressions were added (except the ones due to changeset 50ac952d2263).

> 
> Why this hack was merged in 2013 when we merged dma-buf almost 2 years
> before that I have no idea about. Imo that patch simply should never
> have landed, and instead dma-buf support prioritized.

If I recall correctly, we didn't have any DMABUF support
at the media subsystem, back on 2013.

It took some time for the DMA-BUF to arrive at media, as this
was not a top priority. Also, there aren't many developers that
understand the memory model well enough to implement DMA-BUF support
and touch the VB2 code, which is quite complex, as it supports
lots of different ways for I/O, plus works with vmalloc, DMA
contig and DMA scatter/gather. 

Changes there should carefully be tested against different
drivers, in order to avoid regressions on it.

> Cheers, Daniel

Thanks,
Mauro

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10 11:39                   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 214+ messages in thread
From: Mauro Carvalho Chehab @ 2020-10-10 11:39 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Jason Gunthorpe, John Hubbard, LKML, DRI Development, Linux MM,
	Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

Em Sat, 10 Oct 2020 12:53:49 +0200
Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:

> Hi Mauro,
> 
> You might want to read the patches more carefully, because what you're
> demanding is what my patches do. Short summary:
> 
> - if STRICT_FOLLOW_PFN is set:
> a) normal memory is handled as-is (i.e. your example works) through
> the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
> approach you're demanding
> b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
> is the only use-case for this) it is correctly rejected with -EINVAL
> 
> - if you do have blobby userspace which requires the zero-copy using
> userptr to work, and doesn't have any of the fallbacks implemented
> that you describe, this would be a regression. That's why
> STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
> in this usage, Marek also confirmed that the removal of the vma copy
> code a few years ago essentially broke even the weak assumptions that
> made the code work 10+ years ago when it was merged.
> 
> so tdlr; Everything you described will keep working even with the new
> flag set, and everything you demand must be implemented _is_
> implemented in this patch series.
> 
> Also please keep in mind that we are _not_ talking about the general
> userptr support that was merge ~20 years ago. This patch series here
> is _only_ about the zerocpy userptr support merged with 50ac952d2263
> ("[media] videobuf2-dma-sg: Support io userptr operations on io
> memory") in 2013.

Ok, now it is making more sense. Please update the comments for
patch 10/17 to describe the above.

We need some time to test this though, in order to check if no
regressions were added (except the ones due to changeset 50ac952d2263).

> 
> Why this hack was merged in 2013 when we merged dma-buf almost 2 years
> before that I have no idea about. Imo that patch simply should never
> have landed, and instead dma-buf support prioritized.

If I recall correctly, we didn't have any DMABUF support
at the media subsystem, back on 2013.

It took some time for the DMA-BUF to arrive at media, as this
was not a top priority. Also, there aren't many developers that
understand the memory model well enough to implement DMA-BUF support
and touch the VB2 code, which is quite complex, as it supports
lots of different ways for I/O, plus works with vmalloc, DMA
contig and DMA scatter/gather. 

Changes there should carefully be tested against different
drivers, in order to avoid regressions on it.

> Cheers, Daniel

Thanks,
Mauro
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-10 11:39                   ` Mauro Carvalho Chehab
  (?)
@ 2020-10-10 11:56                     ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-10 11:56 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jason Gunthorpe, DRI Development, LKML, KVM list, Linux MM,
	Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

On Sat, Oct 10, 2020 at 1:39 PM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Sat, 10 Oct 2020 12:53:49 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
>
> > Hi Mauro,
> >
> > You might want to read the patches more carefully, because what you're
> > demanding is what my patches do. Short summary:
> >
> > - if STRICT_FOLLOW_PFN is set:
> > a) normal memory is handled as-is (i.e. your example works) through
> > the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
> > approach you're demanding
> > b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
> > is the only use-case for this) it is correctly rejected with -EINVAL
> >
> > - if you do have blobby userspace which requires the zero-copy using
> > userptr to work, and doesn't have any of the fallbacks implemented
> > that you describe, this would be a regression. That's why
> > STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
> > in this usage, Marek also confirmed that the removal of the vma copy
> > code a few years ago essentially broke even the weak assumptions that
> > made the code work 10+ years ago when it was merged.
> >
> > so tdlr; Everything you described will keep working even with the new
> > flag set, and everything you demand must be implemented _is_
> > implemented in this patch series.
> >
> > Also please keep in mind that we are _not_ talking about the general
> > userptr support that was merge ~20 years ago. This patch series here
> > is _only_ about the zerocpy userptr support merged with 50ac952d2263
> > ("[media] videobuf2-dma-sg: Support io userptr operations on io
> > memory") in 2013.
>
> Ok, now it is making more sense. Please update the comments for
> patch 10/17 to describe the above.

Will do.

> We need some time to test this though, in order to check if no
> regressions were added (except the ones due to changeset 50ac952d2263).

Yeah testing of the previous patches to switch to FOLL_LONGTERM would
be really good. I also need that for habanalabs and ideally exynos
too. All the userptr for normal memory should keep working, and with
FOLL_LONGTERM it should actually work better, since with that it
should now correctly interact with pagecache and fs code, not just
with anon memory from malloc.

Thanks, Daniel

> > Why this hack was merged in 2013 when we merged dma-buf almost 2 years
> > before that I have no idea about. Imo that patch simply should never
> > have landed, and instead dma-buf support prioritized.
>
> If I recall correctly, we didn't have any DMABUF support
> at the media subsystem, back on 2013.
>
> It took some time for the DMA-BUF to arrive at media, as this
> was not a top priority. Also, there aren't many developers that
> understand the memory model well enough to implement DMA-BUF support
> and touch the VB2 code, which is quite complex, as it supports
> lots of different ways for I/O, plus works with vmalloc, DMA
> contig and DMA scatter/gather.
>
> Changes there should carefully be tested against different
> drivers, in order to avoid regressions on it.
>
> > Cheers, Daniel
>
> Thanks,
> Mauro



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10 11:56                     ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-10 11:56 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Jason Gunthorpe, John Hubbard, LKML, DRI Development, Linux MM,
	Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Sat, Oct 10, 2020 at 1:39 PM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Sat, 10 Oct 2020 12:53:49 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
>
> > Hi Mauro,
> >
> > You might want to read the patches more carefully, because what you're
> > demanding is what my patches do. Short summary:
> >
> > - if STRICT_FOLLOW_PFN is set:
> > a) normal memory is handled as-is (i.e. your example works) through
> > the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
> > approach you're demanding
> > b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
> > is the only use-case for this) it is correctly rejected with -EINVAL
> >
> > - if you do have blobby userspace which requires the zero-copy using
> > userptr to work, and doesn't have any of the fallbacks implemented
> > that you describe, this would be a regression. That's why
> > STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
> > in this usage, Marek also confirmed that the removal of the vma copy
> > code a few years ago essentially broke even the weak assumptions that
> > made the code work 10+ years ago when it was merged.
> >
> > so tdlr; Everything you described will keep working even with the new
> > flag set, and everything you demand must be implemented _is_
> > implemented in this patch series.
> >
> > Also please keep in mind that we are _not_ talking about the general
> > userptr support that was merge ~20 years ago. This patch series here
> > is _only_ about the zerocpy userptr support merged with 50ac952d2263
> > ("[media] videobuf2-dma-sg: Support io userptr operations on io
> > memory") in 2013.
>
> Ok, now it is making more sense. Please update the comments for
> patch 10/17 to describe the above.

Will do.

> We need some time to test this though, in order to check if no
> regressions were added (except the ones due to changeset 50ac952d2263).

Yeah testing of the previous patches to switch to FOLL_LONGTERM would
be really good. I also need that for habanalabs and ideally exynos
too. All the userptr for normal memory should keep working, and with
FOLL_LONGTERM it should actually work better, since with that it
should now correctly interact with pagecache and fs code, not just
with anon memory from malloc.

Thanks, Daniel

> > Why this hack was merged in 2013 when we merged dma-buf almost 2 years
> > before that I have no idea about. Imo that patch simply should never
> > have landed, and instead dma-buf support prioritized.
>
> If I recall correctly, we didn't have any DMABUF support
> at the media subsystem, back on 2013.
>
> It took some time for the DMA-BUF to arrive at media, as this
> was not a top priority. Also, there aren't many developers that
> understand the memory model well enough to implement DMA-BUF support
> and touch the VB2 code, which is quite complex, as it supports
> lots of different ways for I/O, plus works with vmalloc, DMA
> contig and DMA scatter/gather.
>
> Changes there should carefully be tested against different
> drivers, in order to avoid regressions on it.
>
> > Cheers, Daniel
>
> Thanks,
> Mauro



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10 11:56                     ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-10 11:56 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Jason Gunthorpe, John Hubbard, LKML, DRI Development, Linux MM,
	Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

On Sat, Oct 10, 2020 at 1:39 PM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Sat, 10 Oct 2020 12:53:49 +0200
> Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
>
> > Hi Mauro,
> >
> > You might want to read the patches more carefully, because what you're
> > demanding is what my patches do. Short summary:
> >
> > - if STRICT_FOLLOW_PFN is set:
> > a) normal memory is handled as-is (i.e. your example works) through
> > the addition of FOLL_LONGTERM. This is the "pin the pages correctly"
> > approach you're demanding
> > b) for non-page memory (zerocopy sharing before dma-buf was upstreamed
> > is the only use-case for this) it is correctly rejected with -EINVAL
> >
> > - if you do have blobby userspace which requires the zero-copy using
> > userptr to work, and doesn't have any of the fallbacks implemented
> > that you describe, this would be a regression. That's why
> > STRICT_FOLLOW_PFN can be unset. And yes there's a real security issue
> > in this usage, Marek also confirmed that the removal of the vma copy
> > code a few years ago essentially broke even the weak assumptions that
> > made the code work 10+ years ago when it was merged.
> >
> > so tdlr; Everything you described will keep working even with the new
> > flag set, and everything you demand must be implemented _is_
> > implemented in this patch series.
> >
> > Also please keep in mind that we are _not_ talking about the general
> > userptr support that was merge ~20 years ago. This patch series here
> > is _only_ about the zerocpy userptr support merged with 50ac952d2263
> > ("[media] videobuf2-dma-sg: Support io userptr operations on io
> > memory") in 2013.
>
> Ok, now it is making more sense. Please update the comments for
> patch 10/17 to describe the above.

Will do.

> We need some time to test this though, in order to check if no
> regressions were added (except the ones due to changeset 50ac952d2263).

Yeah testing of the previous patches to switch to FOLL_LONGTERM would
be really good. I also need that for habanalabs and ideally exynos
too. All the userptr for normal memory should keep working, and with
FOLL_LONGTERM it should actually work better, since with that it
should now correctly interact with pagecache and fs code, not just
with anon memory from malloc.

Thanks, Daniel

> > Why this hack was merged in 2013 when we merged dma-buf almost 2 years
> > before that I have no idea about. Imo that patch simply should never
> > have landed, and instead dma-buf support prioritized.
>
> If I recall correctly, we didn't have any DMABUF support
> at the media subsystem, back on 2013.
>
> It took some time for the DMA-BUF to arrive at media, as this
> was not a top priority. Also, there aren't many developers that
> understand the memory model well enough to implement DMA-BUF support
> and touch the VB2 code, which is quite complex, as it supports
> lots of different ways for I/O, plus works with vmalloc, DMA
> contig and DMA scatter/gather.
>
> Changes there should carefully be tested against different
> drivers, in order to avoid regressions on it.
>
> > Cheers, Daniel
>
> Thanks,
> Mauro



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 17:52             ` Daniel Vetter
  (?)
@ 2020-10-10 17:22               ` Tomasz Figa
  -1 siblings, 0 replies; 214+ messages in thread
From: Tomasz Figa @ 2020-10-10 17:22 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jason Gunthorpe, Mauro Carvalho Chehab, DRI Development, LKML,
	KVM list, Linux MM, Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Hi Daniel,

On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.
> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.
> >
> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.
>
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.
>
> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
>
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l

I assume you mean V4L2 and not the obsolete V4L that is emulated in
the userspace by libv4l. If so, every video device that uses videobuf2
gets DMA-buf export for free and there is nothing needed to enable it.

We probably still have a few legacy drivers using videobuf (non-2),
but IMHO those should be safe to put behind some disabled-by-default
Kconfig symbol or even completely drop, as the legacy framework has
been deprecated for many years already.

> - roll this out everywhere we still need it.
>
> Realistically this just isn't going to happen. And anything else just
> reimplements half of dma-buf, which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.
>
> > > Please let address the issue on this way, instead of broken an
> > > userspace API that it is there since 1991.
> >
> > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > breakage caused by a security fix for something that was a 15 years
> > old bug.
>
> Yeah we have a bunch of these on the drm side too. Some of them are
> really just "you have to upgrade userspace", and there's no real fix
> for the security nightmare without that.

I think we need to phase out such userspace indeed. The Kconfig symbol
allows enabling the unsafe functionality for anyone who still needs
it, so I think it's not entirely a breakage.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10 17:22               ` Tomasz Figa
  0 siblings, 0 replies; 214+ messages in thread
From: Tomasz Figa @ 2020-10-10 17:22 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Linux MM, Mauro Carvalho Chehab, John Hubbard, LKML,
	DRI Development, Jason Gunthorpe, Jérôme Glisse,
	Daniel Vetter, Dan Williams, Linus Torvalds, Andrew Morton,
	Linux ARM, open list:DMA BUFFER SHARING FRAMEWORK

Hi Daniel,

On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.
> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.
> >
> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.
>
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.
>
> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
>
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l

I assume you mean V4L2 and not the obsolete V4L that is emulated in
the userspace by libv4l. If so, every video device that uses videobuf2
gets DMA-buf export for free and there is nothing needed to enable it.

We probably still have a few legacy drivers using videobuf (non-2),
but IMHO those should be safe to put behind some disabled-by-default
Kconfig symbol or even completely drop, as the legacy framework has
been deprecated for many years already.

> - roll this out everywhere we still need it.
>
> Realistically this just isn't going to happen. And anything else just
> reimplements half of dma-buf, which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.
>
> > > Please let address the issue on this way, instead of broken an
> > > userspace API that it is there since 1991.
> >
> > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > breakage caused by a security fix for something that was a 15 years
> > old bug.
>
> Yeah we have a bunch of these on the drm side too. Some of them are
> really just "you have to upgrade userspace", and there's no real fix
> for the security nightmare without that.

I think we need to phase out such userspace indeed. The Kconfig symbol
allows enabling the unsafe functionality for anyone who still needs
it, so I think it's not entirely a breakage.

Best regards,
Tomasz

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10 17:22               ` Tomasz Figa
  0 siblings, 0 replies; 214+ messages in thread
From: Tomasz Figa @ 2020-10-10 17:22 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Linux MM, Mauro Carvalho Chehab, John Hubbard, LKML,
	DRI Development, Jason Gunthorpe, Jérôme Glisse,
	Daniel Vetter, Dan Williams, Linus Torvalds, Andrew Morton,
	Linux ARM, open list:DMA BUFFER SHARING FRAMEWORK

Hi Daniel,

On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.
> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.
> >
> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.
>
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.
>
> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
>
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l

I assume you mean V4L2 and not the obsolete V4L that is emulated in
the userspace by libv4l. If so, every video device that uses videobuf2
gets DMA-buf export for free and there is nothing needed to enable it.

We probably still have a few legacy drivers using videobuf (non-2),
but IMHO those should be safe to put behind some disabled-by-default
Kconfig symbol or even completely drop, as the legacy framework has
been deprecated for many years already.

> - roll this out everywhere we still need it.
>
> Realistically this just isn't going to happen. And anything else just
> reimplements half of dma-buf, which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.
>
> > > Please let address the issue on this way, instead of broken an
> > > userspace API that it is there since 1991.
> >
> > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > breakage caused by a security fix for something that was a 15 years
> > old bug.
>
> Yeah we have a bunch of these on the drm side too. Some of them are
> really just "you have to upgrade userspace", and there's no real fix
> for the security nightmare without that.

I think we need to phase out such userspace indeed. The Kconfig symbol
allows enabling the unsafe functionality for anyone who still needs
it, so I think it's not entirely a breakage.

Best regards,
Tomasz
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 12:37         ` Mauro Carvalho Chehab
  (?)
@ 2020-10-10 17:30           ` Tomasz Figa
  -1 siblings, 0 replies; 214+ messages in thread
From: Tomasz Figa @ 2020-10-10 17:30 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jason Gunthorpe, Daniel Vetter, DRI Development, LKML, kvm,
	Linux MM,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	linux-samsung-soc, Linux Media Mailing List, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Hi Mauro,

On Fri, Oct 9, 2020 at 2:37 PM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Fri, 9 Oct 2020 09:21:11 -0300
> Jason Gunthorpe <jgg@ziepe.ca> escreveu:
>
> > On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> > > Hi,
> > >
> > > Em Fri,  9 Oct 2020 09:59:26 +0200
> > > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> > >
> > > > Way back it was a reasonable assumptions that iomem mappings never
> > > > change the pfn range they point at. But this has changed:
> > > >
> > > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > > ptes with unmap_mapping_range when buffers get moved
> > > >
> > > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > > cma regions. This means if we miss the unmap the pfn might contain
> > > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > >
> > > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > >
> > > > Accessing pfns obtained from ptes without holding all the locks is
> > > > therefore no longer a good idea.
> > > >
> > > > Unfortunately there's some users where this is not fixable (like v4l
> > > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > > iommu). For now annotate these as unsafe and splat appropriately.
> > > >
> > > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > > roll out to all appropriate places.
> > >
> > > NACK, as this breaks an existing userspace API on media.
> >
> > It doesn't break it. You get a big warning the thing is broken and it
> > keeps working.
> >
> > We can't leave such a huge security hole open - it impacts other
> > subsystems, distros need to be able to run in a secure mode.
>
> Well, if distros disable it, then apps will break.
>

Do we have any information on userspace that actually needs this functionality?

Note that we're _not_ talking here about the complete USERPTR
functionality, but rather just the very corner case of carveout memory
not backed by struct pages.

Given that the current in-tree ways of reserving carveout memory, such
as shared-dma-pool, actually give memory backed by struct pages, do we
even have a source of such legacy memory in the kernel today?

I think that given that this is a very niche functionality, we could
have it disabled by default for security reasons and if someone
_really_ (i.e. there is no replacement) needs it, they probably need
to use a custom kernel build anyway for their exotic hardware setup
(with PFN-backed carveout memory), so they can enable it.

> > > While I agree that using the userptr on media is something that
> > > new drivers may not support, as DMABUF is a better way of
> > > handling it, changing this for existing ones is a big no,
> > > as it may break usersapace.
> >
> > media community needs to work to fix this, not pretend it is OK to
> > keep going as-is.
>
> > Dealing with security issues is the one case where an uABI break might
> > be acceptable.
> >
> > If you want to NAK it then you need to come up with the work to do
> > something here correctly that will support the old drivers without the
> > kernel taint.
> >
> > Unfortunately making things uncomfortable for the subsystem is the big
> > hammer the core kernel needs to use to actually get this security work
> > done by those responsible.
>
>
> I'm not pretending that this is ok. Just pointing that the approach
> taken is NOT OK.
>
> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.
>
> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.
>
> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> flag that it would be denying the core mm code to set __GFP_MOVABLE.
>
> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

Note that USERPTR as a whole generally has been considered deprecated
in V4L2 for many years and people have been actively discouraged to
use it. And, still, we're just talking here about the very rare corner
case, not the whole USERPTR API.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10 17:30           ` Tomasz Figa
  0 siblings, 0 replies; 214+ messages in thread
From: Tomasz Figa @ 2020-10-10 17:30 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, LKML, DRI Development, Linux MM,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Linus Torvalds, Andrew Morton,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Linux Media Mailing List

Hi Mauro,

On Fri, Oct 9, 2020 at 2:37 PM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Fri, 9 Oct 2020 09:21:11 -0300
> Jason Gunthorpe <jgg@ziepe.ca> escreveu:
>
> > On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> > > Hi,
> > >
> > > Em Fri,  9 Oct 2020 09:59:26 +0200
> > > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> > >
> > > > Way back it was a reasonable assumptions that iomem mappings never
> > > > change the pfn range they point at. But this has changed:
> > > >
> > > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > > ptes with unmap_mapping_range when buffers get moved
> > > >
> > > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > > cma regions. This means if we miss the unmap the pfn might contain
> > > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > >
> > > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > >
> > > > Accessing pfns obtained from ptes without holding all the locks is
> > > > therefore no longer a good idea.
> > > >
> > > > Unfortunately there's some users where this is not fixable (like v4l
> > > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > > iommu). For now annotate these as unsafe and splat appropriately.
> > > >
> > > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > > roll out to all appropriate places.
> > >
> > > NACK, as this breaks an existing userspace API on media.
> >
> > It doesn't break it. You get a big warning the thing is broken and it
> > keeps working.
> >
> > We can't leave such a huge security hole open - it impacts other
> > subsystems, distros need to be able to run in a secure mode.
>
> Well, if distros disable it, then apps will break.
>

Do we have any information on userspace that actually needs this functionality?

Note that we're _not_ talking here about the complete USERPTR
functionality, but rather just the very corner case of carveout memory
not backed by struct pages.

Given that the current in-tree ways of reserving carveout memory, such
as shared-dma-pool, actually give memory backed by struct pages, do we
even have a source of such legacy memory in the kernel today?

I think that given that this is a very niche functionality, we could
have it disabled by default for security reasons and if someone
_really_ (i.e. there is no replacement) needs it, they probably need
to use a custom kernel build anyway for their exotic hardware setup
(with PFN-backed carveout memory), so they can enable it.

> > > While I agree that using the userptr on media is something that
> > > new drivers may not support, as DMABUF is a better way of
> > > handling it, changing this for existing ones is a big no,
> > > as it may break usersapace.
> >
> > media community needs to work to fix this, not pretend it is OK to
> > keep going as-is.
>
> > Dealing with security issues is the one case where an uABI break might
> > be acceptable.
> >
> > If you want to NAK it then you need to come up with the work to do
> > something here correctly that will support the old drivers without the
> > kernel taint.
> >
> > Unfortunately making things uncomfortable for the subsystem is the big
> > hammer the core kernel needs to use to actually get this security work
> > done by those responsible.
>
>
> I'm not pretending that this is ok. Just pointing that the approach
> taken is NOT OK.
>
> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.
>
> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.
>
> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> flag that it would be denying the core mm code to set __GFP_MOVABLE.
>
> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

Note that USERPTR as a whole generally has been considered deprecated
in V4L2 for many years and people have been actively discouraged to
use it. And, still, we're just talking here about the very rare corner
case, not the whole USERPTR API.

Best regards,
Tomasz

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10 17:30           ` Tomasz Figa
  0 siblings, 0 replies; 214+ messages in thread
From: Tomasz Figa @ 2020-10-10 17:30 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, kvm,
	Jason Gunthorpe, Daniel Vetter, LKML, DRI Development, Linux MM,
	Jérôme Glisse, John Hubbard, Daniel Vetter,
	Dan Williams, Linus Torvalds, Andrew Morton,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Linux Media Mailing List

Hi Mauro,

On Fri, Oct 9, 2020 at 2:37 PM Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Fri, 9 Oct 2020 09:21:11 -0300
> Jason Gunthorpe <jgg@ziepe.ca> escreveu:
>
> > On Fri, Oct 09, 2020 at 12:34:21PM +0200, Mauro Carvalho Chehab wrote:
> > > Hi,
> > >
> > > Em Fri,  9 Oct 2020 09:59:26 +0200
> > > Daniel Vetter <daniel.vetter@ffwll.ch> escreveu:
> > >
> > > > Way back it was a reasonable assumptions that iomem mappings never
> > > > change the pfn range they point at. But this has changed:
> > > >
> > > > - gpu drivers dynamically manage their memory nowadays, invalidating
> > > > ptes with unmap_mapping_range when buffers get moved
> > > >
> > > > - contiguous dma allocations have moved from dedicated carvetouts to
> > > > cma regions. This means if we miss the unmap the pfn might contain
> > > > pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
> > > >
> > > > - even /dev/mem now invalidates mappings when the kernel requests that
> > > > iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
> > > > ("/dev/mem: Revoke mappings when a driver claims the region")
> > > >
> > > > Accessing pfns obtained from ptes without holding all the locks is
> > > > therefore no longer a good idea.
> > > >
> > > > Unfortunately there's some users where this is not fixable (like v4l
> > > > userptr of iomem mappings) or involves a pile of work (vfio type1
> > > > iommu). For now annotate these as unsafe and splat appropriately.
> > > >
> > > > This patch adds an unsafe_follow_pfn, which later patches will then
> > > > roll out to all appropriate places.
> > >
> > > NACK, as this breaks an existing userspace API on media.
> >
> > It doesn't break it. You get a big warning the thing is broken and it
> > keeps working.
> >
> > We can't leave such a huge security hole open - it impacts other
> > subsystems, distros need to be able to run in a secure mode.
>
> Well, if distros disable it, then apps will break.
>

Do we have any information on userspace that actually needs this functionality?

Note that we're _not_ talking here about the complete USERPTR
functionality, but rather just the very corner case of carveout memory
not backed by struct pages.

Given that the current in-tree ways of reserving carveout memory, such
as shared-dma-pool, actually give memory backed by struct pages, do we
even have a source of such legacy memory in the kernel today?

I think that given that this is a very niche functionality, we could
have it disabled by default for security reasons and if someone
_really_ (i.e. there is no replacement) needs it, they probably need
to use a custom kernel build anyway for their exotic hardware setup
(with PFN-backed carveout memory), so they can enable it.

> > > While I agree that using the userptr on media is something that
> > > new drivers may not support, as DMABUF is a better way of
> > > handling it, changing this for existing ones is a big no,
> > > as it may break usersapace.
> >
> > media community needs to work to fix this, not pretend it is OK to
> > keep going as-is.
>
> > Dealing with security issues is the one case where an uABI break might
> > be acceptable.
> >
> > If you want to NAK it then you need to come up with the work to do
> > something here correctly that will support the old drivers without the
> > kernel taint.
> >
> > Unfortunately making things uncomfortable for the subsystem is the big
> > hammer the core kernel needs to use to actually get this security work
> > done by those responsible.
>
>
> I'm not pretending that this is ok. Just pointing that the approach
> taken is NOT OK.
>
> I'm not a mm/ expert, but, from what I understood from Daniel's patch
> description is that this is unsafe *only if*  __GFP_MOVABLE is used.
>
> Well, no drivers inside the media subsystem uses such flag, although
> they may rely on some infrastructure that could be using it behind
> the bars.
>
> If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> flag that it would be denying the core mm code to set __GFP_MOVABLE.
>
> Please let address the issue on this way, instead of broken an
> userspace API that it is there since 1991.

Note that USERPTR as a whole generally has been considered deprecated
in V4L2 for many years and people have been actively discouraged to
use it. And, still, we're just talking here about the very rare corner
case, not the whole USERPTR API.

Best regards,
Tomasz
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
  2020-10-09  7:59   ` Daniel Vetter
  (?)
@ 2020-10-10 20:26     ` Oded Gabbay
  -1 siblings, 0 replies; 214+ messages in thread
From: Oded Gabbay @ 2020-10-10 20:26 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: DRI Development, LKML, kvm, linux-mm,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	linux-samsung-soc, Linux Media Mailing List, linux-s390,
	Daniel Vetter, Jason Gunthorpe, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams, Omer Shpigelman,
	Ofir Bitton, Tomer Tayar, Moti Haimovski, Greg Kroah-Hartman,
	Pawel Piskorski

On Fri, Oct 9, 2020 at 10:59 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> All we need are a pages array, pin_user_pages_fast can give us that
> directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
>
Thanks for the patch Daniel.

> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Oded Gabbay <oded.gabbay@gmail.com>
> Cc: Omer Shpigelman <oshpigelman@habana.ai>
> Cc: Ofir Bitton <obitton@habana.ai>
> Cc: Tomer Tayar <ttayar@habana.ai>
> Cc: Moti Haimovski <mhaimovski@habana.ai>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Pawel Piskorski <ppiskorski@habana.ai>
> --
> v2: Use unpin_user_pages_dirty_lock (John)
> ---
>  drivers/misc/habanalabs/Kconfig             |  1 -
>  drivers/misc/habanalabs/common/habanalabs.h |  3 +-
>  drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
>  3 files changed, 20 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
> index 8eb5d38c618e..2f04187f7167 100644
> --- a/drivers/misc/habanalabs/Kconfig
> +++ b/drivers/misc/habanalabs/Kconfig
> @@ -6,7 +6,6 @@
>  config HABANA_AI
>         tristate "HabanaAI accelerators (habanalabs)"
>         depends on PCI && HAS_IOMEM
> -       select FRAME_VECTOR
>         select DMA_SHARED_BUFFER
>         select GENERIC_ALLOCATOR
>         select HWMON
> diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
> index edbd627b29d2..c1b3ad613b15 100644
> --- a/drivers/misc/habanalabs/common/habanalabs.h
> +++ b/drivers/misc/habanalabs/common/habanalabs.h
> @@ -881,7 +881,8 @@ struct hl_ctx_mgr {
>  struct hl_userptr {
>         enum vm_type_t          vm_type; /* must be first */
>         struct list_head        job_node;
> -       struct frame_vector     *vec;
> +       struct page             **pages;
> +       unsigned int            npages;
Can you please update the kerneldoc comment section of this structure
according to your changes ?

>         struct sg_table         *sgt;
>         enum dma_data_direction dir;
>         struct list_head        debugfs_list;
> diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> index 5ff4688683fd..327b64479f97 100644
> --- a/drivers/misc/habanalabs/common/memory.c
> +++ b/drivers/misc/habanalabs/common/memory.c
> @@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
>                 return -EFAULT;
>         }
>
> -       userptr->vec = frame_vector_create(npages);
> -       if (!userptr->vec) {
> +       userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
> +                                       GFP_KERNEL);
> +       if (!userptr->pages) {
>                 dev_err(hdev->dev, "Failed to create frame vector\n");
>                 return -ENOMEM;
>         }
>
> -       rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> -                               userptr->vec);
> +       rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> +                                userptr->pages);
>
>         if (rc != npages) {
>                 dev_err(hdev->dev,
>                         "Failed to map host memory, user ptr probably wrong\n");
>                 if (rc < 0)
> -                       goto destroy_framevec;
> +                       goto destroy_pages;
> +               npages = rc;
>                 rc = -EFAULT;
> -               goto put_framevec;
> -       }
> -
> -       if (frame_vector_to_pages(userptr->vec) < 0) {
> -               dev_err(hdev->dev,
> -                       "Failed to translate frame vector to pages\n");
> -               rc = -EFAULT;
> -               goto put_framevec;
> +               goto put_pages;
>         }
> +       userptr->npages = npages;
>
>         rc = sg_alloc_table_from_pages(userptr->sgt,
> -                                       frame_vector_pages(userptr->vec),
> -                                       npages, offset, size, GFP_ATOMIC);
> +                                      userptr->pages,
> +                                      npages, offset, size, GFP_ATOMIC);
I think that because the call to kvmalloc_array() is done with
GFP_KERNEL, there is no point in using GFP_ATOMIC here.
And actually, this path only needs to avoid yielding when using a
special debug mode.
So I suggest putting here GFP_KERNEL.

In the meanwhile, I'll run this patch (coupled with the next patch) in
our C/I to make sure there are no regressions.
Thanks,
Oded

>         if (rc < 0) {
>                 dev_err(hdev->dev, "failed to create SG table from pages\n");
> -               goto put_framevec;
> +               goto put_pages;
>         }
>
>         return 0;
>
> -put_framevec:
> -       put_vaddr_frames(userptr->vec);
> -destroy_framevec:
> -       frame_vector_destroy(userptr->vec);
> +put_pages:
> +       unpin_user_pages(userptr->pages, npages);
> +destroy_pages:
> +       kvfree(userptr->pages);
>         return rc;
>  }
>
> @@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
>   */
>  void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
>  {
> -       struct page **pages;
> -
>         hl_debugfs_remove_userptr(hdev, userptr);
>
>         if (userptr->dma_mapped)
> @@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
>                                                         userptr->sgt->nents,
>                                                         userptr->dir);
>
> -       pages = frame_vector_pages(userptr->vec);
> -       if (!IS_ERR(pages)) {
> -               int i;
> -
> -               for (i = 0; i < frame_vector_count(userptr->vec); i++)
> -                       set_page_dirty_lock(pages[i]);
> -       }
> -       put_vaddr_frames(userptr->vec);
> -       frame_vector_destroy(userptr->vec);
> +       unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
> +       kvfree(userptr->pages);
>
>         list_del(&userptr->job_node);
>
> --
> 2.28.0
>

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
@ 2020-10-10 20:26     ` Oded Gabbay
  0 siblings, 0 replies; 214+ messages in thread
From: Oded Gabbay @ 2020-10-10 20:26 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, kvm, Jason Gunthorpe,
	Pawel Piskorski, John Hubbard, LKML, DRI Development,
	Ofir Bitton, linux-mm, Jérôme Glisse, Tomer Tayar,
	Omer Shpigelman, Greg Kroah-Hartman, Daniel Vetter,
	Andrew Morton, Moti Haimovski, Dan Williams,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Linux Media Mailing List

On Fri, Oct 9, 2020 at 10:59 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> All we need are a pages array, pin_user_pages_fast can give us that
> directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
>
Thanks for the patch Daniel.

> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Oded Gabbay <oded.gabbay@gmail.com>
> Cc: Omer Shpigelman <oshpigelman@habana.ai>
> Cc: Ofir Bitton <obitton@habana.ai>
> Cc: Tomer Tayar <ttayar@habana.ai>
> Cc: Moti Haimovski <mhaimovski@habana.ai>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Pawel Piskorski <ppiskorski@habana.ai>
> --
> v2: Use unpin_user_pages_dirty_lock (John)
> ---
>  drivers/misc/habanalabs/Kconfig             |  1 -
>  drivers/misc/habanalabs/common/habanalabs.h |  3 +-
>  drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
>  3 files changed, 20 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
> index 8eb5d38c618e..2f04187f7167 100644
> --- a/drivers/misc/habanalabs/Kconfig
> +++ b/drivers/misc/habanalabs/Kconfig
> @@ -6,7 +6,6 @@
>  config HABANA_AI
>         tristate "HabanaAI accelerators (habanalabs)"
>         depends on PCI && HAS_IOMEM
> -       select FRAME_VECTOR
>         select DMA_SHARED_BUFFER
>         select GENERIC_ALLOCATOR
>         select HWMON
> diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
> index edbd627b29d2..c1b3ad613b15 100644
> --- a/drivers/misc/habanalabs/common/habanalabs.h
> +++ b/drivers/misc/habanalabs/common/habanalabs.h
> @@ -881,7 +881,8 @@ struct hl_ctx_mgr {
>  struct hl_userptr {
>         enum vm_type_t          vm_type; /* must be first */
>         struct list_head        job_node;
> -       struct frame_vector     *vec;
> +       struct page             **pages;
> +       unsigned int            npages;
Can you please update the kerneldoc comment section of this structure
according to your changes ?

>         struct sg_table         *sgt;
>         enum dma_data_direction dir;
>         struct list_head        debugfs_list;
> diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> index 5ff4688683fd..327b64479f97 100644
> --- a/drivers/misc/habanalabs/common/memory.c
> +++ b/drivers/misc/habanalabs/common/memory.c
> @@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
>                 return -EFAULT;
>         }
>
> -       userptr->vec = frame_vector_create(npages);
> -       if (!userptr->vec) {
> +       userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
> +                                       GFP_KERNEL);
> +       if (!userptr->pages) {
>                 dev_err(hdev->dev, "Failed to create frame vector\n");
>                 return -ENOMEM;
>         }
>
> -       rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> -                               userptr->vec);
> +       rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> +                                userptr->pages);
>
>         if (rc != npages) {
>                 dev_err(hdev->dev,
>                         "Failed to map host memory, user ptr probably wrong\n");
>                 if (rc < 0)
> -                       goto destroy_framevec;
> +                       goto destroy_pages;
> +               npages = rc;
>                 rc = -EFAULT;
> -               goto put_framevec;
> -       }
> -
> -       if (frame_vector_to_pages(userptr->vec) < 0) {
> -               dev_err(hdev->dev,
> -                       "Failed to translate frame vector to pages\n");
> -               rc = -EFAULT;
> -               goto put_framevec;
> +               goto put_pages;
>         }
> +       userptr->npages = npages;
>
>         rc = sg_alloc_table_from_pages(userptr->sgt,
> -                                       frame_vector_pages(userptr->vec),
> -                                       npages, offset, size, GFP_ATOMIC);
> +                                      userptr->pages,
> +                                      npages, offset, size, GFP_ATOMIC);
I think that because the call to kvmalloc_array() is done with
GFP_KERNEL, there is no point in using GFP_ATOMIC here.
And actually, this path only needs to avoid yielding when using a
special debug mode.
So I suggest putting here GFP_KERNEL.

In the meanwhile, I'll run this patch (coupled with the next patch) in
our C/I to make sure there are no regressions.
Thanks,
Oded

>         if (rc < 0) {
>                 dev_err(hdev->dev, "failed to create SG table from pages\n");
> -               goto put_framevec;
> +               goto put_pages;
>         }
>
>         return 0;
>
> -put_framevec:
> -       put_vaddr_frames(userptr->vec);
> -destroy_framevec:
> -       frame_vector_destroy(userptr->vec);
> +put_pages:
> +       unpin_user_pages(userptr->pages, npages);
> +destroy_pages:
> +       kvfree(userptr->pages);
>         return rc;
>  }
>
> @@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
>   */
>  void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
>  {
> -       struct page **pages;
> -
>         hl_debugfs_remove_userptr(hdev, userptr);
>
>         if (userptr->dma_mapped)
> @@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
>                                                         userptr->sgt->nents,
>                                                         userptr->dir);
>
> -       pages = frame_vector_pages(userptr->vec);
> -       if (!IS_ERR(pages)) {
> -               int i;
> -
> -               for (i = 0; i < frame_vector_count(userptr->vec); i++)
> -                       set_page_dirty_lock(pages[i]);
> -       }
> -       put_vaddr_frames(userptr->vec);
> -       frame_vector_destroy(userptr->vec);
> +       unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
> +       kvfree(userptr->pages);
>
>         list_del(&userptr->job_node);
>
> --
> 2.28.0
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
@ 2020-10-10 20:26     ` Oded Gabbay
  0 siblings, 0 replies; 214+ messages in thread
From: Oded Gabbay @ 2020-10-10 20:26 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, kvm, Jason Gunthorpe,
	Pawel Piskorski, John Hubbard, LKML, DRI Development,
	Ofir Bitton, linux-mm, Jérôme Glisse, Tomer Tayar,
	Omer Shpigelman, Greg Kroah-Hartman, Daniel Vetter,
	Andrew Morton, Moti Haimovski, Dan Williams,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Linux Media Mailing List

On Fri, Oct 9, 2020 at 10:59 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> All we need are a pages array, pin_user_pages_fast can give us that
> directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
>
Thanks for the patch Daniel.

> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: linux-mm@kvack.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-samsung-soc@vger.kernel.org
> Cc: linux-media@vger.kernel.org
> Cc: Oded Gabbay <oded.gabbay@gmail.com>
> Cc: Omer Shpigelman <oshpigelman@habana.ai>
> Cc: Ofir Bitton <obitton@habana.ai>
> Cc: Tomer Tayar <ttayar@habana.ai>
> Cc: Moti Haimovski <mhaimovski@habana.ai>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Pawel Piskorski <ppiskorski@habana.ai>
> --
> v2: Use unpin_user_pages_dirty_lock (John)
> ---
>  drivers/misc/habanalabs/Kconfig             |  1 -
>  drivers/misc/habanalabs/common/habanalabs.h |  3 +-
>  drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
>  3 files changed, 20 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
> index 8eb5d38c618e..2f04187f7167 100644
> --- a/drivers/misc/habanalabs/Kconfig
> +++ b/drivers/misc/habanalabs/Kconfig
> @@ -6,7 +6,6 @@
>  config HABANA_AI
>         tristate "HabanaAI accelerators (habanalabs)"
>         depends on PCI && HAS_IOMEM
> -       select FRAME_VECTOR
>         select DMA_SHARED_BUFFER
>         select GENERIC_ALLOCATOR
>         select HWMON
> diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
> index edbd627b29d2..c1b3ad613b15 100644
> --- a/drivers/misc/habanalabs/common/habanalabs.h
> +++ b/drivers/misc/habanalabs/common/habanalabs.h
> @@ -881,7 +881,8 @@ struct hl_ctx_mgr {
>  struct hl_userptr {
>         enum vm_type_t          vm_type; /* must be first */
>         struct list_head        job_node;
> -       struct frame_vector     *vec;
> +       struct page             **pages;
> +       unsigned int            npages;
Can you please update the kerneldoc comment section of this structure
according to your changes ?

>         struct sg_table         *sgt;
>         enum dma_data_direction dir;
>         struct list_head        debugfs_list;
> diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> index 5ff4688683fd..327b64479f97 100644
> --- a/drivers/misc/habanalabs/common/memory.c
> +++ b/drivers/misc/habanalabs/common/memory.c
> @@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
>                 return -EFAULT;
>         }
>
> -       userptr->vec = frame_vector_create(npages);
> -       if (!userptr->vec) {
> +       userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
> +                                       GFP_KERNEL);
> +       if (!userptr->pages) {
>                 dev_err(hdev->dev, "Failed to create frame vector\n");
>                 return -ENOMEM;
>         }
>
> -       rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> -                               userptr->vec);
> +       rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> +                                userptr->pages);
>
>         if (rc != npages) {
>                 dev_err(hdev->dev,
>                         "Failed to map host memory, user ptr probably wrong\n");
>                 if (rc < 0)
> -                       goto destroy_framevec;
> +                       goto destroy_pages;
> +               npages = rc;
>                 rc = -EFAULT;
> -               goto put_framevec;
> -       }
> -
> -       if (frame_vector_to_pages(userptr->vec) < 0) {
> -               dev_err(hdev->dev,
> -                       "Failed to translate frame vector to pages\n");
> -               rc = -EFAULT;
> -               goto put_framevec;
> +               goto put_pages;
>         }
> +       userptr->npages = npages;
>
>         rc = sg_alloc_table_from_pages(userptr->sgt,
> -                                       frame_vector_pages(userptr->vec),
> -                                       npages, offset, size, GFP_ATOMIC);
> +                                      userptr->pages,
> +                                      npages, offset, size, GFP_ATOMIC);
I think that because the call to kvmalloc_array() is done with
GFP_KERNEL, there is no point in using GFP_ATOMIC here.
And actually, this path only needs to avoid yielding when using a
special debug mode.
So I suggest putting here GFP_KERNEL.

In the meanwhile, I'll run this patch (coupled with the next patch) in
our C/I to make sure there are no regressions.
Thanks,
Oded

>         if (rc < 0) {
>                 dev_err(hdev->dev, "failed to create SG table from pages\n");
> -               goto put_framevec;
> +               goto put_pages;
>         }
>
>         return 0;
>
> -put_framevec:
> -       put_vaddr_frames(userptr->vec);
> -destroy_framevec:
> -       frame_vector_destroy(userptr->vec);
> +put_pages:
> +       unpin_user_pages(userptr->pages, npages);
> +destroy_pages:
> +       kvfree(userptr->pages);
>         return rc;
>  }
>
> @@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
>   */
>  void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
>  {
> -       struct page **pages;
> -
>         hl_debugfs_remove_userptr(hdev, userptr);
>
>         if (userptr->dma_mapped)
> @@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
>                                                         userptr->sgt->nents,
>                                                         userptr->dir);
>
> -       pages = frame_vector_pages(userptr->vec);
> -       if (!IS_ERR(pages)) {
> -               int i;
> -
> -               for (i = 0; i < frame_vector_count(userptr->vec); i++)
> -                       set_page_dirty_lock(pages[i]);
> -       }
> -       put_vaddr_frames(userptr->vec);
> -       frame_vector_destroy(userptr->vec);
> +       unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
> +       kvfree(userptr->pages);
>
>         list_del(&userptr->job_node);
>
> --
> 2.28.0
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-09 17:52             ` Daniel Vetter
  (?)
@ 2020-10-10 21:11               ` Laurent Pinchart
  -1 siblings, 0 replies; 214+ messages in thread
From: Laurent Pinchart @ 2020-10-10 21:11 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jason Gunthorpe, Mauro Carvalho Chehab, DRI Development, LKML,
	KVM list, Linux MM, Linux ARM, linux-samsung-soc,
	open list:DMA BUFFER SHARING FRAMEWORK, linux-s390,
	Daniel Vetter, Kees Cook, Dan Williams, Andrew Morton,
	John Hubbard, Jérôme Glisse, Jan Kara, Linus Torvalds

Hi Daniel,

On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote:
> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.
> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.
> >
> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.
> 
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.

On the V4L2 side, I think we should disable USERPTR for any new driver,
period. That's what I've been recommended when reviewing patches for
several years already. It's a deprecated API, it should be phased out,
which starts by not allowing any new use case.

> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l
> - roll this out everywhere we still need it.
> 
> Realistically this just isn't going to happen. And anything else just
> reimplements half of dma-buf, which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.
> 
> > > Please let address the issue on this way, instead of broken an
> > > userspace API that it is there since 1991.
> >
> > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > breakage caused by a security fix for something that was a 15 years
> > old bug.
> 
> Yeah we have a bunch of these on the drm side too. Some of them are
> really just "you have to upgrade userspace", and there's no real fix
> for the security nightmare without that.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10 21:11               ` Laurent Pinchart
  0 siblings, 0 replies; 214+ messages in thread
From: Laurent Pinchart @ 2020-10-10 21:11 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Linux MM, Mauro Carvalho Chehab, John Hubbard, LKML,
	DRI Development, Jason Gunthorpe, Jérôme Glisse,
	Daniel Vetter, Dan Williams, Linus Torvalds, Andrew Morton,
	Linux ARM, open list:DMA BUFFER SHARING FRAMEWORK

Hi Daniel,

On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote:
> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.
> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.
> >
> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.
> 
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.

On the V4L2 side, I think we should disable USERPTR for any new driver,
period. That's what I've been recommended when reviewing patches for
several years already. It's a deprecated API, it should be phased out,
which starts by not allowing any new use case.

> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l
> - roll this out everywhere we still need it.
> 
> Realistically this just isn't going to happen. And anything else just
> reimplements half of dma-buf, which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.
> 
> > > Please let address the issue on this way, instead of broken an
> > > userspace API that it is there since 1991.
> >
> > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > breakage caused by a security fix for something that was a 15 years
> > old bug.
> 
> Yeah we have a bunch of these on the drm side too. Some of them are
> really just "you have to upgrade userspace", and there's no real fix
> for the security nightmare without that.

-- 
Regards,

Laurent Pinchart

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10 21:11               ` Laurent Pinchart
  0 siblings, 0 replies; 214+ messages in thread
From: Laurent Pinchart @ 2020-10-10 21:11 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Linux MM, Mauro Carvalho Chehab, John Hubbard, LKML,
	DRI Development, Jason Gunthorpe, Jérôme Glisse,
	Daniel Vetter, Dan Williams, Linus Torvalds, Andrew Morton,
	Linux ARM, open list:DMA BUFFER SHARING FRAMEWORK

Hi Daniel,

On Fri, Oct 09, 2020 at 07:52:05PM +0200, Daniel Vetter wrote:
> On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> >
> > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> >
> > No, it is unconditionally unsafe. The CMA movable mappings are
> > specific VMAs that will have bad issues here, but there are other
> > types too.
> >
> > The only way to do something at a VMA level is to have a list of OK
> > VMAs, eg because they were creatd via a special mmap helper from the
> > media subsystem.
> >
> > > Well, no drivers inside the media subsystem uses such flag, although
> > > they may rely on some infrastructure that could be using it behind
> > > the bars.
> >
> > It doesn't matter, nothing prevents the user from calling media APIs
> > on mmaps it gets from other subsystems.
> 
> I think a good first step would be to disable userptr of non struct
> page backed storage going forward for any new hw support. Even on
> existing drivers. dma-buf sharing has been around for long enough now
> that this shouldn't be a problem. Unfortunately right now this doesn't
> seem to exist, so the entire problem keeps getting perpetuated.

On the V4L2 side, I think we should disable USERPTR for any new driver,
period. That's what I've been recommended when reviewing patches for
several years already. It's a deprecated API, it should be phased out,
which starts by not allowing any new use case.

> > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> >
> > We can't tell from the VMA these kinds of details..
> >
> > It has to go the other direction, evey mmap that might be used as a
> > userptr here has to be found and the VMA specially created to allow
> > its use. At least that is a kernel only change, but will need people
> > with the HW to do this work.
> 
> I think the only reasonable way to keep this working is:
> - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> - add dma-buf export support to fbdev and v4l
> - roll this out everywhere we still need it.
> 
> Realistically this just isn't going to happen. And anything else just
> reimplements half of dma-buf, which is kinda pointless (you need
> minimally refcounting and some way to get at a promise of a permanent
> sg list for dma. Plus probably the vmap for kernel cpu access.
> 
> > > Please let address the issue on this way, instead of broken an
> > > userspace API that it is there since 1991.
> >
> > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > breakage caused by a security fix for something that was a 15 years
> > old bug.
> 
> Yeah we have a bunch of these on the drm side too. Some of them are
> really just "you have to upgrade userspace", and there's no real fix
> for the security nightmare without that.

-- 
Regards,

Laurent Pinchart
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
  2020-10-10 20:26     ` Oded Gabbay
  (?)
@ 2020-10-10 21:32       ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-10 21:32 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: DRI Development, LKML, KVM list, linux-mm,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	linux-samsung-soc, Linux Media Mailing List, linux-s390,
	Daniel Vetter, Jason Gunthorpe, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams, Omer Shpigelman,
	Ofir Bitton, Tomer Tayar, Moti Haimovski, Greg Kroah-Hartman,
	Pawel Piskorski

On Sat, Oct 10, 2020 at 10:27 PM Oded Gabbay <oded.gabbay@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 10:59 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> > All we need are a pages array, pin_user_pages_fast can give us that
> > directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
> >
> Thanks for the patch Daniel.
>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Oded Gabbay <oded.gabbay@gmail.com>
> > Cc: Omer Shpigelman <oshpigelman@habana.ai>
> > Cc: Ofir Bitton <obitton@habana.ai>
> > Cc: Tomer Tayar <ttayar@habana.ai>
> > Cc: Moti Haimovski <mhaimovski@habana.ai>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Pawel Piskorski <ppiskorski@habana.ai>
> > --
> > v2: Use unpin_user_pages_dirty_lock (John)
> > ---
> >  drivers/misc/habanalabs/Kconfig             |  1 -
> >  drivers/misc/habanalabs/common/habanalabs.h |  3 +-
> >  drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
> >  3 files changed, 20 insertions(+), 33 deletions(-)
> >
> > diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
> > index 8eb5d38c618e..2f04187f7167 100644
> > --- a/drivers/misc/habanalabs/Kconfig
> > +++ b/drivers/misc/habanalabs/Kconfig
> > @@ -6,7 +6,6 @@
> >  config HABANA_AI
> >         tristate "HabanaAI accelerators (habanalabs)"
> >         depends on PCI && HAS_IOMEM
> > -       select FRAME_VECTOR
> >         select DMA_SHARED_BUFFER
> >         select GENERIC_ALLOCATOR
> >         select HWMON
> > diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
> > index edbd627b29d2..c1b3ad613b15 100644
> > --- a/drivers/misc/habanalabs/common/habanalabs.h
> > +++ b/drivers/misc/habanalabs/common/habanalabs.h
> > @@ -881,7 +881,8 @@ struct hl_ctx_mgr {
> >  struct hl_userptr {
> >         enum vm_type_t          vm_type; /* must be first */
> >         struct list_head        job_node;
> > -       struct frame_vector     *vec;
> > +       struct page             **pages;
> > +       unsigned int            npages;
> Can you please update the kerneldoc comment section of this structure
> according to your changes ?

Apologies I missed the nice kerneldoc. I'll fix that in the next round.


> >         struct sg_table         *sgt;
> >         enum dma_data_direction dir;
> >         struct list_head        debugfs_list;
> > diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> > index 5ff4688683fd..327b64479f97 100644
> > --- a/drivers/misc/habanalabs/common/memory.c
> > +++ b/drivers/misc/habanalabs/common/memory.c
> > @@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
> >                 return -EFAULT;
> >         }
> >
> > -       userptr->vec = frame_vector_create(npages);
> > -       if (!userptr->vec) {
> > +       userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
> > +                                       GFP_KERNEL);
> > +       if (!userptr->pages) {
> >                 dev_err(hdev->dev, "Failed to create frame vector\n");
> >                 return -ENOMEM;
> >         }
> >
> > -       rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> > -                               userptr->vec);
> > +       rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> > +                                userptr->pages);
> >
> >         if (rc != npages) {
> >                 dev_err(hdev->dev,
> >                         "Failed to map host memory, user ptr probably wrong\n");
> >                 if (rc < 0)
> > -                       goto destroy_framevec;
> > +                       goto destroy_pages;
> > +               npages = rc;
> >                 rc = -EFAULT;
> > -               goto put_framevec;
> > -       }
> > -
> > -       if (frame_vector_to_pages(userptr->vec) < 0) {
> > -               dev_err(hdev->dev,
> > -                       "Failed to translate frame vector to pages\n");
> > -               rc = -EFAULT;
> > -               goto put_framevec;
> > +               goto put_pages;
> >         }
> > +       userptr->npages = npages;
> >
> >         rc = sg_alloc_table_from_pages(userptr->sgt,
> > -                                       frame_vector_pages(userptr->vec),
> > -                                       npages, offset, size, GFP_ATOMIC);
> > +                                      userptr->pages,
> > +                                      npages, offset, size, GFP_ATOMIC);
> I think that because the call to kvmalloc_array() is done with
> GFP_KERNEL, there is no point in using GFP_ATOMIC here.
> And actually, this path only needs to avoid yielding when using a
> special debug mode.
> So I suggest putting here GFP_KERNEL.

Huh, I didn't even notice the GFP_ATOMIC here. This looks indeed
strange and GFP_KERNEL should be perfectly fine in a function that
also calls pin_user_pages (since that one can allocate and do worse
stuff like userspace pagefaults).

But since that GFP_ATOMIC is there already I'll do that in a separate patch.

> In the meanwhile, I'll run this patch (coupled with the next patch) in
> our C/I to make sure there are no regressions.

Excellent. I'll wait with v3 until that's done, just in case you hit a
snag I need to fix.

Cheers, Daniel

> Thanks,
> Oded
>
> >         if (rc < 0) {
> >                 dev_err(hdev->dev, "failed to create SG table from pages\n");
> > -               goto put_framevec;
> > +               goto put_pages;
> >         }
> >
> >         return 0;
> >
> > -put_framevec:
> > -       put_vaddr_frames(userptr->vec);
> > -destroy_framevec:
> > -       frame_vector_destroy(userptr->vec);
> > +put_pages:
> > +       unpin_user_pages(userptr->pages, npages);
> > +destroy_pages:
> > +       kvfree(userptr->pages);
> >         return rc;
> >  }
> >
> > @@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
> >   */
> >  void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> >  {
> > -       struct page **pages;
> > -
> >         hl_debugfs_remove_userptr(hdev, userptr);
> >
> >         if (userptr->dma_mapped)
> > @@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> >                                                         userptr->sgt->nents,
> >                                                         userptr->dir);
> >
> > -       pages = frame_vector_pages(userptr->vec);
> > -       if (!IS_ERR(pages)) {
> > -               int i;
> > -
> > -               for (i = 0; i < frame_vector_count(userptr->vec); i++)
> > -                       set_page_dirty_lock(pages[i]);
> > -       }
> > -       put_vaddr_frames(userptr->vec);
> > -       frame_vector_destroy(userptr->vec);
> > +       unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
> > +       kvfree(userptr->pages);
> >
> >         list_del(&userptr->job_node);
> >
> > --
> > 2.28.0
> >



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
@ 2020-10-10 21:32       ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-10 21:32 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: linux-s390, linux-samsung-soc, Jan Kara, KVM list,
	Jason Gunthorpe, Pawel Piskorski, John Hubbard, LKML,
	DRI Development, Ofir Bitton, linux-mm, Jérôme Glisse,
	Tomer Tayar, Omer Shpigelman, Greg Kroah-Hartman, Daniel Vetter,
	Andrew Morton, Moti Haimovski, Dan Williams,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Linux Media Mailing List

On Sat, Oct 10, 2020 at 10:27 PM Oded Gabbay <oded.gabbay@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 10:59 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> > All we need are a pages array, pin_user_pages_fast can give us that
> > directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
> >
> Thanks for the patch Daniel.
>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Oded Gabbay <oded.gabbay@gmail.com>
> > Cc: Omer Shpigelman <oshpigelman@habana.ai>
> > Cc: Ofir Bitton <obitton@habana.ai>
> > Cc: Tomer Tayar <ttayar@habana.ai>
> > Cc: Moti Haimovski <mhaimovski@habana.ai>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Pawel Piskorski <ppiskorski@habana.ai>
> > --
> > v2: Use unpin_user_pages_dirty_lock (John)
> > ---
> >  drivers/misc/habanalabs/Kconfig             |  1 -
> >  drivers/misc/habanalabs/common/habanalabs.h |  3 +-
> >  drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
> >  3 files changed, 20 insertions(+), 33 deletions(-)
> >
> > diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
> > index 8eb5d38c618e..2f04187f7167 100644
> > --- a/drivers/misc/habanalabs/Kconfig
> > +++ b/drivers/misc/habanalabs/Kconfig
> > @@ -6,7 +6,6 @@
> >  config HABANA_AI
> >         tristate "HabanaAI accelerators (habanalabs)"
> >         depends on PCI && HAS_IOMEM
> > -       select FRAME_VECTOR
> >         select DMA_SHARED_BUFFER
> >         select GENERIC_ALLOCATOR
> >         select HWMON
> > diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
> > index edbd627b29d2..c1b3ad613b15 100644
> > --- a/drivers/misc/habanalabs/common/habanalabs.h
> > +++ b/drivers/misc/habanalabs/common/habanalabs.h
> > @@ -881,7 +881,8 @@ struct hl_ctx_mgr {
> >  struct hl_userptr {
> >         enum vm_type_t          vm_type; /* must be first */
> >         struct list_head        job_node;
> > -       struct frame_vector     *vec;
> > +       struct page             **pages;
> > +       unsigned int            npages;
> Can you please update the kerneldoc comment section of this structure
> according to your changes ?

Apologies I missed the nice kerneldoc. I'll fix that in the next round.


> >         struct sg_table         *sgt;
> >         enum dma_data_direction dir;
> >         struct list_head        debugfs_list;
> > diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> > index 5ff4688683fd..327b64479f97 100644
> > --- a/drivers/misc/habanalabs/common/memory.c
> > +++ b/drivers/misc/habanalabs/common/memory.c
> > @@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
> >                 return -EFAULT;
> >         }
> >
> > -       userptr->vec = frame_vector_create(npages);
> > -       if (!userptr->vec) {
> > +       userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
> > +                                       GFP_KERNEL);
> > +       if (!userptr->pages) {
> >                 dev_err(hdev->dev, "Failed to create frame vector\n");
> >                 return -ENOMEM;
> >         }
> >
> > -       rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> > -                               userptr->vec);
> > +       rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> > +                                userptr->pages);
> >
> >         if (rc != npages) {
> >                 dev_err(hdev->dev,
> >                         "Failed to map host memory, user ptr probably wrong\n");
> >                 if (rc < 0)
> > -                       goto destroy_framevec;
> > +                       goto destroy_pages;
> > +               npages = rc;
> >                 rc = -EFAULT;
> > -               goto put_framevec;
> > -       }
> > -
> > -       if (frame_vector_to_pages(userptr->vec) < 0) {
> > -               dev_err(hdev->dev,
> > -                       "Failed to translate frame vector to pages\n");
> > -               rc = -EFAULT;
> > -               goto put_framevec;
> > +               goto put_pages;
> >         }
> > +       userptr->npages = npages;
> >
> >         rc = sg_alloc_table_from_pages(userptr->sgt,
> > -                                       frame_vector_pages(userptr->vec),
> > -                                       npages, offset, size, GFP_ATOMIC);
> > +                                      userptr->pages,
> > +                                      npages, offset, size, GFP_ATOMIC);
> I think that because the call to kvmalloc_array() is done with
> GFP_KERNEL, there is no point in using GFP_ATOMIC here.
> And actually, this path only needs to avoid yielding when using a
> special debug mode.
> So I suggest putting here GFP_KERNEL.

Huh, I didn't even notice the GFP_ATOMIC here. This looks indeed
strange and GFP_KERNEL should be perfectly fine in a function that
also calls pin_user_pages (since that one can allocate and do worse
stuff like userspace pagefaults).

But since that GFP_ATOMIC is there already I'll do that in a separate patch.

> In the meanwhile, I'll run this patch (coupled with the next patch) in
> our C/I to make sure there are no regressions.

Excellent. I'll wait with v3 until that's done, just in case you hit a
snag I need to fix.

Cheers, Daniel

> Thanks,
> Oded
>
> >         if (rc < 0) {
> >                 dev_err(hdev->dev, "failed to create SG table from pages\n");
> > -               goto put_framevec;
> > +               goto put_pages;
> >         }
> >
> >         return 0;
> >
> > -put_framevec:
> > -       put_vaddr_frames(userptr->vec);
> > -destroy_framevec:
> > -       frame_vector_destroy(userptr->vec);
> > +put_pages:
> > +       unpin_user_pages(userptr->pages, npages);
> > +destroy_pages:
> > +       kvfree(userptr->pages);
> >         return rc;
> >  }
> >
> > @@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
> >   */
> >  void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> >  {
> > -       struct page **pages;
> > -
> >         hl_debugfs_remove_userptr(hdev, userptr);
> >
> >         if (userptr->dma_mapped)
> > @@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> >                                                         userptr->sgt->nents,
> >                                                         userptr->dir);
> >
> > -       pages = frame_vector_pages(userptr->vec);
> > -       if (!IS_ERR(pages)) {
> > -               int i;
> > -
> > -               for (i = 0; i < frame_vector_count(userptr->vec); i++)
> > -                       set_page_dirty_lock(pages[i]);
> > -       }
> > -       put_vaddr_frames(userptr->vec);
> > -       frame_vector_destroy(userptr->vec);
> > +       unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
> > +       kvfree(userptr->pages);
> >
> >         list_del(&userptr->job_node);
> >
> > --
> > 2.28.0
> >



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
@ 2020-10-10 21:32       ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-10 21:32 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: linux-s390, linux-samsung-soc, Jan Kara, KVM list,
	Jason Gunthorpe, Pawel Piskorski, John Hubbard, LKML,
	DRI Development, Ofir Bitton, linux-mm, Jérôme Glisse,
	Tomer Tayar, Omer Shpigelman, Greg Kroah-Hartman, Daniel Vetter,
	Andrew Morton, Moti Haimovski, Dan Williams,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Linux Media Mailing List

On Sat, Oct 10, 2020 at 10:27 PM Oded Gabbay <oded.gabbay@gmail.com> wrote:
>
> On Fri, Oct 9, 2020 at 10:59 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> > All we need are a pages array, pin_user_pages_fast can give us that
> > directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
> >
> Thanks for the patch Daniel.
>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: John Hubbard <jhubbard@nvidia.com>
> > Cc: Jérôme Glisse <jglisse@redhat.com>
> > Cc: Jan Kara <jack@suse.cz>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: linux-mm@kvack.org
> > Cc: linux-arm-kernel@lists.infradead.org
> > Cc: linux-samsung-soc@vger.kernel.org
> > Cc: linux-media@vger.kernel.org
> > Cc: Oded Gabbay <oded.gabbay@gmail.com>
> > Cc: Omer Shpigelman <oshpigelman@habana.ai>
> > Cc: Ofir Bitton <obitton@habana.ai>
> > Cc: Tomer Tayar <ttayar@habana.ai>
> > Cc: Moti Haimovski <mhaimovski@habana.ai>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Pawel Piskorski <ppiskorski@habana.ai>
> > --
> > v2: Use unpin_user_pages_dirty_lock (John)
> > ---
> >  drivers/misc/habanalabs/Kconfig             |  1 -
> >  drivers/misc/habanalabs/common/habanalabs.h |  3 +-
> >  drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
> >  3 files changed, 20 insertions(+), 33 deletions(-)
> >
> > diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
> > index 8eb5d38c618e..2f04187f7167 100644
> > --- a/drivers/misc/habanalabs/Kconfig
> > +++ b/drivers/misc/habanalabs/Kconfig
> > @@ -6,7 +6,6 @@
> >  config HABANA_AI
> >         tristate "HabanaAI accelerators (habanalabs)"
> >         depends on PCI && HAS_IOMEM
> > -       select FRAME_VECTOR
> >         select DMA_SHARED_BUFFER
> >         select GENERIC_ALLOCATOR
> >         select HWMON
> > diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
> > index edbd627b29d2..c1b3ad613b15 100644
> > --- a/drivers/misc/habanalabs/common/habanalabs.h
> > +++ b/drivers/misc/habanalabs/common/habanalabs.h
> > @@ -881,7 +881,8 @@ struct hl_ctx_mgr {
> >  struct hl_userptr {
> >         enum vm_type_t          vm_type; /* must be first */
> >         struct list_head        job_node;
> > -       struct frame_vector     *vec;
> > +       struct page             **pages;
> > +       unsigned int            npages;
> Can you please update the kerneldoc comment section of this structure
> according to your changes ?

Apologies I missed the nice kerneldoc. I'll fix that in the next round.


> >         struct sg_table         *sgt;
> >         enum dma_data_direction dir;
> >         struct list_head        debugfs_list;
> > diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> > index 5ff4688683fd..327b64479f97 100644
> > --- a/drivers/misc/habanalabs/common/memory.c
> > +++ b/drivers/misc/habanalabs/common/memory.c
> > @@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
> >                 return -EFAULT;
> >         }
> >
> > -       userptr->vec = frame_vector_create(npages);
> > -       if (!userptr->vec) {
> > +       userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
> > +                                       GFP_KERNEL);
> > +       if (!userptr->pages) {
> >                 dev_err(hdev->dev, "Failed to create frame vector\n");
> >                 return -ENOMEM;
> >         }
> >
> > -       rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> > -                               userptr->vec);
> > +       rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> > +                                userptr->pages);
> >
> >         if (rc != npages) {
> >                 dev_err(hdev->dev,
> >                         "Failed to map host memory, user ptr probably wrong\n");
> >                 if (rc < 0)
> > -                       goto destroy_framevec;
> > +                       goto destroy_pages;
> > +               npages = rc;
> >                 rc = -EFAULT;
> > -               goto put_framevec;
> > -       }
> > -
> > -       if (frame_vector_to_pages(userptr->vec) < 0) {
> > -               dev_err(hdev->dev,
> > -                       "Failed to translate frame vector to pages\n");
> > -               rc = -EFAULT;
> > -               goto put_framevec;
> > +               goto put_pages;
> >         }
> > +       userptr->npages = npages;
> >
> >         rc = sg_alloc_table_from_pages(userptr->sgt,
> > -                                       frame_vector_pages(userptr->vec),
> > -                                       npages, offset, size, GFP_ATOMIC);
> > +                                      userptr->pages,
> > +                                      npages, offset, size, GFP_ATOMIC);
> I think that because the call to kvmalloc_array() is done with
> GFP_KERNEL, there is no point in using GFP_ATOMIC here.
> And actually, this path only needs to avoid yielding when using a
> special debug mode.
> So I suggest putting here GFP_KERNEL.

Huh, I didn't even notice the GFP_ATOMIC here. This looks indeed
strange and GFP_KERNEL should be perfectly fine in a function that
also calls pin_user_pages (since that one can allocate and do worse
stuff like userspace pagefaults).

But since that GFP_ATOMIC is there already I'll do that in a separate patch.

> In the meanwhile, I'll run this patch (coupled with the next patch) in
> our C/I to make sure there are no regressions.

Excellent. I'll wait with v3 until that's done, just in case you hit a
snag I need to fix.

Cheers, Daniel

> Thanks,
> Oded
>
> >         if (rc < 0) {
> >                 dev_err(hdev->dev, "failed to create SG table from pages\n");
> > -               goto put_framevec;
> > +               goto put_pages;
> >         }
> >
> >         return 0;
> >
> > -put_framevec:
> > -       put_vaddr_frames(userptr->vec);
> > -destroy_framevec:
> > -       frame_vector_destroy(userptr->vec);
> > +put_pages:
> > +       unpin_user_pages(userptr->pages, npages);
> > +destroy_pages:
> > +       kvfree(userptr->pages);
> >         return rc;
> >  }
> >
> > @@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
> >   */
> >  void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> >  {
> > -       struct page **pages;
> > -
> >         hl_debugfs_remove_userptr(hdev, userptr);
> >
> >         if (userptr->dma_mapped)
> > @@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> >                                                         userptr->sgt->nents,
> >                                                         userptr->dir);
> >
> > -       pages = frame_vector_pages(userptr->vec);
> > -       if (!IS_ERR(pages)) {
> > -               int i;
> > -
> > -               for (i = 0; i < frame_vector_count(userptr->vec); i++)
> > -                       set_page_dirty_lock(pages[i]);
> > -       }
> > -       put_vaddr_frames(userptr->vec);
> > -       frame_vector_destroy(userptr->vec);
> > +       unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
> > +       kvfree(userptr->pages);
> >
> >         list_del(&userptr->job_node);
> >
> > --
> > 2.28.0
> >



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
  2020-10-10 17:22               ` Tomasz Figa
  (?)
@ 2020-10-10 21:35                 ` Laurent Pinchart
  -1 siblings, 0 replies; 214+ messages in thread
From: Laurent Pinchart @ 2020-10-10 21:35 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Daniel Vetter, linux-s390, linux-samsung-soc, Jan Kara,
	Kees Cook, KVM list, Linux MM, Mauro Carvalho Chehab,
	John Hubbard, LKML, DRI Development, Jason Gunthorpe,
	Jérôme Glisse, Daniel Vetter, Dan Williams,
	Linus Torvalds, Andrew Morton, Linux ARM,
	open list:DMA BUFFER SHARING FRAMEWORK

Hi Tomasz,

On Sat, Oct 10, 2020 at 07:22:48PM +0200, Tomasz Figa wrote:
> On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> > >
> > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> > >
> > > No, it is unconditionally unsafe. The CMA movable mappings are
> > > specific VMAs that will have bad issues here, but there are other
> > > types too.
> > >
> > > The only way to do something at a VMA level is to have a list of OK
> > > VMAs, eg because they were creatd via a special mmap helper from the
> > > media subsystem.
> > >
> > > > Well, no drivers inside the media subsystem uses such flag, although
> > > > they may rely on some infrastructure that could be using it behind
> > > > the bars.
> > >
> > > It doesn't matter, nothing prevents the user from calling media APIs
> > > on mmaps it gets from other subsystems.
> >
> > I think a good first step would be to disable userptr of non struct
> > page backed storage going forward for any new hw support. Even on
> > existing drivers. dma-buf sharing has been around for long enough now
> > that this shouldn't be a problem. Unfortunately right now this doesn't
> > seem to exist, so the entire problem keeps getting perpetuated.
> >
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> > - add dma-buf export support to fbdev and v4l
> 
> I assume you mean V4L2 and not the obsolete V4L that is emulated in
> the userspace by libv4l. If so, every video device that uses videobuf2
> gets DMA-buf export for free and there is nothing needed to enable it.
> 
> We probably still have a few legacy drivers using videobuf (non-2),
> but IMHO those should be safe to put behind some disabled-by-default
> Kconfig symbol or even completely drop, as the legacy framework has
> been deprecated for many years already.

There's 8 drivers left, and they support a very large number of devices.
I expect unhappy users distros stop shipping them. On the other hand,
videobuf has been deprecated for a loooooooong time, so there has been
plenty of time to convert the remaining drivers to videobuf2. If nobody
can do it, then we'll have to drop support for these devices given the
security issues.

We have moved media drivers to staging in the past when there wasn't
enough maintenance effort, we could do so here too.

> > - roll this out everywhere we still need it.
> >
> > Realistically this just isn't going to happen. And anything else just
> > reimplements half of dma-buf, which is kinda pointless (you need
> > minimally refcounting and some way to get at a promise of a permanent
> > sg list for dma. Plus probably the vmap for kernel cpu access.
> >
> > > > Please let address the issue on this way, instead of broken an
> > > > userspace API that it is there since 1991.
> > >
> > > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > > breakage caused by a security fix for something that was a 15 years
> > > old bug.
> >
> > Yeah we have a bunch of these on the drm side too. Some of them are
> > really just "you have to upgrade userspace", and there's no real fix
> > for the security nightmare without that.
> 
> I think we need to phase out such userspace indeed. The Kconfig symbol
> allows enabling the unsafe functionality for anyone who still needs
> it, so I think it's not entirely a breakage.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10 21:35                 ` Laurent Pinchart
  0 siblings, 0 replies; 214+ messages in thread
From: Laurent Pinchart @ 2020-10-10 21:35 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Jason Gunthorpe, Mauro Carvalho Chehab, Daniel Vetter, LKML,
	DRI Development, Linux MM, Jérôme Glisse, John Hubbard,
	Daniel Vetter, Dan Williams, Linus Torvalds, Andrew Morton,
	Linux ARM, open list:DMA BUFFER SHARING FRAMEWORK

Hi Tomasz,

On Sat, Oct 10, 2020 at 07:22:48PM +0200, Tomasz Figa wrote:
> On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> > >
> > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> > >
> > > No, it is unconditionally unsafe. The CMA movable mappings are
> > > specific VMAs that will have bad issues here, but there are other
> > > types too.
> > >
> > > The only way to do something at a VMA level is to have a list of OK
> > > VMAs, eg because they were creatd via a special mmap helper from the
> > > media subsystem.
> > >
> > > > Well, no drivers inside the media subsystem uses such flag, although
> > > > they may rely on some infrastructure that could be using it behind
> > > > the bars.
> > >
> > > It doesn't matter, nothing prevents the user from calling media APIs
> > > on mmaps it gets from other subsystems.
> >
> > I think a good first step would be to disable userptr of non struct
> > page backed storage going forward for any new hw support. Even on
> > existing drivers. dma-buf sharing has been around for long enough now
> > that this shouldn't be a problem. Unfortunately right now this doesn't
> > seem to exist, so the entire problem keeps getting perpetuated.
> >
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> > - add dma-buf export support to fbdev and v4l
> 
> I assume you mean V4L2 and not the obsolete V4L that is emulated in
> the userspace by libv4l. If so, every video device that uses videobuf2
> gets DMA-buf export for free and there is nothing needed to enable it.
> 
> We probably still have a few legacy drivers using videobuf (non-2),
> but IMHO those should be safe to put behind some disabled-by-default
> Kconfig symbol or even completely drop, as the legacy framework has
> been deprecated for many years already.

There's 8 drivers left, and they support a very large number of devices.
I expect unhappy users distros stop shipping them. On the other hand,
videobuf has been deprecated for a loooooooong time, so there has been
plenty of time to convert the remaining drivers to videobuf2. If nobody
can do it, then we'll have to drop support for these devices given the
security issues.

We have moved media drivers to staging in the past when there wasn't
enough maintenance effort, we could do so here too.

> > - roll this out everywhere we still need it.
> >
> > Realistically this just isn't going to happen. And anything else just
> > reimplements half of dma-buf, which is kinda pointless (you need
> > minimally refcounting and some way to get at a promise of a permanent
> > sg list for dma. Plus probably the vmap for kernel cpu access.
> >
> > > > Please let address the issue on this way, instead of broken an
> > > > userspace API that it is there since 1991.
> > >
> > > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > > breakage caused by a security fix for something that was a 15 years
> > > old bug.
> >
> > Yeah we have a bunch of these on the drm side too. Some of them are
> > really just "you have to upgrade userspace", and there's no real fix
> > for the security nightmare without that.
> 
> I think we need to phase out such userspace indeed. The Kconfig symbol
> allows enabling the unsafe functionality for anyone who still needs
> it, so I think it's not entirely a breakage.

-- 
Regards,

Laurent Pinchart

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 09/17] mm: Add unsafe_follow_pfn
@ 2020-10-10 21:35                 ` Laurent Pinchart
  0 siblings, 0 replies; 214+ messages in thread
From: Laurent Pinchart @ 2020-10-10 21:35 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: linux-s390, linux-samsung-soc, Jan Kara, Kees Cook, KVM list,
	Jason Gunthorpe, Mauro Carvalho Chehab, Daniel Vetter, LKML,
	DRI Development, Linux MM, Jérôme Glisse, John Hubbard,
	Daniel Vetter, Dan Williams, Linus Torvalds, Andrew Morton,
	Linux ARM, open list:DMA BUFFER SHARING FRAMEWORK

Hi Tomasz,

On Sat, Oct 10, 2020 at 07:22:48PM +0200, Tomasz Figa wrote:
> On Fri, Oct 9, 2020 at 7:52 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > On Fri, Oct 9, 2020 at 2:48 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > > On Fri, Oct 09, 2020 at 02:37:23PM +0200, Mauro Carvalho Chehab wrote:
> > >
> > > > I'm not a mm/ expert, but, from what I understood from Daniel's patch
> > > > description is that this is unsafe *only if*  __GFP_MOVABLE is used.
> > >
> > > No, it is unconditionally unsafe. The CMA movable mappings are
> > > specific VMAs that will have bad issues here, but there are other
> > > types too.
> > >
> > > The only way to do something at a VMA level is to have a list of OK
> > > VMAs, eg because they were creatd via a special mmap helper from the
> > > media subsystem.
> > >
> > > > Well, no drivers inside the media subsystem uses such flag, although
> > > > they may rely on some infrastructure that could be using it behind
> > > > the bars.
> > >
> > > It doesn't matter, nothing prevents the user from calling media APIs
> > > on mmaps it gets from other subsystems.
> >
> > I think a good first step would be to disable userptr of non struct
> > page backed storage going forward for any new hw support. Even on
> > existing drivers. dma-buf sharing has been around for long enough now
> > that this shouldn't be a problem. Unfortunately right now this doesn't
> > seem to exist, so the entire problem keeps getting perpetuated.
> >
> > > > If this is the case, the proper fix seems to have a GFP_NOT_MOVABLE
> > > > flag that it would be denying the core mm code to set __GFP_MOVABLE.
> > >
> > > We can't tell from the VMA these kinds of details..
> > >
> > > It has to go the other direction, evey mmap that might be used as a
> > > userptr here has to be found and the VMA specially created to allow
> > > its use. At least that is a kernel only change, but will need people
> > > with the HW to do this work.
> >
> > I think the only reasonable way to keep this working is:
> > - add a struct dma_buf *vma_tryget_dma_buf(struct vm_area_struct *vma);
> > - add dma-buf export support to fbdev and v4l
> 
> I assume you mean V4L2 and not the obsolete V4L that is emulated in
> the userspace by libv4l. If so, every video device that uses videobuf2
> gets DMA-buf export for free and there is nothing needed to enable it.
> 
> We probably still have a few legacy drivers using videobuf (non-2),
> but IMHO those should be safe to put behind some disabled-by-default
> Kconfig symbol or even completely drop, as the legacy framework has
> been deprecated for many years already.

There's 8 drivers left, and they support a very large number of devices.
I expect unhappy users distros stop shipping them. On the other hand,
videobuf has been deprecated for a loooooooong time, so there has been
plenty of time to convert the remaining drivers to videobuf2. If nobody
can do it, then we'll have to drop support for these devices given the
security issues.

We have moved media drivers to staging in the past when there wasn't
enough maintenance effort, we could do so here too.

> > - roll this out everywhere we still need it.
> >
> > Realistically this just isn't going to happen. And anything else just
> > reimplements half of dma-buf, which is kinda pointless (you need
> > minimally refcounting and some way to get at a promise of a permanent
> > sg list for dma. Plus probably the vmap for kernel cpu access.
> >
> > > > Please let address the issue on this way, instead of broken an
> > > > userspace API that it is there since 1991.
> > >
> > > It has happened before :( It took 4 years for RDMA to undo the uAPI
> > > breakage caused by a security fix for something that was a 15 years
> > > old bug.
> >
> > Yeah we have a bunch of these on the drm side too. Some of them are
> > really just "you have to upgrade userspace", and there's no real fix
> > for the security nightmare without that.
> 
> I think we need to phase out such userspace indeed. The Kconfig symbol
> allows enabling the unsafe functionality for anyone who still needs
> it, so I think it's not entirely a breakage.

-- 
Regards,

Laurent Pinchart
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
  2020-10-10 21:32       ` Daniel Vetter
  (?)
@ 2020-10-10 21:41         ` Daniel Vetter
  -1 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-10 21:41 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: DRI Development, LKML, KVM list, linux-mm,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	linux-samsung-soc, Linux Media Mailing List, linux-s390,
	Daniel Vetter, Jason Gunthorpe, Andrew Morton, John Hubbard,
	Jérôme Glisse, Jan Kara, Dan Williams, Omer Shpigelman,
	Ofir Bitton, Tomer Tayar, Moti Haimovski, Greg Kroah-Hartman,
	Pawel Piskorski

On Sat, Oct 10, 2020 at 11:32 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Sat, Oct 10, 2020 at 10:27 PM Oded Gabbay <oded.gabbay@gmail.com> wrote:
> >
> > On Fri, Oct 9, 2020 at 10:59 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > >
> > > All we need are a pages array, pin_user_pages_fast can give us that
> > > directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
> > >
> > Thanks for the patch Daniel.
> >
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-arm-kernel@lists.infradead.org
> > > Cc: linux-samsung-soc@vger.kernel.org
> > > Cc: linux-media@vger.kernel.org
> > > Cc: Oded Gabbay <oded.gabbay@gmail.com>
> > > Cc: Omer Shpigelman <oshpigelman@habana.ai>
> > > Cc: Ofir Bitton <obitton@habana.ai>
> > > Cc: Tomer Tayar <ttayar@habana.ai>
> > > Cc: Moti Haimovski <mhaimovski@habana.ai>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: Pawel Piskorski <ppiskorski@habana.ai>
> > > --
> > > v2: Use unpin_user_pages_dirty_lock (John)
> > > ---
> > >  drivers/misc/habanalabs/Kconfig             |  1 -
> > >  drivers/misc/habanalabs/common/habanalabs.h |  3 +-
> > >  drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
> > >  3 files changed, 20 insertions(+), 33 deletions(-)
> > >
> > > diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
> > > index 8eb5d38c618e..2f04187f7167 100644
> > > --- a/drivers/misc/habanalabs/Kconfig
> > > +++ b/drivers/misc/habanalabs/Kconfig
> > > @@ -6,7 +6,6 @@
> > >  config HABANA_AI
> > >         tristate "HabanaAI accelerators (habanalabs)"
> > >         depends on PCI && HAS_IOMEM
> > > -       select FRAME_VECTOR
> > >         select DMA_SHARED_BUFFER
> > >         select GENERIC_ALLOCATOR
> > >         select HWMON
> > > diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
> > > index edbd627b29d2..c1b3ad613b15 100644
> > > --- a/drivers/misc/habanalabs/common/habanalabs.h
> > > +++ b/drivers/misc/habanalabs/common/habanalabs.h
> > > @@ -881,7 +881,8 @@ struct hl_ctx_mgr {
> > >  struct hl_userptr {
> > >         enum vm_type_t          vm_type; /* must be first */
> > >         struct list_head        job_node;
> > > -       struct frame_vector     *vec;
> > > +       struct page             **pages;
> > > +       unsigned int            npages;
> > Can you please update the kerneldoc comment section of this structure
> > according to your changes ?
>
> Apologies I missed the nice kerneldoc. I'll fix that in the next round.
>
>
> > >         struct sg_table         *sgt;
> > >         enum dma_data_direction dir;
> > >         struct list_head        debugfs_list;
> > > diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> > > index 5ff4688683fd..327b64479f97 100644
> > > --- a/drivers/misc/habanalabs/common/memory.c
> > > +++ b/drivers/misc/habanalabs/common/memory.c
> > > @@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
> > >                 return -EFAULT;
> > >         }
> > >
> > > -       userptr->vec = frame_vector_create(npages);
> > > -       if (!userptr->vec) {
> > > +       userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
> > > +                                       GFP_KERNEL);
> > > +       if (!userptr->pages) {
> > >                 dev_err(hdev->dev, "Failed to create frame vector\n");
> > >                 return -ENOMEM;
> > >         }
> > >
> > > -       rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> > > -                               userptr->vec);
> > > +       rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> > > +                                userptr->pages);
> > >
> > >         if (rc != npages) {
> > >                 dev_err(hdev->dev,
> > >                         "Failed to map host memory, user ptr probably wrong\n");
> > >                 if (rc < 0)
> > > -                       goto destroy_framevec;
> > > +                       goto destroy_pages;
> > > +               npages = rc;
> > >                 rc = -EFAULT;
> > > -               goto put_framevec;
> > > -       }
> > > -
> > > -       if (frame_vector_to_pages(userptr->vec) < 0) {
> > > -               dev_err(hdev->dev,
> > > -                       "Failed to translate frame vector to pages\n");
> > > -               rc = -EFAULT;
> > > -               goto put_framevec;
> > > +               goto put_pages;
> > >         }
> > > +       userptr->npages = npages;
> > >
> > >         rc = sg_alloc_table_from_pages(userptr->sgt,
> > > -                                       frame_vector_pages(userptr->vec),
> > > -                                       npages, offset, size, GFP_ATOMIC);
> > > +                                      userptr->pages,
> > > +                                      npages, offset, size, GFP_ATOMIC);
> > I think that because the call to kvmalloc_array() is done with
> > GFP_KERNEL, there is no point in using GFP_ATOMIC here.
> > And actually, this path only needs to avoid yielding when using a
> > special debug mode.
> > So I suggest putting here GFP_KERNEL.
>
> Huh, I didn't even notice the GFP_ATOMIC here. This looks indeed
> strange and GFP_KERNEL should be perfectly fine in a function that
> also calls pin_user_pages (since that one can allocate and do worse
> stuff like userspace pagefaults).
>
> But since that GFP_ATOMIC is there already I'll do that in a separate patch.

Ok I read up on your usage of GFP_ATOMIC in habanalabs, and I'm not
going to touch this. But I'm pretty sure it's broken.

You seem to have some requirement of not allocating memory with
blocking (see hl_cb_alloc()), and that seems to be way you allocate
tons of structures with GFP_ATOMIC. There's 2 pretty tough problems
with that:
- GFP_ATOMIC can fail, even when the system hasn't run out of memory
yet. You _must_ have a fallback back to handle allocation failures for
these. Quick survey shows you a ton of GFP_ATOMIC callsites, and very
little fallback code - I've found none, but I didn't check the failure
handlers all going up the possible callchains.
- pin_user_pages can allocate memory, so you're breaking your own "no
sleeping in these paths" rules.

This isn't going to get fixed with a quick oneliner patch, depending
what's needed you're looking at a driver rearchitecture here :-/ Hence
I'm not going to touch this in the next patch, but leave it all as-is.

Cheers, Daniel

>
> > In the meanwhile, I'll run this patch (coupled with the next patch) in
> > our C/I to make sure there are no regressions.
>
> Excellent. I'll wait with v3 until that's done, just in case you hit a
> snag I need to fix.
>
> Cheers, Daniel
>
> > Thanks,
> > Oded
> >
> > >         if (rc < 0) {
> > >                 dev_err(hdev->dev, "failed to create SG table from pages\n");
> > > -               goto put_framevec;
> > > +               goto put_pages;
> > >         }
> > >
> > >         return 0;
> > >
> > > -put_framevec:
> > > -       put_vaddr_frames(userptr->vec);
> > > -destroy_framevec:
> > > -       frame_vector_destroy(userptr->vec);
> > > +put_pages:
> > > +       unpin_user_pages(userptr->pages, npages);
> > > +destroy_pages:
> > > +       kvfree(userptr->pages);
> > >         return rc;
> > >  }
> > >
> > > @@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
> > >   */
> > >  void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> > >  {
> > > -       struct page **pages;
> > > -
> > >         hl_debugfs_remove_userptr(hdev, userptr);
> > >
> > >         if (userptr->dma_mapped)
> > > @@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> > >                                                         userptr->sgt->nents,
> > >                                                         userptr->dir);
> > >
> > > -       pages = frame_vector_pages(userptr->vec);
> > > -       if (!IS_ERR(pages)) {
> > > -               int i;
> > > -
> > > -               for (i = 0; i < frame_vector_count(userptr->vec); i++)
> > > -                       set_page_dirty_lock(pages[i]);
> > > -       }
> > > -       put_vaddr_frames(userptr->vec);
> > > -       frame_vector_destroy(userptr->vec);
> > > +       unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
> > > +       kvfree(userptr->pages);
> > >
> > >         list_del(&userptr->job_node);
> > >
> > > --
> > > 2.28.0
> > >
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
@ 2020-10-10 21:41         ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-10 21:41 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: linux-s390, linux-samsung-soc, Jan Kara, KVM list,
	Jason Gunthorpe, Pawel Piskorski, John Hubbard, LKML,
	DRI Development, Ofir Bitton, linux-mm, Jérôme Glisse,
	Tomer Tayar, Omer Shpigelman, Greg Kroah-Hartman, Daniel Vetter,
	Andrew Morton, Moti Haimovski, Dan Williams,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Linux Media Mailing List

On Sat, Oct 10, 2020 at 11:32 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Sat, Oct 10, 2020 at 10:27 PM Oded Gabbay <oded.gabbay@gmail.com> wrote:
> >
> > On Fri, Oct 9, 2020 at 10:59 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > >
> > > All we need are a pages array, pin_user_pages_fast can give us that
> > > directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
> > >
> > Thanks for the patch Daniel.
> >
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-arm-kernel@lists.infradead.org
> > > Cc: linux-samsung-soc@vger.kernel.org
> > > Cc: linux-media@vger.kernel.org
> > > Cc: Oded Gabbay <oded.gabbay@gmail.com>
> > > Cc: Omer Shpigelman <oshpigelman@habana.ai>
> > > Cc: Ofir Bitton <obitton@habana.ai>
> > > Cc: Tomer Tayar <ttayar@habana.ai>
> > > Cc: Moti Haimovski <mhaimovski@habana.ai>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: Pawel Piskorski <ppiskorski@habana.ai>
> > > --
> > > v2: Use unpin_user_pages_dirty_lock (John)
> > > ---
> > >  drivers/misc/habanalabs/Kconfig             |  1 -
> > >  drivers/misc/habanalabs/common/habanalabs.h |  3 +-
> > >  drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
> > >  3 files changed, 20 insertions(+), 33 deletions(-)
> > >
> > > diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
> > > index 8eb5d38c618e..2f04187f7167 100644
> > > --- a/drivers/misc/habanalabs/Kconfig
> > > +++ b/drivers/misc/habanalabs/Kconfig
> > > @@ -6,7 +6,6 @@
> > >  config HABANA_AI
> > >         tristate "HabanaAI accelerators (habanalabs)"
> > >         depends on PCI && HAS_IOMEM
> > > -       select FRAME_VECTOR
> > >         select DMA_SHARED_BUFFER
> > >         select GENERIC_ALLOCATOR
> > >         select HWMON
> > > diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
> > > index edbd627b29d2..c1b3ad613b15 100644
> > > --- a/drivers/misc/habanalabs/common/habanalabs.h
> > > +++ b/drivers/misc/habanalabs/common/habanalabs.h
> > > @@ -881,7 +881,8 @@ struct hl_ctx_mgr {
> > >  struct hl_userptr {
> > >         enum vm_type_t          vm_type; /* must be first */
> > >         struct list_head        job_node;
> > > -       struct frame_vector     *vec;
> > > +       struct page             **pages;
> > > +       unsigned int            npages;
> > Can you please update the kerneldoc comment section of this structure
> > according to your changes ?
>
> Apologies I missed the nice kerneldoc. I'll fix that in the next round.
>
>
> > >         struct sg_table         *sgt;
> > >         enum dma_data_direction dir;
> > >         struct list_head        debugfs_list;
> > > diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> > > index 5ff4688683fd..327b64479f97 100644
> > > --- a/drivers/misc/habanalabs/common/memory.c
> > > +++ b/drivers/misc/habanalabs/common/memory.c
> > > @@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
> > >                 return -EFAULT;
> > >         }
> > >
> > > -       userptr->vec = frame_vector_create(npages);
> > > -       if (!userptr->vec) {
> > > +       userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
> > > +                                       GFP_KERNEL);
> > > +       if (!userptr->pages) {
> > >                 dev_err(hdev->dev, "Failed to create frame vector\n");
> > >                 return -ENOMEM;
> > >         }
> > >
> > > -       rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> > > -                               userptr->vec);
> > > +       rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> > > +                                userptr->pages);
> > >
> > >         if (rc != npages) {
> > >                 dev_err(hdev->dev,
> > >                         "Failed to map host memory, user ptr probably wrong\n");
> > >                 if (rc < 0)
> > > -                       goto destroy_framevec;
> > > +                       goto destroy_pages;
> > > +               npages = rc;
> > >                 rc = -EFAULT;
> > > -               goto put_framevec;
> > > -       }
> > > -
> > > -       if (frame_vector_to_pages(userptr->vec) < 0) {
> > > -               dev_err(hdev->dev,
> > > -                       "Failed to translate frame vector to pages\n");
> > > -               rc = -EFAULT;
> > > -               goto put_framevec;
> > > +               goto put_pages;
> > >         }
> > > +       userptr->npages = npages;
> > >
> > >         rc = sg_alloc_table_from_pages(userptr->sgt,
> > > -                                       frame_vector_pages(userptr->vec),
> > > -                                       npages, offset, size, GFP_ATOMIC);
> > > +                                      userptr->pages,
> > > +                                      npages, offset, size, GFP_ATOMIC);
> > I think that because the call to kvmalloc_array() is done with
> > GFP_KERNEL, there is no point in using GFP_ATOMIC here.
> > And actually, this path only needs to avoid yielding when using a
> > special debug mode.
> > So I suggest putting here GFP_KERNEL.
>
> Huh, I didn't even notice the GFP_ATOMIC here. This looks indeed
> strange and GFP_KERNEL should be perfectly fine in a function that
> also calls pin_user_pages (since that one can allocate and do worse
> stuff like userspace pagefaults).
>
> But since that GFP_ATOMIC is there already I'll do that in a separate patch.

Ok I read up on your usage of GFP_ATOMIC in habanalabs, and I'm not
going to touch this. But I'm pretty sure it's broken.

You seem to have some requirement of not allocating memory with
blocking (see hl_cb_alloc()), and that seems to be way you allocate
tons of structures with GFP_ATOMIC. There's 2 pretty tough problems
with that:
- GFP_ATOMIC can fail, even when the system hasn't run out of memory
yet. You _must_ have a fallback back to handle allocation failures for
these. Quick survey shows you a ton of GFP_ATOMIC callsites, and very
little fallback code - I've found none, but I didn't check the failure
handlers all going up the possible callchains.
- pin_user_pages can allocate memory, so you're breaking your own "no
sleeping in these paths" rules.

This isn't going to get fixed with a quick oneliner patch, depending
what's needed you're looking at a driver rearchitecture here :-/ Hence
I'm not going to touch this in the next patch, but leave it all as-is.

Cheers, Daniel

>
> > In the meanwhile, I'll run this patch (coupled with the next patch) in
> > our C/I to make sure there are no regressions.
>
> Excellent. I'll wait with v3 until that's done, just in case you hit a
> snag I need to fix.
>
> Cheers, Daniel
>
> > Thanks,
> > Oded
> >
> > >         if (rc < 0) {
> > >                 dev_err(hdev->dev, "failed to create SG table from pages\n");
> > > -               goto put_framevec;
> > > +               goto put_pages;
> > >         }
> > >
> > >         return 0;
> > >
> > > -put_framevec:
> > > -       put_vaddr_frames(userptr->vec);
> > > -destroy_framevec:
> > > -       frame_vector_destroy(userptr->vec);
> > > +put_pages:
> > > +       unpin_user_pages(userptr->pages, npages);
> > > +destroy_pages:
> > > +       kvfree(userptr->pages);
> > >         return rc;
> > >  }
> > >
> > > @@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
> > >   */
> > >  void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> > >  {
> > > -       struct page **pages;
> > > -
> > >         hl_debugfs_remove_userptr(hdev, userptr);
> > >
> > >         if (userptr->dma_mapped)
> > > @@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
> > >                                                         userptr->sgt->nents,
> > >                                                         userptr->dir);
> > >
> > > -       pages = frame_vector_pages(userptr->vec);
> > > -       if (!IS_ERR(pages)) {
> > > -               int i;
> > > -
> > > -               for (i = 0; i < frame_vector_count(userptr->vec); i++)
> > > -                       set_page_dirty_lock(pages[i]);
> > > -       }
> > > -       put_vaddr_frames(userptr->vec);
> > > -       frame_vector_destroy(userptr->vec);
> > > +       unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
> > > +       kvfree(userptr->pages);
> > >
> > >         list_del(&userptr->job_node);
> > >
> > > --
> > > 2.28.0
> > >
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 214+ messages in thread

* Re: [PATCH v2 03/17] misc/habana: Stop using frame_vector helpers
@ 2020-10-10 21:41         ` Daniel Vetter
  0 siblings, 0 replies; 214+ messages in thread
From: Daniel Vetter @ 2020-10-10 21:41 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: linux-s390, linux-samsung-soc, Jan Kara, KVM list,
	Jason Gunthorpe, Pawel Piskorski, John Hubbard, LKML,
	DRI Development, Ofir Bitton, linux-mm, Jérôme Glisse,
	Tomer Tayar, Omer Shpigelman, Greg Kroah-Hartman, Daniel Vetter,
	Andrew Morton, Moti Haimovski, Dan Williams,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>, ,
	Linux Media Mailing List

On Sat, Oct 10, 2020 at 11:32 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Sat, Oct 10, 2020 at 10:27 PM Oded Gabbay <oded.gabbay@gmail.com> wrote:
> >
> > On Fri, Oct 9, 2020 at 10:59 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > >
> > > All we need are a pages array, pin_user_pages_fast can give us that
> > > directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.
> > >
> > Thanks for the patch Daniel.
> >
> > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: John Hubbard <jhubbard@nvidia.com>
> > > Cc: Jérôme Glisse <jglisse@redhat.com>
> > > Cc: Jan Kara <jack@suse.cz>
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Cc: linux-mm@kvack.org
> > > Cc: linux-arm-kernel@lists.infradead.org
> > > Cc: linux-samsung-soc@vger.kernel.org
> > > Cc: linux-media@vger.kernel.org
> > > Cc: Oded Gabbay <oded.gabbay@gmail.com>
> > > Cc: Omer Shpigelman <oshpigelman@habana.ai>
> > > Cc: Ofir Bitton <obitton@habana.ai>
> > > Cc: Tomer Tayar <ttayar@habana.ai>
> > > Cc: Moti Haimovski <mhaimovski@habana.ai>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: Pawel Piskorski <ppiskorski@habana.ai>
> > > --
> > > v2: Use unpin_user_pages_dirty_lock (John)
> > > ---
> > >  drivers/misc/habanalabs/Kconfig             |  1 -
> > >  drivers/misc/habanalabs/common/habanalabs.h |  3 +-
> > >  drivers/misc/habanalabs/common/memory.c     | 49 ++++++++-------------
> > >  3 files changed, 20 insertions(+), 33 deletions(-)
> > >
> > > diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
> > > index 8eb5d38c618e..2f04187f7167 100644
> > > --- a/drivers/misc/habanalabs/Kconfig
> > > +++ b/drivers/misc/habanalabs/Kconfig
> > > @@ -6,7 +6,6 @@
> > >  config HABANA_AI
> > >         tristate "HabanaAI accelerators (habanalabs)"
> > >         depends on PCI && HAS_IOMEM
> > > -       select FRAME_VECTOR
> > >         select DMA_SHARED_BUFFER
> > >         select GENERIC_ALLOCATOR
> > >         select HWMON
> > > diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
> > > index edbd627b29d2..c1b3ad613b15 100644
> > > --- a/drivers/misc/habanalabs/common/habanalabs.h
> > > +++ b/drivers/misc/habanalabs/common/habanalabs.h
> > > @@ -881,7 +881,8 @@ struct hl_ctx_mgr {
> > >  struct hl_userptr {
> > >         enum vm_type_t          vm_type; /* must be first */
> > >         struct list_head        job_node;
> > > -       struct frame_vector     *vec;
> > > +       struct page             **pages;
> > > +       unsigned int            npages;
> > Can you please update the kerneldoc comment section of this structure
> > according to your changes ?
>
> Apologies I missed the nice kerneldoc. I'll fix that in the next round.
>
>
> > >         struct sg_table         *sgt;
> > >         enum dma_data_direction dir;
> > >         struct list_head        debugfs_list;
> > > diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> > > index 5ff4688683fd..327b64479f97 100644
> > > --- a/drivers/misc/habanalabs/common/memory.c
> > > +++ b/drivers/misc/habanalabs/common/memory.c
> > > @@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
> > >                 return -EFAULT;
> > >         }
> > >
> > > -       userptr->vec = frame_vector_create(npages);
> > > -       if (!userptr->vec) {
> > > +       userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
> > > +                                       GFP_KERNEL);
> > > +       if (!userptr->pages) {
> > >                 dev_err(hdev->dev, "Failed to create frame vector\n");
> > >                 return -ENOMEM;
> > >         }
> > >
> > > -       rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
> > > -                               userptr->vec);
> > > +       rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> > > +                                userptr->pages);
> > >
> > >         if (rc != npages) {
> > >                 dev_err(hdev->dev,
> > >                         "Failed to map host memory, user ptr probably wrong\n");
> > >                 if (rc < 0)
> > > -                       goto destroy_framevec;
> > > +                       goto destroy_pages;
> > > +               npages = rc;
> > >                 rc = -EFAULT;
> > > -               goto put_framevec;
> > > -       }
> > > -
> > > -       if (frame_vector_to_pages(userptr->vec) < 0) {
> > > -               dev_err(hdev->dev,
> > > -                       "Failed to translate frame vector to pages\n");
> > > -               rc = -EFAULT;
> > > -               goto put_framevec;
> > > +               goto put_pages;
> > >         }
> > > +       userptr->npages = npages;
> > >
> > >         rc = sg_alloc_table_from_pages(userptr->sgt,
> > > -                                       frame_vector_pages(userptr->vec),
> > > -                                       npages, offset, size, GFP_ATOMIC);
> > > +                                      userptr->pages,
> > > +                                      npages, offset, size, GFP_ATOMIC);
> > I think that because the call to kvmalloc_array() is done with
> > GFP_KERNEL, there is no point in using GFP_ATOMIC here.
> > And actually, this path only needs to avoid yielding when using a
> > special debug mode.
> > So I suggest putting here GFP_KERNEL.
>
> Huh, I didn't even notice the GFP_ATOMIC here. This looks indeed
> strange and GFP_KERNEL should be perfectly fine in a function that
> also calls pin_user_pages (since that one can allocate and do worse
> stuff like userspace pagefaults).
>
> But since that GFP_ATOMIC is there already I'll do that in a separate patch.

Ok I read up on your usage of GFP_ATOMIC in habanalabs, and I'm not
going to touch this. But I'm pretty sure it's broken.

You seem to have some requirement of not allocating memory with
blocking (see hl_cb_alloc()), and that seems to be way you allocate
tons of structures with GFP_ATOMIC. There's 2 pretty tough problems
with that:
- GFP_ATOMIC can fail, even when the system hasn't run out of memory
yet. You _must_ have a fallback back to handle allocation failures for
these. Quick survey shows you a ton of GFP_ATOMIC callsites, and very
little fallback code - I've found none, but I didn't check the failure
handlers all going up the possible callchains.
- pin_user_pages can allocate memory, so you're breaking your own "no
sleeping in these paths" rules.

This isn't going to get fixed with a quick oneliner patch, depending
what's needed you're looking at a driver rearchitecture here :-/ Hence
I'm not going to touch this in the next patch, but leave it all as-is.

Cheers, Daniel

>
> > In the meanwhile, I'll run this patch (coupled with the next patch) in
> > our C/I to make sure there are no regressions.
>
> Excellent. I'll wait with v3 until that's done, just in case you hit a
> snag I need to fix.
>
> Cheers, Daniel
>
> > Thanks,
> > Oded
> >
> > >         if (rc < 0) {
> > >                 dev_err(hdev->dev, "failed to create SG table from pages\n");
> > > -               goto put_framevec;
> > > +               goto put_pages;
> > >         }
&