All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-06  5:32 ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06  5:32 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Nathaniel McCallum, Reinette Chatre, Andrew Morton,
	linux-sgx, linux-kernel, Jarkko Sakkinen, Florian Fainelli,
	Thomas Bogendoerfer, Matthew Auld, Thomas Hellström,
	Daniel Vetter, Jason Ekstrand, Chris Wilson, Maarten Lankhorst,
	Greg Kroah-Hartman, Tvrtko Ursulin, Vasily Averin, Shakeel Butt,
	Michal Hocko, zhangyiru, Alexey Gladkov, Alexander Mikhalitsyn,
	linux-mips, intel-gfx, dri-devel, codalist, linux-unionfs,
	linux-fsdevel

For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
to use that for initializing the device memory by providing a new callback
f_ops->populate() for the purpose.

SGX patches are provided to show the callback in context.

An obvious alternative is a ioctl but it is less elegant and requires
two syscalls (mmap + ioctl) per memory range, instead of just one
(mmap).

Jarkko Sakkinen (3):
  mm: Add f_ops->populate()
  x86/sgx: Export sgx_encl_page_alloc()
  x86/sgx: Implement EAUG population with MAP_POPULATE

 arch/mips/kernel/vdso.c                    |   2 +-
 arch/x86/kernel/cpu/sgx/driver.c           | 129 +++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.c             |  38 ++++++
 arch/x86/kernel/cpu/sgx/encl.h             |   3 +
 arch/x86/kernel/cpu/sgx/ioctl.c            |  38 ------
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |   2 +-
 fs/coda/file.c                             |   2 +-
 fs/overlayfs/file.c                        |   2 +-
 include/linux/fs.h                         |  12 +-
 include/linux/mm.h                         |   2 +-
 ipc/shm.c                                  |   2 +-
 mm/mmap.c                                  |  10 +-
 mm/nommu.c                                 |   4 +-
 13 files changed, 193 insertions(+), 53 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-06  5:32 ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06  5:32 UTC (permalink / raw)
  To: linux-mm
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, Tvrtko Ursulin,
	linux-kernel, Jarkko Sakkinen, linux-fsdevel, Andrew Morton,
	Alexey Gladkov

For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
to use that for initializing the device memory by providing a new callback
f_ops->populate() for the purpose.

SGX patches are provided to show the callback in context.

An obvious alternative is a ioctl but it is less elegant and requires
two syscalls (mmap + ioctl) per memory range, instead of just one
(mmap).

Jarkko Sakkinen (3):
  mm: Add f_ops->populate()
  x86/sgx: Export sgx_encl_page_alloc()
  x86/sgx: Implement EAUG population with MAP_POPULATE

 arch/mips/kernel/vdso.c                    |   2 +-
 arch/x86/kernel/cpu/sgx/driver.c           | 129 +++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.c             |  38 ++++++
 arch/x86/kernel/cpu/sgx/encl.h             |   3 +
 arch/x86/kernel/cpu/sgx/ioctl.c            |  38 ------
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |   2 +-
 fs/coda/file.c                             |   2 +-
 fs/overlayfs/file.c                        |   2 +-
 include/linux/fs.h                         |  12 +-
 include/linux/mm.h                         |   2 +-
 ipc/shm.c                                  |   2 +-
 mm/mmap.c                                  |  10 +-
 mm/nommu.c                                 |   4 +-
 13 files changed, 193 insertions(+), 53 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Intel-gfx] [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-06  5:32 ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06  5:32 UTC (permalink / raw)
  To: linux-mm
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, codalist, Matthew Auld, Vasily Averin,
	Thomas Hellström, intel-gfx, linux-mips, Shakeel Butt,
	Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, linux-kernel,
	Jarkko Sakkinen, linux-fsdevel, Andrew Morton, Alexey Gladkov

For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
to use that for initializing the device memory by providing a new callback
f_ops->populate() for the purpose.

SGX patches are provided to show the callback in context.

An obvious alternative is a ioctl but it is less elegant and requires
two syscalls (mmap + ioctl) per memory range, instead of just one
(mmap).

Jarkko Sakkinen (3):
  mm: Add f_ops->populate()
  x86/sgx: Export sgx_encl_page_alloc()
  x86/sgx: Implement EAUG population with MAP_POPULATE

 arch/mips/kernel/vdso.c                    |   2 +-
 arch/x86/kernel/cpu/sgx/driver.c           | 129 +++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.c             |  38 ++++++
 arch/x86/kernel/cpu/sgx/encl.h             |   3 +
 arch/x86/kernel/cpu/sgx/ioctl.c            |  38 ------
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |   2 +-
 fs/coda/file.c                             |   2 +-
 fs/overlayfs/file.c                        |   2 +-
 include/linux/fs.h                         |  12 +-
 include/linux/mm.h                         |   2 +-
 ipc/shm.c                                  |   2 +-
 mm/mmap.c                                  |  10 +-
 mm/nommu.c                                 |   4 +-
 13 files changed, 193 insertions(+), 53 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH RFC 1/3] mm: Add f_ops->populate()
  2022-03-06  5:32 ` Jarkko Sakkinen
  (?)
@ 2022-03-06  5:32   ` Jarkko Sakkinen
  -1 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06  5:32 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Nathaniel McCallum, Reinette Chatre, Andrew Morton,
	linux-sgx, linux-kernel, Jarkko Sakkinen, Florian Fainelli,
	Thomas Bogendoerfer, Matthew Auld, Thomas Hellström,
	Daniel Vetter, Jason Ekstrand, Chris Wilson, Maarten Lankhorst,
	Tvrtko Ursulin, Greg Kroah-Hartman, Shakeel Butt, Vasily Averin,
	zhangyiru, Alexander Mikhalitsyn, Alexey Gladkov, linux-mips,
	intel-gfx, dri-devel, codalist, linux-unionfs, linux-fsdevel

Sometimes you might want to use MAP_POPULATE to ask a device driver to
initialize the device memory in some specific manner. SGX driver can use
this to request more memory by issuing ENCLS[EAUG] x86 opcode for each
page in the address range.

Add f_ops->populate() with the same parameters as f_ops->mmap() and make
it conditionally called inside call_mmap(). Update call sites
accodingly.
---
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
v3:
-       if (!ret && do_populate && file->f_op->populate)
+       if (!ret && do_populate && file->f_op->populate &&
+           !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
(reported by Matthew Wilcox)
v2:
-       if (!ret && do_populate)
+       if (!ret && do_populate && file->f_op->populate)
(reported by Jan Harkes)
---
 arch/mips/kernel/vdso.c                    |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  2 +-
 fs/coda/file.c                             |  2 +-
 fs/overlayfs/file.c                        |  2 +-
 include/linux/fs.h                         | 12 ++++++++++--
 include/linux/mm.h                         |  2 +-
 ipc/shm.c                                  |  2 +-
 mm/mmap.c                                  | 10 +++++-----
 mm/nommu.c                                 |  4 ++--
 9 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 3d0cf471f2fe..89f3f3da9abd 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -102,7 +102,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 		base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
 				VM_READ | VM_EXEC |
 				VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
-				0, NULL);
+				0, NULL, false);
 		if (IS_ERR_VALUE(base)) {
 			ret = base;
 			goto out;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 1b526039a60d..4c71f64d6a79 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -107,7 +107,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *
 	if (!obj->base.filp)
 		return -ENODEV;
 
-	ret = call_mmap(obj->base.filp, vma);
+	ret = call_mmap(obj->base.filp, vma, false);
 	if (ret)
 		return ret;
 
diff --git a/fs/coda/file.c b/fs/coda/file.c
index 29dd87be2fb8..e14f312fdbf8 100644
--- a/fs/coda/file.c
+++ b/fs/coda/file.c
@@ -173,7 +173,7 @@ coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
 	spin_unlock(&cii->c_lock);
 
 	vma->vm_file = get_file(host_file);
-	ret = call_mmap(vma->vm_file, vma);
+	ret = call_mmap(vma->vm_file, vma, false);
 
 	if (ret) {
 		/* if call_mmap fails, our caller will put host_file so we
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index fa125feed0ff..b963a9397e80 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -503,7 +503,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 	vma_set_file(vma, realfile);
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	ret = call_mmap(vma->vm_file, vma);
+	ret = call_mmap(vma->vm_file, vma, false);
 	revert_creds(old_cred);
 	ovl_file_accessed(file);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e2d892b201b0..2909e2d14af8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -42,6 +42,7 @@
 #include <linux/mount.h>
 #include <linux/cred.h>
 #include <linux/mnt_idmapping.h>
+#include <linux/mm.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -1993,6 +1994,7 @@ struct file_operations {
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
+	int (*populate)(struct file *, struct vm_area_struct *);
 	unsigned long mmap_supported_flags;
 	int (*open) (struct inode *, struct file *);
 	int (*flush) (struct file *, fl_owner_t id);
@@ -2074,9 +2076,15 @@ static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
 	return file->f_op->write_iter(kio, iter);
 }
 
-static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
+static inline int call_mmap(struct file *file, struct vm_area_struct *vma, bool do_populate)
 {
-	return file->f_op->mmap(file, vma);
+	int ret = file->f_op->mmap(file, vma);
+
+	if (!ret && do_populate && file->f_op->populate &&
+	    !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		ret = file->f_op->populate(file, vma);
+
+	return ret;
 }
 
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 213cc569b192..6c8c036f423b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2683,7 +2683,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
 
 extern unsigned long mmap_region(struct file *file, unsigned long addr,
 	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
-	struct list_head *uf);
+	struct list_head *uf, bool do_populate);
 extern unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
 	unsigned long pgoff, unsigned long *populate, struct list_head *uf);
diff --git a/ipc/shm.c b/ipc/shm.c
index b3048ebd5c31..89b28f32acf0 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -587,7 +587,7 @@ static int shm_mmap(struct file *file, struct vm_area_struct *vma)
 	if (ret)
 		return ret;
 
-	ret = call_mmap(sfd->file, vma);
+	ret = call_mmap(sfd->file, vma, do_populate);
 	if (ret) {
 		shm_close(vma);
 		return ret;
diff --git a/mm/mmap.c b/mm/mmap.c
index 1e8fdb0b51ed..5eca79957d4c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1413,6 +1413,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 			unsigned long flags, unsigned long pgoff,
 			unsigned long *populate, struct list_head *uf)
 {
+	bool do_populate = (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE;
 	struct mm_struct *mm = current->mm;
 	vm_flags_t vm_flags;
 	int pkey = 0;
@@ -1579,10 +1580,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 			vm_flags |= VM_NORESERVE;
 	}
 
-	addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
+	addr = mmap_region(file, addr, len, vm_flags, pgoff, uf, do_populate);
 	if (!IS_ERR_VALUE(addr) &&
-	    ((vm_flags & VM_LOCKED) ||
-	     (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
+	    ((vm_flags & VM_LOCKED) || do_populate))
 		*populate = len;
 	return addr;
 }
@@ -1721,7 +1721,7 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
 
 unsigned long mmap_region(struct file *file, unsigned long addr,
 		unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
-		struct list_head *uf)
+		struct list_head *uf, bool do_populate)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma, *prev, *merge;
@@ -1790,7 +1790,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		}
 
 		vma->vm_file = get_file(file);
-		error = call_mmap(file, vma);
+		error = call_mmap(file, vma, do_populate);
 		if (error)
 			goto unmap_and_free_vma;
 
diff --git a/mm/nommu.c b/mm/nommu.c
index 55a9e48a7a02..a3c20b803c27 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -941,7 +941,7 @@ static int do_mmap_shared_file(struct vm_area_struct *vma)
 {
 	int ret;
 
-	ret = call_mmap(vma->vm_file, vma);
+	ret = call_mmap(vma->vm_file, vma, false);
 	if (ret == 0) {
 		vma->vm_region->vm_top = vma->vm_region->vm_end;
 		return 0;
@@ -972,7 +972,7 @@ static int do_mmap_private(struct vm_area_struct *vma,
 	 * - VM_MAYSHARE will be set if it may attempt to share
 	 */
 	if (capabilities & NOMMU_MAP_DIRECT) {
-		ret = call_mmap(vma->vm_file, vma);
+		ret = call_mmap(vma->vm_file, vma, false);
 		if (ret == 0) {
 			/* shouldn't return success if we're not sharing */
 			BUG_ON(!(vma->vm_flags & VM_MAYSHARE));
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RFC 1/3] mm: Add f_ops->populate()
@ 2022-03-06  5:32   ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06  5:32 UTC (permalink / raw)
  To: linux-mm
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, Chris Wilson,
	Jason Ekstrand, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, codalist, Matthew Auld, Vasily Averin,
	Thomas Hellström, intel-gfx, linux-mips, Shakeel Butt,
	Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, Tvrtko Ursulin,
	linux-kernel, Jarkko Sakkinen, linux-fsdevel, Andrew Morton,
	Alexey Gladkov

Sometimes you might want to use MAP_POPULATE to ask a device driver to
initialize the device memory in some specific manner. SGX driver can use
this to request more memory by issuing ENCLS[EAUG] x86 opcode for each
page in the address range.

Add f_ops->populate() with the same parameters as f_ops->mmap() and make
it conditionally called inside call_mmap(). Update call sites
accodingly.
---
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
v3:
-       if (!ret && do_populate && file->f_op->populate)
+       if (!ret && do_populate && file->f_op->populate &&
+           !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
(reported by Matthew Wilcox)
v2:
-       if (!ret && do_populate)
+       if (!ret && do_populate && file->f_op->populate)
(reported by Jan Harkes)
---
 arch/mips/kernel/vdso.c                    |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  2 +-
 fs/coda/file.c                             |  2 +-
 fs/overlayfs/file.c                        |  2 +-
 include/linux/fs.h                         | 12 ++++++++++--
 include/linux/mm.h                         |  2 +-
 ipc/shm.c                                  |  2 +-
 mm/mmap.c                                  | 10 +++++-----
 mm/nommu.c                                 |  4 ++--
 9 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 3d0cf471f2fe..89f3f3da9abd 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -102,7 +102,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 		base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
 				VM_READ | VM_EXEC |
 				VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
-				0, NULL);
+				0, NULL, false);
 		if (IS_ERR_VALUE(base)) {
 			ret = base;
 			goto out;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 1b526039a60d..4c71f64d6a79 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -107,7 +107,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *
 	if (!obj->base.filp)
 		return -ENODEV;
 
-	ret = call_mmap(obj->base.filp, vma);
+	ret = call_mmap(obj->base.filp, vma, false);
 	if (ret)
 		return ret;
 
diff --git a/fs/coda/file.c b/fs/coda/file.c
index 29dd87be2fb8..e14f312fdbf8 100644
--- a/fs/coda/file.c
+++ b/fs/coda/file.c
@@ -173,7 +173,7 @@ coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
 	spin_unlock(&cii->c_lock);
 
 	vma->vm_file = get_file(host_file);
-	ret = call_mmap(vma->vm_file, vma);
+	ret = call_mmap(vma->vm_file, vma, false);
 
 	if (ret) {
 		/* if call_mmap fails, our caller will put host_file so we
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index fa125feed0ff..b963a9397e80 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -503,7 +503,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 	vma_set_file(vma, realfile);
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	ret = call_mmap(vma->vm_file, vma);
+	ret = call_mmap(vma->vm_file, vma, false);
 	revert_creds(old_cred);
 	ovl_file_accessed(file);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e2d892b201b0..2909e2d14af8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -42,6 +42,7 @@
 #include <linux/mount.h>
 #include <linux/cred.h>
 #include <linux/mnt_idmapping.h>
+#include <linux/mm.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -1993,6 +1994,7 @@ struct file_operations {
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
+	int (*populate)(struct file *, struct vm_area_struct *);
 	unsigned long mmap_supported_flags;
 	int (*open) (struct inode *, struct file *);
 	int (*flush) (struct file *, fl_owner_t id);
@@ -2074,9 +2076,15 @@ static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
 	return file->f_op->write_iter(kio, iter);
 }
 
-static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
+static inline int call_mmap(struct file *file, struct vm_area_struct *vma, bool do_populate)
 {
-	return file->f_op->mmap(file, vma);
+	int ret = file->f_op->mmap(file, vma);
+
+	if (!ret && do_populate && file->f_op->populate &&
+	    !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		ret = file->f_op->populate(file, vma);
+
+	return ret;
 }
 
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 213cc569b192..6c8c036f423b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2683,7 +2683,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
 
 extern unsigned long mmap_region(struct file *file, unsigned long addr,
 	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
-	struct list_head *uf);
+	struct list_head *uf, bool do_populate);
 extern unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
 	unsigned long pgoff, unsigned long *populate, struct list_head *uf);
diff --git a/ipc/shm.c b/ipc/shm.c
index b3048ebd5c31..89b28f32acf0 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -587,7 +587,7 @@ static int shm_mmap(struct file *file, struct vm_area_struct *vma)
 	if (ret)
 		return ret;
 
-	ret = call_mmap(sfd->file, vma);
+	ret = call_mmap(sfd->file, vma, do_populate);
 	if (ret) {
 		shm_close(vma);
 		return ret;
diff --git a/mm/mmap.c b/mm/mmap.c
index 1e8fdb0b51ed..5eca79957d4c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1413,6 +1413,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 			unsigned long flags, unsigned long pgoff,
 			unsigned long *populate, struct list_head *uf)
 {
+	bool do_populate = (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE;
 	struct mm_struct *mm = current->mm;
 	vm_flags_t vm_flags;
 	int pkey = 0;
@@ -1579,10 +1580,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 			vm_flags |= VM_NORESERVE;
 	}
 
-	addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
+	addr = mmap_region(file, addr, len, vm_flags, pgoff, uf, do_populate);
 	if (!IS_ERR_VALUE(addr) &&
-	    ((vm_flags & VM_LOCKED) ||
-	     (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
+	    ((vm_flags & VM_LOCKED) || do_populate))
 		*populate = len;
 	return addr;
 }
@@ -1721,7 +1721,7 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
 
 unsigned long mmap_region(struct file *file, unsigned long addr,
 		unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
-		struct list_head *uf)
+		struct list_head *uf, bool do_populate)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma, *prev, *merge;
@@ -1790,7 +1790,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		}
 
 		vma->vm_file = get_file(file);
-		error = call_mmap(file, vma);
+		error = call_mmap(file, vma, do_populate);
 		if (error)
 			goto unmap_and_free_vma;
 
diff --git a/mm/nommu.c b/mm/nommu.c
index 55a9e48a7a02..a3c20b803c27 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -941,7 +941,7 @@ static int do_mmap_shared_file(struct vm_area_struct *vma)
 {
 	int ret;
 
-	ret = call_mmap(vma->vm_file, vma);
+	ret = call_mmap(vma->vm_file, vma, false);
 	if (ret == 0) {
 		vma->vm_region->vm_top = vma->vm_region->vm_end;
 		return 0;
@@ -972,7 +972,7 @@ static int do_mmap_private(struct vm_area_struct *vma,
 	 * - VM_MAYSHARE will be set if it may attempt to share
 	 */
 	if (capabilities & NOMMU_MAP_DIRECT) {
-		ret = call_mmap(vma->vm_file, vma);
+		ret = call_mmap(vma->vm_file, vma, false);
 		if (ret == 0) {
 			/* shouldn't return success if we're not sharing */
 			BUG_ON(!(vma->vm_flags & VM_MAYSHARE));
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [Intel-gfx] [PATCH RFC 1/3] mm: Add f_ops->populate()
@ 2022-03-06  5:32   ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06  5:32 UTC (permalink / raw)
  To: linux-mm
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, Chris Wilson,
	Alexander Mikhalitsyn, Florian Fainelli, linux-unionfs, codalist,
	Matthew Auld, Vasily Averin, Thomas Hellström, intel-gfx,
	linux-mips, Shakeel Butt, Reinette Chatre, linux-sgx,
	Thomas Bogendoerfer, Nathaniel McCallum, Greg Kroah-Hartman,
	linux-kernel, Jarkko Sakkinen, linux-fsdevel, Andrew Morton,
	Alexey Gladkov

Sometimes you might want to use MAP_POPULATE to ask a device driver to
initialize the device memory in some specific manner. SGX driver can use
this to request more memory by issuing ENCLS[EAUG] x86 opcode for each
page in the address range.

Add f_ops->populate() with the same parameters as f_ops->mmap() and make
it conditionally called inside call_mmap(). Update call sites
accodingly.
---
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
v3:
-       if (!ret && do_populate && file->f_op->populate)
+       if (!ret && do_populate && file->f_op->populate &&
+           !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
(reported by Matthew Wilcox)
v2:
-       if (!ret && do_populate)
+       if (!ret && do_populate && file->f_op->populate)
(reported by Jan Harkes)
---
 arch/mips/kernel/vdso.c                    |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  2 +-
 fs/coda/file.c                             |  2 +-
 fs/overlayfs/file.c                        |  2 +-
 include/linux/fs.h                         | 12 ++++++++++--
 include/linux/mm.h                         |  2 +-
 ipc/shm.c                                  |  2 +-
 mm/mmap.c                                  | 10 +++++-----
 mm/nommu.c                                 |  4 ++--
 9 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 3d0cf471f2fe..89f3f3da9abd 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -102,7 +102,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 		base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
 				VM_READ | VM_EXEC |
 				VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
-				0, NULL);
+				0, NULL, false);
 		if (IS_ERR_VALUE(base)) {
 			ret = base;
 			goto out;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 1b526039a60d..4c71f64d6a79 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -107,7 +107,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *
 	if (!obj->base.filp)
 		return -ENODEV;
 
-	ret = call_mmap(obj->base.filp, vma);
+	ret = call_mmap(obj->base.filp, vma, false);
 	if (ret)
 		return ret;
 
diff --git a/fs/coda/file.c b/fs/coda/file.c
index 29dd87be2fb8..e14f312fdbf8 100644
--- a/fs/coda/file.c
+++ b/fs/coda/file.c
@@ -173,7 +173,7 @@ coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
 	spin_unlock(&cii->c_lock);
 
 	vma->vm_file = get_file(host_file);
-	ret = call_mmap(vma->vm_file, vma);
+	ret = call_mmap(vma->vm_file, vma, false);
 
 	if (ret) {
 		/* if call_mmap fails, our caller will put host_file so we
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index fa125feed0ff..b963a9397e80 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -503,7 +503,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
 	vma_set_file(vma, realfile);
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
-	ret = call_mmap(vma->vm_file, vma);
+	ret = call_mmap(vma->vm_file, vma, false);
 	revert_creds(old_cred);
 	ovl_file_accessed(file);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e2d892b201b0..2909e2d14af8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -42,6 +42,7 @@
 #include <linux/mount.h>
 #include <linux/cred.h>
 #include <linux/mnt_idmapping.h>
+#include <linux/mm.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -1993,6 +1994,7 @@ struct file_operations {
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
+	int (*populate)(struct file *, struct vm_area_struct *);
 	unsigned long mmap_supported_flags;
 	int (*open) (struct inode *, struct file *);
 	int (*flush) (struct file *, fl_owner_t id);
@@ -2074,9 +2076,15 @@ static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
 	return file->f_op->write_iter(kio, iter);
 }
 
-static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
+static inline int call_mmap(struct file *file, struct vm_area_struct *vma, bool do_populate)
 {
-	return file->f_op->mmap(file, vma);
+	int ret = file->f_op->mmap(file, vma);
+
+	if (!ret && do_populate && file->f_op->populate &&
+	    !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+		ret = file->f_op->populate(file, vma);
+
+	return ret;
 }
 
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 213cc569b192..6c8c036f423b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2683,7 +2683,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
 
 extern unsigned long mmap_region(struct file *file, unsigned long addr,
 	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
-	struct list_head *uf);
+	struct list_head *uf, bool do_populate);
 extern unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
 	unsigned long pgoff, unsigned long *populate, struct list_head *uf);
diff --git a/ipc/shm.c b/ipc/shm.c
index b3048ebd5c31..89b28f32acf0 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -587,7 +587,7 @@ static int shm_mmap(struct file *file, struct vm_area_struct *vma)
 	if (ret)
 		return ret;
 
-	ret = call_mmap(sfd->file, vma);
+	ret = call_mmap(sfd->file, vma, do_populate);
 	if (ret) {
 		shm_close(vma);
 		return ret;
diff --git a/mm/mmap.c b/mm/mmap.c
index 1e8fdb0b51ed..5eca79957d4c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1413,6 +1413,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 			unsigned long flags, unsigned long pgoff,
 			unsigned long *populate, struct list_head *uf)
 {
+	bool do_populate = (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE;
 	struct mm_struct *mm = current->mm;
 	vm_flags_t vm_flags;
 	int pkey = 0;
@@ -1579,10 +1580,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 			vm_flags |= VM_NORESERVE;
 	}
 
-	addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
+	addr = mmap_region(file, addr, len, vm_flags, pgoff, uf, do_populate);
 	if (!IS_ERR_VALUE(addr) &&
-	    ((vm_flags & VM_LOCKED) ||
-	     (flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
+	    ((vm_flags & VM_LOCKED) || do_populate))
 		*populate = len;
 	return addr;
 }
@@ -1721,7 +1721,7 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
 
 unsigned long mmap_region(struct file *file, unsigned long addr,
 		unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
-		struct list_head *uf)
+		struct list_head *uf, bool do_populate)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma, *prev, *merge;
@@ -1790,7 +1790,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 		}
 
 		vma->vm_file = get_file(file);
-		error = call_mmap(file, vma);
+		error = call_mmap(file, vma, do_populate);
 		if (error)
 			goto unmap_and_free_vma;
 
diff --git a/mm/nommu.c b/mm/nommu.c
index 55a9e48a7a02..a3c20b803c27 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -941,7 +941,7 @@ static int do_mmap_shared_file(struct vm_area_struct *vma)
 {
 	int ret;
 
-	ret = call_mmap(vma->vm_file, vma);
+	ret = call_mmap(vma->vm_file, vma, false);
 	if (ret == 0) {
 		vma->vm_region->vm_top = vma->vm_region->vm_end;
 		return 0;
@@ -972,7 +972,7 @@ static int do_mmap_private(struct vm_area_struct *vma,
 	 * - VM_MAYSHARE will be set if it may attempt to share
 	 */
 	if (capabilities & NOMMU_MAP_DIRECT) {
-		ret = call_mmap(vma->vm_file, vma);
+		ret = call_mmap(vma->vm_file, vma, false);
 		if (ret == 0) {
 			/* shouldn't return success if we're not sharing */
 			BUG_ON(!(vma->vm_flags & VM_MAYSHARE));
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RFC 2/3] x86/sgx: Export sgx_encl_page_alloc()
  2022-03-06  5:32 ` Jarkko Sakkinen
  (?)
@ 2022-03-06  5:32   ` Jarkko Sakkinen
  -1 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06  5:32 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Nathaniel McCallum, Reinette Chatre, Andrew Morton,
	linux-sgx, linux-kernel, Jarkko Sakkinen, H. Peter Anvin,
	Thomas Bogendoerfer, Florian Fainelli, Matthew Auld,
	Thomas Hellström, Daniel Vetter, Jason Ekstrand,
	Chris Wilson, Maarten Lankhorst, Greg Kroah-Hartman,
	Tvrtko Ursulin, Vasily Averin, Shakeel Butt, Mike Kravetz,
	Alexey Gladkov, zhangyiru, Alexander Mikhalitsyn, linux-mips,
	intel-gfx, dri-devel, codalist, linux-unionfs, linux-fsdevel

Move sgx_encl_page_alloc() to encl.c and export it so that it can be
used in the implementation for MAP_POPULATE, which requires to allocate
new enclave pages.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 arch/x86/kernel/cpu/sgx/encl.c  | 38 +++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.h  |  3 +++
 arch/x86/kernel/cpu/sgx/ioctl.c | 38 ---------------------------------
 3 files changed, 41 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 89aeed798ffb..79e39bd99c09 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -914,6 +914,44 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 	return ret;
 }
 
+struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
+					  unsigned long offset,
+					  u64 secinfo_flags)
+{
+	struct sgx_encl_page *encl_page;
+	unsigned long prot;
+
+	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
+	if (!encl_page)
+		return ERR_PTR(-ENOMEM);
+
+	encl_page->desc = encl->base + offset;
+	encl_page->encl = encl;
+
+	prot = _calc_vm_trans(secinfo_flags, SGX_SECINFO_R, PROT_READ)  |
+	       _calc_vm_trans(secinfo_flags, SGX_SECINFO_W, PROT_WRITE) |
+	       _calc_vm_trans(secinfo_flags, SGX_SECINFO_X, PROT_EXEC);
+
+	/*
+	 * TCS pages must always RW set for CPU access while the SECINFO
+	 * permissions are *always* zero - the CPU ignores the user provided
+	 * values and silently overwrites them with zero permissions.
+	 */
+	if ((secinfo_flags & SGX_SECINFO_PAGE_TYPE_MASK) == SGX_SECINFO_TCS)
+		prot |= PROT_READ | PROT_WRITE;
+
+	/* Calculate maximum of the VM flags for the page. */
+	encl_page->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
+
+	/*
+	 * At time of allocation, the runtime protection bits are the same
+	 * as the maximum protection bits.
+	 */
+	encl_page->vm_run_prot_bits = encl_page->vm_max_prot_bits;
+
+	return encl_page;
+}
+
 /**
  * sgx_zap_enclave_ptes() - remove PTEs mapping the address from enclave
  * @encl: the enclave
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 1b6ce1da7c92..3df0d3faf3a1 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -113,6 +113,9 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
+struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
+					  unsigned long offset,
+					  u64 secinfo_flags);
 void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr);
 struct sgx_epc_page *sgx_alloc_va_page(void);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index d8c3c07badb3..3e3ca27a6f72 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -169,44 +169,6 @@ static long sgx_ioc_enclave_create(struct sgx_encl *encl, void __user *arg)
 	return ret;
 }
 
-static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
-						 unsigned long offset,
-						 u64 secinfo_flags)
-{
-	struct sgx_encl_page *encl_page;
-	unsigned long prot;
-
-	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
-	if (!encl_page)
-		return ERR_PTR(-ENOMEM);
-
-	encl_page->desc = encl->base + offset;
-	encl_page->encl = encl;
-
-	prot = _calc_vm_trans(secinfo_flags, SGX_SECINFO_R, PROT_READ)  |
-	       _calc_vm_trans(secinfo_flags, SGX_SECINFO_W, PROT_WRITE) |
-	       _calc_vm_trans(secinfo_flags, SGX_SECINFO_X, PROT_EXEC);
-
-	/*
-	 * TCS pages must always RW set for CPU access while the SECINFO
-	 * permissions are *always* zero - the CPU ignores the user provided
-	 * values and silently overwrites them with zero permissions.
-	 */
-	if ((secinfo_flags & SGX_SECINFO_PAGE_TYPE_MASK) == SGX_SECINFO_TCS)
-		prot |= PROT_READ | PROT_WRITE;
-
-	/* Calculate maximum of the VM flags for the page. */
-	encl_page->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
-
-	/*
-	 * At time of allocation, the runtime protection bits are the same
-	 * as the maximum protection bits.
-	 */
-	encl_page->vm_run_prot_bits = encl_page->vm_max_prot_bits;
-
-	return encl_page;
-}
-
 static int sgx_validate_secinfo(struct sgx_secinfo *secinfo)
 {
 	u64 perm = secinfo->flags & SGX_SECINFO_PERMISSION_MASK;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RFC 2/3] x86/sgx: Export sgx_encl_page_alloc()
@ 2022-03-06  5:32   ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06  5:32 UTC (permalink / raw)
  To: linux-mm
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, Chris Wilson,
	Jason Ekstrand, H. Peter Anvin, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, Tvrtko Ursulin,
	linux-kernel, Jarkko Sakkinen, linux-fsdevel, Andrew Morton,
	Alexey Gladkov, Mike Kravetz

Move sgx_encl_page_alloc() to encl.c and export it so that it can be
used in the implementation for MAP_POPULATE, which requires to allocate
new enclave pages.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 arch/x86/kernel/cpu/sgx/encl.c  | 38 +++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.h  |  3 +++
 arch/x86/kernel/cpu/sgx/ioctl.c | 38 ---------------------------------
 3 files changed, 41 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 89aeed798ffb..79e39bd99c09 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -914,6 +914,44 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 	return ret;
 }
 
+struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
+					  unsigned long offset,
+					  u64 secinfo_flags)
+{
+	struct sgx_encl_page *encl_page;
+	unsigned long prot;
+
+	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
+	if (!encl_page)
+		return ERR_PTR(-ENOMEM);
+
+	encl_page->desc = encl->base + offset;
+	encl_page->encl = encl;
+
+	prot = _calc_vm_trans(secinfo_flags, SGX_SECINFO_R, PROT_READ)  |
+	       _calc_vm_trans(secinfo_flags, SGX_SECINFO_W, PROT_WRITE) |
+	       _calc_vm_trans(secinfo_flags, SGX_SECINFO_X, PROT_EXEC);
+
+	/*
+	 * TCS pages must always RW set for CPU access while the SECINFO
+	 * permissions are *always* zero - the CPU ignores the user provided
+	 * values and silently overwrites them with zero permissions.
+	 */
+	if ((secinfo_flags & SGX_SECINFO_PAGE_TYPE_MASK) == SGX_SECINFO_TCS)
+		prot |= PROT_READ | PROT_WRITE;
+
+	/* Calculate maximum of the VM flags for the page. */
+	encl_page->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
+
+	/*
+	 * At time of allocation, the runtime protection bits are the same
+	 * as the maximum protection bits.
+	 */
+	encl_page->vm_run_prot_bits = encl_page->vm_max_prot_bits;
+
+	return encl_page;
+}
+
 /**
  * sgx_zap_enclave_ptes() - remove PTEs mapping the address from enclave
  * @encl: the enclave
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 1b6ce1da7c92..3df0d3faf3a1 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -113,6 +113,9 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
+struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
+					  unsigned long offset,
+					  u64 secinfo_flags);
 void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr);
 struct sgx_epc_page *sgx_alloc_va_page(void);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index d8c3c07badb3..3e3ca27a6f72 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -169,44 +169,6 @@ static long sgx_ioc_enclave_create(struct sgx_encl *encl, void __user *arg)
 	return ret;
 }
 
-static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
-						 unsigned long offset,
-						 u64 secinfo_flags)
-{
-	struct sgx_encl_page *encl_page;
-	unsigned long prot;
-
-	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
-	if (!encl_page)
-		return ERR_PTR(-ENOMEM);
-
-	encl_page->desc = encl->base + offset;
-	encl_page->encl = encl;
-
-	prot = _calc_vm_trans(secinfo_flags, SGX_SECINFO_R, PROT_READ)  |
-	       _calc_vm_trans(secinfo_flags, SGX_SECINFO_W, PROT_WRITE) |
-	       _calc_vm_trans(secinfo_flags, SGX_SECINFO_X, PROT_EXEC);
-
-	/*
-	 * TCS pages must always RW set for CPU access while the SECINFO
-	 * permissions are *always* zero - the CPU ignores the user provided
-	 * values and silently overwrites them with zero permissions.
-	 */
-	if ((secinfo_flags & SGX_SECINFO_PAGE_TYPE_MASK) == SGX_SECINFO_TCS)
-		prot |= PROT_READ | PROT_WRITE;
-
-	/* Calculate maximum of the VM flags for the page. */
-	encl_page->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
-
-	/*
-	 * At time of allocation, the runtime protection bits are the same
-	 * as the maximum protection bits.
-	 */
-	encl_page->vm_run_prot_bits = encl_page->vm_max_prot_bits;
-
-	return encl_page;
-}
-
 static int sgx_validate_secinfo(struct sgx_secinfo *secinfo)
 {
 	u64 perm = secinfo->flags & SGX_SECINFO_PERMISSION_MASK;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [Intel-gfx] [PATCH RFC 2/3] x86/sgx: Export sgx_encl_page_alloc()
@ 2022-03-06  5:32   ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06  5:32 UTC (permalink / raw)
  To: linux-mm
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, Chris Wilson,
	H. Peter Anvin, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, codalist, Matthew Auld, Vasily Averin,
	Thomas Hellström, intel-gfx, linux-mips, Shakeel Butt,
	Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, linux-kernel,
	Jarkko Sakkinen, linux-fsdevel, Andrew Morton, Alexey Gladkov,
	Mike Kravetz

Move sgx_encl_page_alloc() to encl.c and export it so that it can be
used in the implementation for MAP_POPULATE, which requires to allocate
new enclave pages.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 arch/x86/kernel/cpu/sgx/encl.c  | 38 +++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.h  |  3 +++
 arch/x86/kernel/cpu/sgx/ioctl.c | 38 ---------------------------------
 3 files changed, 41 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 89aeed798ffb..79e39bd99c09 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -914,6 +914,44 @@ int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 	return ret;
 }
 
+struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
+					  unsigned long offset,
+					  u64 secinfo_flags)
+{
+	struct sgx_encl_page *encl_page;
+	unsigned long prot;
+
+	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
+	if (!encl_page)
+		return ERR_PTR(-ENOMEM);
+
+	encl_page->desc = encl->base + offset;
+	encl_page->encl = encl;
+
+	prot = _calc_vm_trans(secinfo_flags, SGX_SECINFO_R, PROT_READ)  |
+	       _calc_vm_trans(secinfo_flags, SGX_SECINFO_W, PROT_WRITE) |
+	       _calc_vm_trans(secinfo_flags, SGX_SECINFO_X, PROT_EXEC);
+
+	/*
+	 * TCS pages must always RW set for CPU access while the SECINFO
+	 * permissions are *always* zero - the CPU ignores the user provided
+	 * values and silently overwrites them with zero permissions.
+	 */
+	if ((secinfo_flags & SGX_SECINFO_PAGE_TYPE_MASK) == SGX_SECINFO_TCS)
+		prot |= PROT_READ | PROT_WRITE;
+
+	/* Calculate maximum of the VM flags for the page. */
+	encl_page->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
+
+	/*
+	 * At time of allocation, the runtime protection bits are the same
+	 * as the maximum protection bits.
+	 */
+	encl_page->vm_run_prot_bits = encl_page->vm_max_prot_bits;
+
+	return encl_page;
+}
+
 /**
  * sgx_zap_enclave_ptes() - remove PTEs mapping the address from enclave
  * @encl: the enclave
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 1b6ce1da7c92..3df0d3faf3a1 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -113,6 +113,9 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
 void sgx_encl_put_backing(struct sgx_backing *backing, bool do_write);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
+struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
+					  unsigned long offset,
+					  u64 secinfo_flags);
 void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr);
 struct sgx_epc_page *sgx_alloc_va_page(void);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index d8c3c07badb3..3e3ca27a6f72 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -169,44 +169,6 @@ static long sgx_ioc_enclave_create(struct sgx_encl *encl, void __user *arg)
 	return ret;
 }
 
-static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
-						 unsigned long offset,
-						 u64 secinfo_flags)
-{
-	struct sgx_encl_page *encl_page;
-	unsigned long prot;
-
-	encl_page = kzalloc(sizeof(*encl_page), GFP_KERNEL);
-	if (!encl_page)
-		return ERR_PTR(-ENOMEM);
-
-	encl_page->desc = encl->base + offset;
-	encl_page->encl = encl;
-
-	prot = _calc_vm_trans(secinfo_flags, SGX_SECINFO_R, PROT_READ)  |
-	       _calc_vm_trans(secinfo_flags, SGX_SECINFO_W, PROT_WRITE) |
-	       _calc_vm_trans(secinfo_flags, SGX_SECINFO_X, PROT_EXEC);
-
-	/*
-	 * TCS pages must always RW set for CPU access while the SECINFO
-	 * permissions are *always* zero - the CPU ignores the user provided
-	 * values and silently overwrites them with zero permissions.
-	 */
-	if ((secinfo_flags & SGX_SECINFO_PAGE_TYPE_MASK) == SGX_SECINFO_TCS)
-		prot |= PROT_READ | PROT_WRITE;
-
-	/* Calculate maximum of the VM flags for the page. */
-	encl_page->vm_max_prot_bits = calc_vm_prot_bits(prot, 0);
-
-	/*
-	 * At time of allocation, the runtime protection bits are the same
-	 * as the maximum protection bits.
-	 */
-	encl_page->vm_run_prot_bits = encl_page->vm_max_prot_bits;
-
-	return encl_page;
-}
-
 static int sgx_validate_secinfo(struct sgx_secinfo *secinfo)
 {
 	u64 perm = secinfo->flags & SGX_SECINFO_PERMISSION_MASK;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RFC 3/3] x86/sgx: Implement EAUG population with MAP_POPULATE
  2022-03-06  5:32 ` Jarkko Sakkinen
  (?)
@ 2022-03-06  5:32   ` Jarkko Sakkinen
  -1 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06  5:32 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Hansen, Nathaniel McCallum, Reinette Chatre, Andrew Morton,
	linux-sgx, linux-kernel, Jarkko Sakkinen, H. Peter Anvin,
	Florian Fainelli, Thomas Bogendoerfer, Matthew Auld,
	Thomas Hellström, Daniel Vetter, Jason Ekstrand,
	Chris Wilson, Maarten Lankhorst, Tvrtko Ursulin,
	Greg Kroah-Hartman, Vasily Averin, Shakeel Butt, Alexey Gladkov,
	Alexander Mikhalitsyn, zhangyiru, linux-mips, intel-gfx,
	dri-devel, codalist, linux-unionfs, linux-fsdevel

With SGX1 an enclave needs to be created with its maximum memory demands
pre-allocated. Pages cannot be added to an enclave after it is initialized.
SGX2 introduces a new function, ENCLS[EAUG] for adding pages to an
initialized enclave.

Add support for dynamically adding pages to an initialized enclave with
mmap() by populating pages with EAUG. Use f_ops->populate() callback to
achieve this behaviour.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 arch/x86/kernel/cpu/sgx/driver.c | 129 +++++++++++++++++++++++++++++++
 1 file changed, 129 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index aa9b8b868867..0e97e7476076 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -9,6 +9,7 @@
 #include <asm/traps.h>
 #include "driver.h"
 #include "encl.h"
+#include "encls.h"
 
 u64 sgx_attributes_reserved_mask;
 u64 sgx_xfrm_reserved_mask = ~0x3;
@@ -101,6 +102,133 @@ static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 
+static int sgx_encl_augment_page(struct sgx_encl *encl, unsigned long offset)
+{
+	struct sgx_pageinfo pginfo = {0};
+	struct sgx_encl_page *encl_page;
+	struct sgx_epc_page *epc_page;
+	struct sgx_va_page *va_page;
+	u64 secinfo_flags;
+	int ret;
+
+	/*
+	 * Ignore internal permission checking for dynamically added pages.
+	 * They matter only for data added during the pre-initialization phase.
+	 * The enclave decides the permissions by the means of EACCEPT,
+	 * EACCEPTCOPY and EMODPE.
+	 */
+	secinfo_flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_X;
+	encl_page = sgx_encl_page_alloc(encl, offset, secinfo_flags);
+	if (IS_ERR(encl_page))
+		return PTR_ERR(encl_page);
+
+	epc_page = sgx_alloc_epc_page(encl_page, true);
+	if (IS_ERR(epc_page)) {
+		ret = PTR_ERR(epc_page);
+		goto err_alloc_epc_page;
+	}
+
+	va_page = sgx_encl_grow(encl);
+	if (IS_ERR(va_page)) {
+		ret = PTR_ERR(va_page);
+		goto err_grow;
+	}
+
+	mutex_lock(&encl->lock);
+
+	/*
+	 * Adding to encl->va_pages must be done under encl->lock.  Ditto for
+	 * deleting (via sgx_encl_shrink()) in the error path.
+	 */
+	if (va_page)
+		list_add(&va_page->list, &encl->va_pages);
+
+	/*
+	 * Insert prior to EADD in case of OOM.  EADD modifies MRENCLAVE, i.e.
+	 * can't be gracefully unwound, while failure on EADD/EXTEND is limited
+	 * to userspace errors (or kernel/hardware bugs).
+	 */
+	ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
+			encl_page, GFP_KERNEL);
+
+	/*
+	 * If ret == -EBUSY then page was created in another flow while
+	 * running without encl->lock
+	 */
+	if (ret)
+		goto err_xa_insert;
+
+	pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
+	pginfo.addr = encl_page->desc & PAGE_MASK;
+	pginfo.metadata = 0;
+
+	ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
+	if (ret)
+		goto err_eaug;
+
+	encl_page->encl = encl;
+	encl_page->epc_page = epc_page;
+	encl_page->type = SGX_PAGE_TYPE_REG;
+	encl->secs_child_cnt++;
+
+	sgx_mark_page_reclaimable(encl_page->epc_page);
+
+	mutex_unlock(&encl->lock);
+
+	return 0;
+
+err_eaug:
+	xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
+
+err_xa_insert:
+	sgx_encl_shrink(encl, va_page);
+	mutex_unlock(&encl->lock);
+
+err_grow:
+	sgx_encl_free_epc_page(epc_page);
+
+err_alloc_epc_page:
+	kfree(encl_page);
+
+	return VM_FAULT_SIGBUS;
+}
+
+/*
+ * Add new pages to the enclave sequentially with ENCLS[EAUG]. Note that
+ * sgx_mmap() validates that the given VMA is within the enclave range. Calling
+ * here sgx_encl_may_map() second time would too time consuming.
+ */
+static int sgx_populate(struct file *file, struct vm_area_struct *vma)
+{
+	unsigned long length = vma->vm_end - vma->vm_start;
+	struct sgx_encl *encl = file->private_data;
+	unsigned long start = encl->base - vma->vm_start;
+	unsigned long pos;
+	int ret;
+
+	/* EAUG works only for initialized enclaves. */
+	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
+		return -EINVAL;
+
+	for (pos = 0 ; pos < length; pos += PAGE_SIZE) {
+		if (signal_pending(current)) {
+			if (!pos)
+				ret = -ERESTARTSYS;
+
+			break;
+		}
+
+		if (need_resched())
+			cond_resched();
+
+		ret = sgx_encl_augment_page(encl, start + pos);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
 static unsigned long sgx_get_unmapped_area(struct file *file,
 					   unsigned long addr,
 					   unsigned long len,
@@ -133,6 +261,7 @@ static const struct file_operations sgx_encl_fops = {
 	.compat_ioctl		= sgx_compat_ioctl,
 #endif
 	.mmap			= sgx_mmap,
+	.populate		= sgx_populate,
 	.get_unmapped_area	= sgx_get_unmapped_area,
 };
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH RFC 3/3] x86/sgx: Implement EAUG population with MAP_POPULATE
@ 2022-03-06  5:32   ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06  5:32 UTC (permalink / raw)
  To: linux-mm
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, Chris Wilson,
	Jason Ekstrand, H. Peter Anvin, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, Tvrtko Ursulin,
	linux-kernel, Jarkko Sakkinen, linux-fsdevel, Andrew Morton,
	Alexey Gladkov

With SGX1 an enclave needs to be created with its maximum memory demands
pre-allocated. Pages cannot be added to an enclave after it is initialized.
SGX2 introduces a new function, ENCLS[EAUG] for adding pages to an
initialized enclave.

Add support for dynamically adding pages to an initialized enclave with
mmap() by populating pages with EAUG. Use f_ops->populate() callback to
achieve this behaviour.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 arch/x86/kernel/cpu/sgx/driver.c | 129 +++++++++++++++++++++++++++++++
 1 file changed, 129 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index aa9b8b868867..0e97e7476076 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -9,6 +9,7 @@
 #include <asm/traps.h>
 #include "driver.h"
 #include "encl.h"
+#include "encls.h"
 
 u64 sgx_attributes_reserved_mask;
 u64 sgx_xfrm_reserved_mask = ~0x3;
@@ -101,6 +102,133 @@ static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 
+static int sgx_encl_augment_page(struct sgx_encl *encl, unsigned long offset)
+{
+	struct sgx_pageinfo pginfo = {0};
+	struct sgx_encl_page *encl_page;
+	struct sgx_epc_page *epc_page;
+	struct sgx_va_page *va_page;
+	u64 secinfo_flags;
+	int ret;
+
+	/*
+	 * Ignore internal permission checking for dynamically added pages.
+	 * They matter only for data added during the pre-initialization phase.
+	 * The enclave decides the permissions by the means of EACCEPT,
+	 * EACCEPTCOPY and EMODPE.
+	 */
+	secinfo_flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_X;
+	encl_page = sgx_encl_page_alloc(encl, offset, secinfo_flags);
+	if (IS_ERR(encl_page))
+		return PTR_ERR(encl_page);
+
+	epc_page = sgx_alloc_epc_page(encl_page, true);
+	if (IS_ERR(epc_page)) {
+		ret = PTR_ERR(epc_page);
+		goto err_alloc_epc_page;
+	}
+
+	va_page = sgx_encl_grow(encl);
+	if (IS_ERR(va_page)) {
+		ret = PTR_ERR(va_page);
+		goto err_grow;
+	}
+
+	mutex_lock(&encl->lock);
+
+	/*
+	 * Adding to encl->va_pages must be done under encl->lock.  Ditto for
+	 * deleting (via sgx_encl_shrink()) in the error path.
+	 */
+	if (va_page)
+		list_add(&va_page->list, &encl->va_pages);
+
+	/*
+	 * Insert prior to EADD in case of OOM.  EADD modifies MRENCLAVE, i.e.
+	 * can't be gracefully unwound, while failure on EADD/EXTEND is limited
+	 * to userspace errors (or kernel/hardware bugs).
+	 */
+	ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
+			encl_page, GFP_KERNEL);
+
+	/*
+	 * If ret == -EBUSY then page was created in another flow while
+	 * running without encl->lock
+	 */
+	if (ret)
+		goto err_xa_insert;
+
+	pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
+	pginfo.addr = encl_page->desc & PAGE_MASK;
+	pginfo.metadata = 0;
+
+	ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
+	if (ret)
+		goto err_eaug;
+
+	encl_page->encl = encl;
+	encl_page->epc_page = epc_page;
+	encl_page->type = SGX_PAGE_TYPE_REG;
+	encl->secs_child_cnt++;
+
+	sgx_mark_page_reclaimable(encl_page->epc_page);
+
+	mutex_unlock(&encl->lock);
+
+	return 0;
+
+err_eaug:
+	xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
+
+err_xa_insert:
+	sgx_encl_shrink(encl, va_page);
+	mutex_unlock(&encl->lock);
+
+err_grow:
+	sgx_encl_free_epc_page(epc_page);
+
+err_alloc_epc_page:
+	kfree(encl_page);
+
+	return VM_FAULT_SIGBUS;
+}
+
+/*
+ * Add new pages to the enclave sequentially with ENCLS[EAUG]. Note that
+ * sgx_mmap() validates that the given VMA is within the enclave range. Calling
+ * here sgx_encl_may_map() second time would too time consuming.
+ */
+static int sgx_populate(struct file *file, struct vm_area_struct *vma)
+{
+	unsigned long length = vma->vm_end - vma->vm_start;
+	struct sgx_encl *encl = file->private_data;
+	unsigned long start = encl->base - vma->vm_start;
+	unsigned long pos;
+	int ret;
+
+	/* EAUG works only for initialized enclaves. */
+	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
+		return -EINVAL;
+
+	for (pos = 0 ; pos < length; pos += PAGE_SIZE) {
+		if (signal_pending(current)) {
+			if (!pos)
+				ret = -ERESTARTSYS;
+
+			break;
+		}
+
+		if (need_resched())
+			cond_resched();
+
+		ret = sgx_encl_augment_page(encl, start + pos);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
 static unsigned long sgx_get_unmapped_area(struct file *file,
 					   unsigned long addr,
 					   unsigned long len,
@@ -133,6 +261,7 @@ static const struct file_operations sgx_encl_fops = {
 	.compat_ioctl		= sgx_compat_ioctl,
 #endif
 	.mmap			= sgx_mmap,
+	.populate		= sgx_populate,
 	.get_unmapped_area	= sgx_get_unmapped_area,
 };
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [Intel-gfx] [PATCH RFC 3/3] x86/sgx: Implement EAUG population with MAP_POPULATE
@ 2022-03-06  5:32   ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06  5:32 UTC (permalink / raw)
  To: linux-mm
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, Chris Wilson,
	H. Peter Anvin, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, codalist, Matthew Auld, Vasily Averin,
	Thomas Hellström, intel-gfx, linux-mips, Shakeel Butt,
	Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, linux-kernel,
	Jarkko Sakkinen, linux-fsdevel, Andrew Morton, Alexey Gladkov

With SGX1 an enclave needs to be created with its maximum memory demands
pre-allocated. Pages cannot be added to an enclave after it is initialized.
SGX2 introduces a new function, ENCLS[EAUG] for adding pages to an
initialized enclave.

Add support for dynamically adding pages to an initialized enclave with
mmap() by populating pages with EAUG. Use f_ops->populate() callback to
achieve this behaviour.

Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 arch/x86/kernel/cpu/sgx/driver.c | 129 +++++++++++++++++++++++++++++++
 1 file changed, 129 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index aa9b8b868867..0e97e7476076 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -9,6 +9,7 @@
 #include <asm/traps.h>
 #include "driver.h"
 #include "encl.h"
+#include "encls.h"
 
 u64 sgx_attributes_reserved_mask;
 u64 sgx_xfrm_reserved_mask = ~0x3;
@@ -101,6 +102,133 @@ static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 
+static int sgx_encl_augment_page(struct sgx_encl *encl, unsigned long offset)
+{
+	struct sgx_pageinfo pginfo = {0};
+	struct sgx_encl_page *encl_page;
+	struct sgx_epc_page *epc_page;
+	struct sgx_va_page *va_page;
+	u64 secinfo_flags;
+	int ret;
+
+	/*
+	 * Ignore internal permission checking for dynamically added pages.
+	 * They matter only for data added during the pre-initialization phase.
+	 * The enclave decides the permissions by the means of EACCEPT,
+	 * EACCEPTCOPY and EMODPE.
+	 */
+	secinfo_flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_X;
+	encl_page = sgx_encl_page_alloc(encl, offset, secinfo_flags);
+	if (IS_ERR(encl_page))
+		return PTR_ERR(encl_page);
+
+	epc_page = sgx_alloc_epc_page(encl_page, true);
+	if (IS_ERR(epc_page)) {
+		ret = PTR_ERR(epc_page);
+		goto err_alloc_epc_page;
+	}
+
+	va_page = sgx_encl_grow(encl);
+	if (IS_ERR(va_page)) {
+		ret = PTR_ERR(va_page);
+		goto err_grow;
+	}
+
+	mutex_lock(&encl->lock);
+
+	/*
+	 * Adding to encl->va_pages must be done under encl->lock.  Ditto for
+	 * deleting (via sgx_encl_shrink()) in the error path.
+	 */
+	if (va_page)
+		list_add(&va_page->list, &encl->va_pages);
+
+	/*
+	 * Insert prior to EADD in case of OOM.  EADD modifies MRENCLAVE, i.e.
+	 * can't be gracefully unwound, while failure on EADD/EXTEND is limited
+	 * to userspace errors (or kernel/hardware bugs).
+	 */
+	ret = xa_insert(&encl->page_array, PFN_DOWN(encl_page->desc),
+			encl_page, GFP_KERNEL);
+
+	/*
+	 * If ret == -EBUSY then page was created in another flow while
+	 * running without encl->lock
+	 */
+	if (ret)
+		goto err_xa_insert;
+
+	pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
+	pginfo.addr = encl_page->desc & PAGE_MASK;
+	pginfo.metadata = 0;
+
+	ret = __eaug(&pginfo, sgx_get_epc_virt_addr(epc_page));
+	if (ret)
+		goto err_eaug;
+
+	encl_page->encl = encl;
+	encl_page->epc_page = epc_page;
+	encl_page->type = SGX_PAGE_TYPE_REG;
+	encl->secs_child_cnt++;
+
+	sgx_mark_page_reclaimable(encl_page->epc_page);
+
+	mutex_unlock(&encl->lock);
+
+	return 0;
+
+err_eaug:
+	xa_erase(&encl->page_array, PFN_DOWN(encl_page->desc));
+
+err_xa_insert:
+	sgx_encl_shrink(encl, va_page);
+	mutex_unlock(&encl->lock);
+
+err_grow:
+	sgx_encl_free_epc_page(epc_page);
+
+err_alloc_epc_page:
+	kfree(encl_page);
+
+	return VM_FAULT_SIGBUS;
+}
+
+/*
+ * Add new pages to the enclave sequentially with ENCLS[EAUG]. Note that
+ * sgx_mmap() validates that the given VMA is within the enclave range. Calling
+ * here sgx_encl_may_map() second time would too time consuming.
+ */
+static int sgx_populate(struct file *file, struct vm_area_struct *vma)
+{
+	unsigned long length = vma->vm_end - vma->vm_start;
+	struct sgx_encl *encl = file->private_data;
+	unsigned long start = encl->base - vma->vm_start;
+	unsigned long pos;
+	int ret;
+
+	/* EAUG works only for initialized enclaves. */
+	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
+		return -EINVAL;
+
+	for (pos = 0 ; pos < length; pos += PAGE_SIZE) {
+		if (signal_pending(current)) {
+			if (!pos)
+				ret = -ERESTARTSYS;
+
+			break;
+		}
+
+		if (need_resched())
+			cond_resched();
+
+		ret = sgx_encl_augment_page(encl, start + pos);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
 static unsigned long sgx_get_unmapped_area(struct file *file,
 					   unsigned long addr,
 					   unsigned long len,
@@ -133,6 +261,7 @@ static const struct file_operations sgx_encl_fops = {
 	.compat_ioctl		= sgx_compat_ioctl,
 #endif
 	.mmap			= sgx_mmap,
+	.populate		= sgx_populate,
 	.get_unmapped_area	= sgx_get_unmapped_area,
 };
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* RE: [PATCH RFC 0/3] MAP_POPULATE for device memory
  2022-03-06  5:32 ` Jarkko Sakkinen
  (?)
@ 2022-03-06  8:30   ` David Laight
  -1 siblings, 0 replies; 68+ messages in thread
From: David Laight @ 2022-03-06  8:30 UTC (permalink / raw)
  To: 'Jarkko Sakkinen', linux-mm
  Cc: Dave Hansen, Nathaniel McCallum, Reinette Chatre, Andrew Morton,
	linux-sgx, linux-kernel, Florian Fainelli, Thomas Bogendoerfer,
	Matthew Auld, Thomas Hellström, Daniel Vetter,
	Jason Ekstrand, Chris Wilson, Maarten Lankhorst,
	Greg Kroah-Hartman, Tvrtko Ursulin, Vasily Averin, Shakeel Butt,
	Michal Hocko, zhangyiru, Alexey Gladkov, Alexander Mikhalitsyn,
	linux-mips, intel-gfx, dri-devel, codalist, linux-unionfs,
	linux-fsdevel

From: Jarkko Sakkinen
> Sent: 06 March 2022 05:32
> 
> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> to use that for initializing the device memory by providing a new callback
> f_ops->populate() for the purpose.
> 
> SGX patches are provided to show the callback in context.
> 
> An obvious alternative is a ioctl but it is less elegant and requires
> two syscalls (mmap + ioctl) per memory range, instead of just one
> (mmap).

Is this all about trying to stop the vm_operations_struct.fault()
function being called?

It is pretty easy to ensure the mappings are setup in the driver's
mmap() function.
Then the fault() function can just return -VM_FAULT_SIGBUS;

If it is actually device memory you just need to call vm_iomap_memory()
That quite nicely mmap()s PCIe memory space into a user process.

Mapping driver memory is slightly more difficult.
For buffers allocated with dma_alloc_coherent() you can
probably use dma_mmap_coherent().
But I have a loop calling remap_pfn_range() because the
buffer area is made of multiple 16kB kernel buffers that
need to be mapped to contiguous user pages.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 68+ messages in thread

* RE: [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-06  8:30   ` David Laight
  0 siblings, 0 replies; 68+ messages in thread
From: David Laight @ 2022-03-06  8:30 UTC (permalink / raw)
  To: 'Jarkko Sakkinen', linux-mm
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, Tvrtko Ursulin,
	linux-kernel, linux-fsdevel, Andrew Morton, Alexey Gladkov

From: Jarkko Sakkinen
> Sent: 06 March 2022 05:32
> 
> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> to use that for initializing the device memory by providing a new callback
> f_ops->populate() for the purpose.
> 
> SGX patches are provided to show the callback in context.
> 
> An obvious alternative is a ioctl but it is less elegant and requires
> two syscalls (mmap + ioctl) per memory range, instead of just one
> (mmap).

Is this all about trying to stop the vm_operations_struct.fault()
function being called?

It is pretty easy to ensure the mappings are setup in the driver's
mmap() function.
Then the fault() function can just return -VM_FAULT_SIGBUS;

If it is actually device memory you just need to call vm_iomap_memory()
That quite nicely mmap()s PCIe memory space into a user process.

Mapping driver memory is slightly more difficult.
For buffers allocated with dma_alloc_coherent() you can
probably use dma_mmap_coherent().
But I have a loop calling remap_pfn_range() because the
buffer area is made of multiple 16kB kernel buffers that
need to be mapped to contiguous user pages.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-06  8:30   ` David Laight
  0 siblings, 0 replies; 68+ messages in thread
From: David Laight @ 2022-03-06  8:30 UTC (permalink / raw)
  To: 'Jarkko Sakkinen', linux-mm
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, codalist, Matthew Auld, Vasily Averin,
	Thomas Hellström, intel-gfx, linux-mips, Shakeel Butt,
	Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, linux-kernel,
	linux-fsdevel, Andrew Morton, Alexey Gladkov

From: Jarkko Sakkinen
> Sent: 06 March 2022 05:32
> 
> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> to use that for initializing the device memory by providing a new callback
> f_ops->populate() for the purpose.
> 
> SGX patches are provided to show the callback in context.
> 
> An obvious alternative is a ioctl but it is less elegant and requires
> two syscalls (mmap + ioctl) per memory range, instead of just one
> (mmap).

Is this all about trying to stop the vm_operations_struct.fault()
function being called?

It is pretty easy to ensure the mappings are setup in the driver's
mmap() function.
Then the fault() function can just return -VM_FAULT_SIGBUS;

If it is actually device memory you just need to call vm_iomap_memory()
That quite nicely mmap()s PCIe memory space into a user process.

Mapping driver memory is slightly more difficult.
For buffers allocated with dma_alloc_coherent() you can
probably use dma_mmap_coherent().
But I have a loop calling remap_pfn_range() because the
buffer area is made of multiple 16kB kernel buffers that
need to be mapped to contiguous user pages.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 1/3] mm: Add f_ops->populate()
  2022-03-06  5:32   ` Jarkko Sakkinen
  (?)
@ 2022-03-06 10:01     ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-03-06 10:01 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-mm, Dave Hansen, Nathaniel McCallum, Reinette Chatre,
	Andrew Morton, linux-sgx, linux-kernel, Florian Fainelli,
	Thomas Bogendoerfer, Matthew Auld, Thomas Hellström,
	Daniel Vetter, Jason Ekstrand, Chris Wilson, Maarten Lankhorst,
	Tvrtko Ursulin, Shakeel Butt, Vasily Averin, zhangyiru,
	Alexander Mikhalitsyn, Alexey Gladkov, linux-mips, intel-gfx,
	dri-devel, codalist, linux-unionfs, linux-fsdevel

On Sun, Mar 06, 2022 at 07:32:05AM +0200, Jarkko Sakkinen wrote:
> Sometimes you might want to use MAP_POPULATE to ask a device driver to
> initialize the device memory in some specific manner. SGX driver can use
> this to request more memory by issuing ENCLS[EAUG] x86 opcode for each
> page in the address range.
> 
> Add f_ops->populate() with the same parameters as f_ops->mmap() and make
> it conditionally called inside call_mmap(). Update call sites
> accodingly.
> ---
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> v3:
> -       if (!ret && do_populate && file->f_op->populate)
> +       if (!ret && do_populate && file->f_op->populate &&
> +           !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> (reported by Matthew Wilcox)
> v2:
> -       if (!ret && do_populate)
> +       if (!ret && do_populate && file->f_op->populate)
> (reported by Jan Harkes)
> ---
>  arch/mips/kernel/vdso.c                    |  2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  2 +-
>  fs/coda/file.c                             |  2 +-
>  fs/overlayfs/file.c                        |  2 +-
>  include/linux/fs.h                         | 12 ++++++++++--
>  include/linux/mm.h                         |  2 +-
>  ipc/shm.c                                  |  2 +-
>  mm/mmap.c                                  | 10 +++++-----
>  mm/nommu.c                                 |  4 ++--
>  9 files changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
> index 3d0cf471f2fe..89f3f3da9abd 100644
> --- a/arch/mips/kernel/vdso.c
> +++ b/arch/mips/kernel/vdso.c
> @@ -102,7 +102,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
>  		base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
>  				VM_READ | VM_EXEC |
>  				VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
> -				0, NULL);
> +				0, NULL, false);
>  		if (IS_ERR_VALUE(base)) {
>  			ret = base;
>  			goto out;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> index 1b526039a60d..4c71f64d6a79 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> @@ -107,7 +107,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *
>  	if (!obj->base.filp)
>  		return -ENODEV;
>  
> -	ret = call_mmap(obj->base.filp, vma);
> +	ret = call_mmap(obj->base.filp, vma, false);
>  	if (ret)
>  		return ret;
>  
> diff --git a/fs/coda/file.c b/fs/coda/file.c
> index 29dd87be2fb8..e14f312fdbf8 100644
> --- a/fs/coda/file.c
> +++ b/fs/coda/file.c
> @@ -173,7 +173,7 @@ coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
>  	spin_unlock(&cii->c_lock);
>  
>  	vma->vm_file = get_file(host_file);
> -	ret = call_mmap(vma->vm_file, vma);
> +	ret = call_mmap(vma->vm_file, vma, false);
>  
>  	if (ret) {
>  		/* if call_mmap fails, our caller will put host_file so we
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index fa125feed0ff..b963a9397e80 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -503,7 +503,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
>  	vma_set_file(vma, realfile);
>  
>  	old_cred = ovl_override_creds(file_inode(file)->i_sb);
> -	ret = call_mmap(vma->vm_file, vma);
> +	ret = call_mmap(vma->vm_file, vma, false);
>  	revert_creds(old_cred);
>  	ovl_file_accessed(file);
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e2d892b201b0..2909e2d14af8 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -42,6 +42,7 @@
>  #include <linux/mount.h>
>  #include <linux/cred.h>
>  #include <linux/mnt_idmapping.h>
> +#include <linux/mm.h>
>  
>  #include <asm/byteorder.h>
>  #include <uapi/linux/fs.h>
> @@ -1993,6 +1994,7 @@ struct file_operations {
>  	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
>  	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
>  	int (*mmap) (struct file *, struct vm_area_struct *);
> +	int (*populate)(struct file *, struct vm_area_struct *);
>  	unsigned long mmap_supported_flags;
>  	int (*open) (struct inode *, struct file *);
>  	int (*flush) (struct file *, fl_owner_t id);
> @@ -2074,9 +2076,15 @@ static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
>  	return file->f_op->write_iter(kio, iter);
>  }
>  
> -static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
> +static inline int call_mmap(struct file *file, struct vm_area_struct *vma, bool do_populate)
>  {
> -	return file->f_op->mmap(file, vma);
> +	int ret = file->f_op->mmap(file, vma);
> +
> +	if (!ret && do_populate && file->f_op->populate &&
> +	    !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> +		ret = file->f_op->populate(file, vma);
> +
> +	return ret;
>  }
>  
>  extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 213cc569b192..6c8c036f423b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2683,7 +2683,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
>  
>  extern unsigned long mmap_region(struct file *file, unsigned long addr,
>  	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
> -	struct list_head *uf);
> +	struct list_head *uf, bool do_populate);

As I have said many times before, don't add random boolean flags to
function arguments, as they provide no hint as to what they do at all.
When you read the code, you then have to go back and look up the
function definition here and see what exactly it means and the flow is
broken.

Make function names mean something obvious, for this, if it really is a
good idea to have this new flag (and I doubt it, but that's not my
call), then make this a mmap_region_populate() call to make it obvious
it is something different than the notmal mmap_region() call.

But as is, this is pretty horrid, don't you agree?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 1/3] mm: Add f_ops->populate()
@ 2022-03-06 10:01     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-03-06 10:01 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, linux-kernel,
	linux-mm, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, Shakeel Butt,
	Reinette Chatre, Chris Wilson, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Tvrtko Ursulin, linux-mips, linux-fsdevel,
	Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 07:32:05AM +0200, Jarkko Sakkinen wrote:
> Sometimes you might want to use MAP_POPULATE to ask a device driver to
> initialize the device memory in some specific manner. SGX driver can use
> this to request more memory by issuing ENCLS[EAUG] x86 opcode for each
> page in the address range.
> 
> Add f_ops->populate() with the same parameters as f_ops->mmap() and make
> it conditionally called inside call_mmap(). Update call sites
> accodingly.
> ---
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> v3:
> -       if (!ret && do_populate && file->f_op->populate)
> +       if (!ret && do_populate && file->f_op->populate &&
> +           !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> (reported by Matthew Wilcox)
> v2:
> -       if (!ret && do_populate)
> +       if (!ret && do_populate && file->f_op->populate)
> (reported by Jan Harkes)
> ---
>  arch/mips/kernel/vdso.c                    |  2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  2 +-
>  fs/coda/file.c                             |  2 +-
>  fs/overlayfs/file.c                        |  2 +-
>  include/linux/fs.h                         | 12 ++++++++++--
>  include/linux/mm.h                         |  2 +-
>  ipc/shm.c                                  |  2 +-
>  mm/mmap.c                                  | 10 +++++-----
>  mm/nommu.c                                 |  4 ++--
>  9 files changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
> index 3d0cf471f2fe..89f3f3da9abd 100644
> --- a/arch/mips/kernel/vdso.c
> +++ b/arch/mips/kernel/vdso.c
> @@ -102,7 +102,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
>  		base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
>  				VM_READ | VM_EXEC |
>  				VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
> -				0, NULL);
> +				0, NULL, false);
>  		if (IS_ERR_VALUE(base)) {
>  			ret = base;
>  			goto out;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> index 1b526039a60d..4c71f64d6a79 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> @@ -107,7 +107,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *
>  	if (!obj->base.filp)
>  		return -ENODEV;
>  
> -	ret = call_mmap(obj->base.filp, vma);
> +	ret = call_mmap(obj->base.filp, vma, false);
>  	if (ret)
>  		return ret;
>  
> diff --git a/fs/coda/file.c b/fs/coda/file.c
> index 29dd87be2fb8..e14f312fdbf8 100644
> --- a/fs/coda/file.c
> +++ b/fs/coda/file.c
> @@ -173,7 +173,7 @@ coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
>  	spin_unlock(&cii->c_lock);
>  
>  	vma->vm_file = get_file(host_file);
> -	ret = call_mmap(vma->vm_file, vma);
> +	ret = call_mmap(vma->vm_file, vma, false);
>  
>  	if (ret) {
>  		/* if call_mmap fails, our caller will put host_file so we
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index fa125feed0ff..b963a9397e80 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -503,7 +503,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
>  	vma_set_file(vma, realfile);
>  
>  	old_cred = ovl_override_creds(file_inode(file)->i_sb);
> -	ret = call_mmap(vma->vm_file, vma);
> +	ret = call_mmap(vma->vm_file, vma, false);
>  	revert_creds(old_cred);
>  	ovl_file_accessed(file);
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e2d892b201b0..2909e2d14af8 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -42,6 +42,7 @@
>  #include <linux/mount.h>
>  #include <linux/cred.h>
>  #include <linux/mnt_idmapping.h>
> +#include <linux/mm.h>
>  
>  #include <asm/byteorder.h>
>  #include <uapi/linux/fs.h>
> @@ -1993,6 +1994,7 @@ struct file_operations {
>  	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
>  	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
>  	int (*mmap) (struct file *, struct vm_area_struct *);
> +	int (*populate)(struct file *, struct vm_area_struct *);
>  	unsigned long mmap_supported_flags;
>  	int (*open) (struct inode *, struct file *);
>  	int (*flush) (struct file *, fl_owner_t id);
> @@ -2074,9 +2076,15 @@ static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
>  	return file->f_op->write_iter(kio, iter);
>  }
>  
> -static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
> +static inline int call_mmap(struct file *file, struct vm_area_struct *vma, bool do_populate)
>  {
> -	return file->f_op->mmap(file, vma);
> +	int ret = file->f_op->mmap(file, vma);
> +
> +	if (!ret && do_populate && file->f_op->populate &&
> +	    !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> +		ret = file->f_op->populate(file, vma);
> +
> +	return ret;
>  }
>  
>  extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 213cc569b192..6c8c036f423b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2683,7 +2683,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
>  
>  extern unsigned long mmap_region(struct file *file, unsigned long addr,
>  	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
> -	struct list_head *uf);
> +	struct list_head *uf, bool do_populate);

As I have said many times before, don't add random boolean flags to
function arguments, as they provide no hint as to what they do at all.
When you read the code, you then have to go back and look up the
function definition here and see what exactly it means and the flow is
broken.

Make function names mean something obvious, for this, if it really is a
good idea to have this new flag (and I doubt it, but that's not my
call), then make this a mmap_region_populate() call to make it obvious
it is something different than the notmal mmap_region() call.

But as is, this is pretty horrid, don't you agree?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 1/3] mm: Add f_ops->populate()
@ 2022-03-06 10:01     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 68+ messages in thread
From: Greg Kroah-Hartman @ 2022-03-06 10:01 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, linux-kernel,
	linux-mm, Alexander Mikhalitsyn, Florian Fainelli, linux-unionfs,
	codalist, Matthew Auld, Vasily Averin, Thomas Hellström,
	intel-gfx, Shakeel Butt, Reinette Chatre, Chris Wilson,
	linux-sgx, Thomas Bogendoerfer, Nathaniel McCallum, linux-mips,
	linux-fsdevel, Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 07:32:05AM +0200, Jarkko Sakkinen wrote:
> Sometimes you might want to use MAP_POPULATE to ask a device driver to
> initialize the device memory in some specific manner. SGX driver can use
> this to request more memory by issuing ENCLS[EAUG] x86 opcode for each
> page in the address range.
> 
> Add f_ops->populate() with the same parameters as f_ops->mmap() and make
> it conditionally called inside call_mmap(). Update call sites
> accodingly.
> ---
> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> v3:
> -       if (!ret && do_populate && file->f_op->populate)
> +       if (!ret && do_populate && file->f_op->populate &&
> +           !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> (reported by Matthew Wilcox)
> v2:
> -       if (!ret && do_populate)
> +       if (!ret && do_populate && file->f_op->populate)
> (reported by Jan Harkes)
> ---
>  arch/mips/kernel/vdso.c                    |  2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  2 +-
>  fs/coda/file.c                             |  2 +-
>  fs/overlayfs/file.c                        |  2 +-
>  include/linux/fs.h                         | 12 ++++++++++--
>  include/linux/mm.h                         |  2 +-
>  ipc/shm.c                                  |  2 +-
>  mm/mmap.c                                  | 10 +++++-----
>  mm/nommu.c                                 |  4 ++--
>  9 files changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
> index 3d0cf471f2fe..89f3f3da9abd 100644
> --- a/arch/mips/kernel/vdso.c
> +++ b/arch/mips/kernel/vdso.c
> @@ -102,7 +102,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
>  		base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
>  				VM_READ | VM_EXEC |
>  				VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
> -				0, NULL);
> +				0, NULL, false);
>  		if (IS_ERR_VALUE(base)) {
>  			ret = base;
>  			goto out;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> index 1b526039a60d..4c71f64d6a79 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> @@ -107,7 +107,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *
>  	if (!obj->base.filp)
>  		return -ENODEV;
>  
> -	ret = call_mmap(obj->base.filp, vma);
> +	ret = call_mmap(obj->base.filp, vma, false);
>  	if (ret)
>  		return ret;
>  
> diff --git a/fs/coda/file.c b/fs/coda/file.c
> index 29dd87be2fb8..e14f312fdbf8 100644
> --- a/fs/coda/file.c
> +++ b/fs/coda/file.c
> @@ -173,7 +173,7 @@ coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
>  	spin_unlock(&cii->c_lock);
>  
>  	vma->vm_file = get_file(host_file);
> -	ret = call_mmap(vma->vm_file, vma);
> +	ret = call_mmap(vma->vm_file, vma, false);
>  
>  	if (ret) {
>  		/* if call_mmap fails, our caller will put host_file so we
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index fa125feed0ff..b963a9397e80 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -503,7 +503,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
>  	vma_set_file(vma, realfile);
>  
>  	old_cred = ovl_override_creds(file_inode(file)->i_sb);
> -	ret = call_mmap(vma->vm_file, vma);
> +	ret = call_mmap(vma->vm_file, vma, false);
>  	revert_creds(old_cred);
>  	ovl_file_accessed(file);
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e2d892b201b0..2909e2d14af8 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -42,6 +42,7 @@
>  #include <linux/mount.h>
>  #include <linux/cred.h>
>  #include <linux/mnt_idmapping.h>
> +#include <linux/mm.h>
>  
>  #include <asm/byteorder.h>
>  #include <uapi/linux/fs.h>
> @@ -1993,6 +1994,7 @@ struct file_operations {
>  	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
>  	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
>  	int (*mmap) (struct file *, struct vm_area_struct *);
> +	int (*populate)(struct file *, struct vm_area_struct *);
>  	unsigned long mmap_supported_flags;
>  	int (*open) (struct inode *, struct file *);
>  	int (*flush) (struct file *, fl_owner_t id);
> @@ -2074,9 +2076,15 @@ static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
>  	return file->f_op->write_iter(kio, iter);
>  }
>  
> -static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
> +static inline int call_mmap(struct file *file, struct vm_area_struct *vma, bool do_populate)
>  {
> -	return file->f_op->mmap(file, vma);
> +	int ret = file->f_op->mmap(file, vma);
> +
> +	if (!ret && do_populate && file->f_op->populate &&
> +	    !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> +		ret = file->f_op->populate(file, vma);
> +
> +	return ret;
>  }
>  
>  extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 213cc569b192..6c8c036f423b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2683,7 +2683,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
>  
>  extern unsigned long mmap_region(struct file *file, unsigned long addr,
>  	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
> -	struct list_head *uf);
> +	struct list_head *uf, bool do_populate);

As I have said many times before, don't add random boolean flags to
function arguments, as they provide no hint as to what they do at all.
When you read the code, you then have to go back and look up the
function definition here and see what exactly it means and the flow is
broken.

Make function names mean something obvious, for this, if it really is a
good idea to have this new flag (and I doubt it, but that's not my
call), then make this a mmap_region_populate() call to make it obvious
it is something different than the notmal mmap_region() call.

But as is, this is pretty horrid, don't you agree?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
  2022-03-06  5:32 ` Jarkko Sakkinen
  (?)
@ 2022-03-06 11:33   ` Matthew Wilcox
  -1 siblings, 0 replies; 68+ messages in thread
From: Matthew Wilcox @ 2022-03-06 11:33 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-mm, Dave Hansen, Nathaniel McCallum, Reinette Chatre,
	Andrew Morton, linux-sgx, linux-kernel, Florian Fainelli,
	Thomas Bogendoerfer, Matthew Auld, Thomas Hellström,
	Daniel Vetter, Jason Ekstrand, Chris Wilson, Maarten Lankhorst,
	Greg Kroah-Hartman, Tvrtko Ursulin, Vasily Averin, Shakeel Butt,
	Michal Hocko, zhangyiru, Alexey Gladkov, Alexander Mikhalitsyn,
	linux-mips, intel-gfx, dri-devel, codalist, linux-unionfs,
	linux-fsdevel

On Sun, Mar 06, 2022 at 07:32:04AM +0200, Jarkko Sakkinen wrote:
> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> to use that for initializing the device memory by providing a new callback
> f_ops->populate() for the purpose.

As I said, NAK.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-06 11:33   ` Matthew Wilcox
  0 siblings, 0 replies; 68+ messages in thread
From: Matthew Wilcox @ 2022-03-06 11:33 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, Tvrtko Ursulin,
	linux-kernel, linux-fsdevel, Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 07:32:04AM +0200, Jarkko Sakkinen wrote:
> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> to use that for initializing the device memory by providing a new callback
> f_ops->populate() for the purpose.

As I said, NAK.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-06 11:33   ` Matthew Wilcox
  0 siblings, 0 replies; 68+ messages in thread
From: Matthew Wilcox @ 2022-03-06 11:33 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, codalist, Matthew Auld, Vasily Averin,
	Thomas Hellström, intel-gfx, linux-mips, Shakeel Butt,
	Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, linux-kernel,
	linux-fsdevel, Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 07:32:04AM +0200, Jarkko Sakkinen wrote:
> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> to use that for initializing the device memory by providing a new callback
> f_ops->populate() for the purpose.

As I said, NAK.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
  2022-03-06  8:30   ` David Laight
  (?)
@ 2022-03-06 16:52     ` 'Jarkko Sakkinen'
  -1 siblings, 0 replies; 68+ messages in thread
From: 'Jarkko Sakkinen' @ 2022-03-06 16:52 UTC (permalink / raw)
  To: David Laight
  Cc: linux-mm, Dave Hansen, Nathaniel McCallum, Reinette Chatre,
	Andrew Morton, linux-sgx, linux-kernel, Florian Fainelli,
	Thomas Bogendoerfer, Matthew Auld, Thomas Hellström,
	Daniel Vetter, Jason Ekstrand, Chris Wilson, Maarten Lankhorst,
	Greg Kroah-Hartman, Tvrtko Ursulin, Vasily Averin, Shakeel Butt,
	Michal Hocko, zhangyiru, Alexey Gladkov, Alexander Mikhalitsyn,
	linux-mips, intel-gfx, dri-devel, codalist, linux-unionfs,
	linux-fsdevel

On Sun, Mar 06, 2022 at 08:30:14AM +0000, David Laight wrote:
> From: Jarkko Sakkinen
> > Sent: 06 March 2022 05:32
> > 
> > For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> > to use that for initializing the device memory by providing a new callback
> > f_ops->populate() for the purpose.
> > 
> > SGX patches are provided to show the callback in context.
> > 
> > An obvious alternative is a ioctl but it is less elegant and requires
> > two syscalls (mmap + ioctl) per memory range, instead of just one
> > (mmap).
> 
> Is this all about trying to stop the vm_operations_struct.fault()
> function being called?

In SGX protected memory is actually encrypted normal memory and CPU access
control semantics (marked as reserved, e.g. struct page's).

In SGX you need call ENCLS[EAUG] outside the protected memory to add new
pages to the protected memory. Then when CPU is executing inside this
protected memory, also known as enclaves, it commits the memory as part of
the enclave either with ENCLU[EACCEPT] or ENCLU[EACCEPTCOPY].

So the point is not prevent page faults but to prepare the memory for
pending state so that the enclave can then accept them without round-trips,
and in some cases thus improve performance (in our case in enarx.dev
platform that we are developing).

In fact, #PF handler in SGX driver in the current SGX2 patch set also does
EAUG on-demand. Optimal is to have both routes available. And said, this
can be of course also implemented as ioctl.

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-06 16:52     ` 'Jarkko Sakkinen'
  0 siblings, 0 replies; 68+ messages in thread
From: 'Jarkko Sakkinen' @ 2022-03-06 16:52 UTC (permalink / raw)
  To: David Laight
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, Tvrtko Ursulin,
	linux-kernel, linux-fsdevel, Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 08:30:14AM +0000, David Laight wrote:
> From: Jarkko Sakkinen
> > Sent: 06 March 2022 05:32
> > 
> > For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> > to use that for initializing the device memory by providing a new callback
> > f_ops->populate() for the purpose.
> > 
> > SGX patches are provided to show the callback in context.
> > 
> > An obvious alternative is a ioctl but it is less elegant and requires
> > two syscalls (mmap + ioctl) per memory range, instead of just one
> > (mmap).
> 
> Is this all about trying to stop the vm_operations_struct.fault()
> function being called?

In SGX protected memory is actually encrypted normal memory and CPU access
control semantics (marked as reserved, e.g. struct page's).

In SGX you need call ENCLS[EAUG] outside the protected memory to add new
pages to the protected memory. Then when CPU is executing inside this
protected memory, also known as enclaves, it commits the memory as part of
the enclave either with ENCLU[EACCEPT] or ENCLU[EACCEPTCOPY].

So the point is not prevent page faults but to prepare the memory for
pending state so that the enclave can then accept them without round-trips,
and in some cases thus improve performance (in our case in enarx.dev
platform that we are developing).

In fact, #PF handler in SGX driver in the current SGX2 patch set also does
EAUG on-demand. Optimal is to have both routes available. And said, this
can be of course also implemented as ioctl.

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-06 16:52     ` 'Jarkko Sakkinen'
  0 siblings, 0 replies; 68+ messages in thread
From: 'Jarkko Sakkinen' @ 2022-03-06 16:52 UTC (permalink / raw)
  To: David Laight
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, codalist, Matthew Auld, Vasily Averin,
	Thomas Hellström, intel-gfx, linux-mips, Shakeel Butt,
	Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, linux-kernel,
	linux-fsdevel, Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 08:30:14AM +0000, David Laight wrote:
> From: Jarkko Sakkinen
> > Sent: 06 March 2022 05:32
> > 
> > For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> > to use that for initializing the device memory by providing a new callback
> > f_ops->populate() for the purpose.
> > 
> > SGX patches are provided to show the callback in context.
> > 
> > An obvious alternative is a ioctl but it is less elegant and requires
> > two syscalls (mmap + ioctl) per memory range, instead of just one
> > (mmap).
> 
> Is this all about trying to stop the vm_operations_struct.fault()
> function being called?

In SGX protected memory is actually encrypted normal memory and CPU access
control semantics (marked as reserved, e.g. struct page's).

In SGX you need call ENCLS[EAUG] outside the protected memory to add new
pages to the protected memory. Then when CPU is executing inside this
protected memory, also known as enclaves, it commits the memory as part of
the enclave either with ENCLU[EACCEPT] or ENCLU[EACCEPTCOPY].

So the point is not prevent page faults but to prepare the memory for
pending state so that the enclave can then accept them without round-trips,
and in some cases thus improve performance (in our case in enarx.dev
platform that we are developing).

In fact, #PF handler in SGX driver in the current SGX2 patch set also does
EAUG on-demand. Optimal is to have both routes available. And said, this
can be of course also implemented as ioctl.

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 1/3] mm: Add f_ops->populate()
  2022-03-06 10:01     ` Greg Kroah-Hartman
  (?)
@ 2022-03-06 17:02       ` Jarkko Sakkinen
  -1 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06 17:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-mm, Dave Hansen, Nathaniel McCallum, Reinette Chatre,
	Andrew Morton, linux-sgx, linux-kernel, Florian Fainelli,
	Thomas Bogendoerfer, Matthew Auld, Thomas Hellström,
	Daniel Vetter, Jason Ekstrand, Chris Wilson, Maarten Lankhorst,
	Tvrtko Ursulin, Shakeel Butt, Vasily Averin, zhangyiru,
	Alexander Mikhalitsyn, Alexey Gladkov, linux-mips, intel-gfx,
	dri-devel, codalist, linux-unionfs, linux-fsdevel

On Sun, Mar 06, 2022 at 11:01:36AM +0100, Greg Kroah-Hartman wrote:
> On Sun, Mar 06, 2022 at 07:32:05AM +0200, Jarkko Sakkinen wrote:
> > Sometimes you might want to use MAP_POPULATE to ask a device driver to
> > initialize the device memory in some specific manner. SGX driver can use
> > this to request more memory by issuing ENCLS[EAUG] x86 opcode for each
> > page in the address range.
> > 
> > Add f_ops->populate() with the same parameters as f_ops->mmap() and make
> > it conditionally called inside call_mmap(). Update call sites
> > accodingly.
> > ---
> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > v3:
> > -       if (!ret && do_populate && file->f_op->populate)
> > +       if (!ret && do_populate && file->f_op->populate &&
> > +           !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > (reported by Matthew Wilcox)
> > v2:
> > -       if (!ret && do_populate)
> > +       if (!ret && do_populate && file->f_op->populate)
> > (reported by Jan Harkes)
> > ---
> >  arch/mips/kernel/vdso.c                    |  2 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  2 +-
> >  fs/coda/file.c                             |  2 +-
> >  fs/overlayfs/file.c                        |  2 +-
> >  include/linux/fs.h                         | 12 ++++++++++--
> >  include/linux/mm.h                         |  2 +-
> >  ipc/shm.c                                  |  2 +-
> >  mm/mmap.c                                  | 10 +++++-----
> >  mm/nommu.c                                 |  4 ++--
> >  9 files changed, 23 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
> > index 3d0cf471f2fe..89f3f3da9abd 100644
> > --- a/arch/mips/kernel/vdso.c
> > +++ b/arch/mips/kernel/vdso.c
> > @@ -102,7 +102,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
> >  		base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
> >  				VM_READ | VM_EXEC |
> >  				VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
> > -				0, NULL);
> > +				0, NULL, false);
> >  		if (IS_ERR_VALUE(base)) {
> >  			ret = base;
> >  			goto out;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > index 1b526039a60d..4c71f64d6a79 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > @@ -107,7 +107,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *
> >  	if (!obj->base.filp)
> >  		return -ENODEV;
> >  
> > -	ret = call_mmap(obj->base.filp, vma);
> > +	ret = call_mmap(obj->base.filp, vma, false);
> >  	if (ret)
> >  		return ret;
> >  
> > diff --git a/fs/coda/file.c b/fs/coda/file.c
> > index 29dd87be2fb8..e14f312fdbf8 100644
> > --- a/fs/coda/file.c
> > +++ b/fs/coda/file.c
> > @@ -173,7 +173,7 @@ coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
> >  	spin_unlock(&cii->c_lock);
> >  
> >  	vma->vm_file = get_file(host_file);
> > -	ret = call_mmap(vma->vm_file, vma);
> > +	ret = call_mmap(vma->vm_file, vma, false);
> >  
> >  	if (ret) {
> >  		/* if call_mmap fails, our caller will put host_file so we
> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > index fa125feed0ff..b963a9397e80 100644
> > --- a/fs/overlayfs/file.c
> > +++ b/fs/overlayfs/file.c
> > @@ -503,7 +503,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
> >  	vma_set_file(vma, realfile);
> >  
> >  	old_cred = ovl_override_creds(file_inode(file)->i_sb);
> > -	ret = call_mmap(vma->vm_file, vma);
> > +	ret = call_mmap(vma->vm_file, vma, false);
> >  	revert_creds(old_cred);
> >  	ovl_file_accessed(file);
> >  
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index e2d892b201b0..2909e2d14af8 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -42,6 +42,7 @@
> >  #include <linux/mount.h>
> >  #include <linux/cred.h>
> >  #include <linux/mnt_idmapping.h>
> > +#include <linux/mm.h>
> >  
> >  #include <asm/byteorder.h>
> >  #include <uapi/linux/fs.h>
> > @@ -1993,6 +1994,7 @@ struct file_operations {
> >  	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
> >  	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
> >  	int (*mmap) (struct file *, struct vm_area_struct *);
> > +	int (*populate)(struct file *, struct vm_area_struct *);
> >  	unsigned long mmap_supported_flags;
> >  	int (*open) (struct inode *, struct file *);
> >  	int (*flush) (struct file *, fl_owner_t id);
> > @@ -2074,9 +2076,15 @@ static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
> >  	return file->f_op->write_iter(kio, iter);
> >  }
> >  
> > -static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
> > +static inline int call_mmap(struct file *file, struct vm_area_struct *vma, bool do_populate)
> >  {
> > -	return file->f_op->mmap(file, vma);
> > +	int ret = file->f_op->mmap(file, vma);
> > +
> > +	if (!ret && do_populate && file->f_op->populate &&
> > +	    !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > +		ret = file->f_op->populate(file, vma);
> > +
> > +	return ret;
> >  }
> >  
> >  extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 213cc569b192..6c8c036f423b 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2683,7 +2683,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
> >  
> >  extern unsigned long mmap_region(struct file *file, unsigned long addr,
> >  	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
> > -	struct list_head *uf);
> > +	struct list_head *uf, bool do_populate);
> 
> As I have said many times before, don't add random boolean flags to
> function arguments, as they provide no hint as to what they do at all.
> When you read the code, you then have to go back and look up the
> function definition here and see what exactly it means and the flow is
> broken.
> 
> Make function names mean something obvious, for this, if it really is a
> good idea to have this new flag (and I doubt it, but that's not my
> call), then make this a mmap_region_populate() call to make it obvious
> it is something different than the notmal mmap_region() call.

I can create:

* mmap_region_populate()
* call_mmap_populate()

This would localize the changes and leave out those boolean parameters.

> But as is, this is pretty horrid, don't you agree?

So can I conclude from this that in general having populate available for
device memory is something horrid, or just the implementation path?

That's the main reason why I made this RFC patch set, to get clear answer
to that question. I.e. if it is in general sense a bad idea, I'll just
create ioctl. If it is the implementation, I'll try to improve it.

Otherwise, I don't know whether or not it is good idea to include such
patch into the main SGX2 patch set. No means enforcibl tryy to push support
IO memory populate.

> thanks,
> 
> greg k-h

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 1/3] mm: Add f_ops->populate()
@ 2022-03-06 17:02       ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06 17:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, linux-kernel,
	linux-mm, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, Shakeel Butt,
	Reinette Chatre, Chris Wilson, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Tvrtko Ursulin, linux-mips, linux-fsdevel,
	Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 11:01:36AM +0100, Greg Kroah-Hartman wrote:
> On Sun, Mar 06, 2022 at 07:32:05AM +0200, Jarkko Sakkinen wrote:
> > Sometimes you might want to use MAP_POPULATE to ask a device driver to
> > initialize the device memory in some specific manner. SGX driver can use
> > this to request more memory by issuing ENCLS[EAUG] x86 opcode for each
> > page in the address range.
> > 
> > Add f_ops->populate() with the same parameters as f_ops->mmap() and make
> > it conditionally called inside call_mmap(). Update call sites
> > accodingly.
> > ---
> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > v3:
> > -       if (!ret && do_populate && file->f_op->populate)
> > +       if (!ret && do_populate && file->f_op->populate &&
> > +           !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > (reported by Matthew Wilcox)
> > v2:
> > -       if (!ret && do_populate)
> > +       if (!ret && do_populate && file->f_op->populate)
> > (reported by Jan Harkes)
> > ---
> >  arch/mips/kernel/vdso.c                    |  2 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  2 +-
> >  fs/coda/file.c                             |  2 +-
> >  fs/overlayfs/file.c                        |  2 +-
> >  include/linux/fs.h                         | 12 ++++++++++--
> >  include/linux/mm.h                         |  2 +-
> >  ipc/shm.c                                  |  2 +-
> >  mm/mmap.c                                  | 10 +++++-----
> >  mm/nommu.c                                 |  4 ++--
> >  9 files changed, 23 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
> > index 3d0cf471f2fe..89f3f3da9abd 100644
> > --- a/arch/mips/kernel/vdso.c
> > +++ b/arch/mips/kernel/vdso.c
> > @@ -102,7 +102,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
> >  		base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
> >  				VM_READ | VM_EXEC |
> >  				VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
> > -				0, NULL);
> > +				0, NULL, false);
> >  		if (IS_ERR_VALUE(base)) {
> >  			ret = base;
> >  			goto out;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > index 1b526039a60d..4c71f64d6a79 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > @@ -107,7 +107,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *
> >  	if (!obj->base.filp)
> >  		return -ENODEV;
> >  
> > -	ret = call_mmap(obj->base.filp, vma);
> > +	ret = call_mmap(obj->base.filp, vma, false);
> >  	if (ret)
> >  		return ret;
> >  
> > diff --git a/fs/coda/file.c b/fs/coda/file.c
> > index 29dd87be2fb8..e14f312fdbf8 100644
> > --- a/fs/coda/file.c
> > +++ b/fs/coda/file.c
> > @@ -173,7 +173,7 @@ coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
> >  	spin_unlock(&cii->c_lock);
> >  
> >  	vma->vm_file = get_file(host_file);
> > -	ret = call_mmap(vma->vm_file, vma);
> > +	ret = call_mmap(vma->vm_file, vma, false);
> >  
> >  	if (ret) {
> >  		/* if call_mmap fails, our caller will put host_file so we
> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > index fa125feed0ff..b963a9397e80 100644
> > --- a/fs/overlayfs/file.c
> > +++ b/fs/overlayfs/file.c
> > @@ -503,7 +503,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
> >  	vma_set_file(vma, realfile);
> >  
> >  	old_cred = ovl_override_creds(file_inode(file)->i_sb);
> > -	ret = call_mmap(vma->vm_file, vma);
> > +	ret = call_mmap(vma->vm_file, vma, false);
> >  	revert_creds(old_cred);
> >  	ovl_file_accessed(file);
> >  
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index e2d892b201b0..2909e2d14af8 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -42,6 +42,7 @@
> >  #include <linux/mount.h>
> >  #include <linux/cred.h>
> >  #include <linux/mnt_idmapping.h>
> > +#include <linux/mm.h>
> >  
> >  #include <asm/byteorder.h>
> >  #include <uapi/linux/fs.h>
> > @@ -1993,6 +1994,7 @@ struct file_operations {
> >  	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
> >  	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
> >  	int (*mmap) (struct file *, struct vm_area_struct *);
> > +	int (*populate)(struct file *, struct vm_area_struct *);
> >  	unsigned long mmap_supported_flags;
> >  	int (*open) (struct inode *, struct file *);
> >  	int (*flush) (struct file *, fl_owner_t id);
> > @@ -2074,9 +2076,15 @@ static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
> >  	return file->f_op->write_iter(kio, iter);
> >  }
> >  
> > -static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
> > +static inline int call_mmap(struct file *file, struct vm_area_struct *vma, bool do_populate)
> >  {
> > -	return file->f_op->mmap(file, vma);
> > +	int ret = file->f_op->mmap(file, vma);
> > +
> > +	if (!ret && do_populate && file->f_op->populate &&
> > +	    !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > +		ret = file->f_op->populate(file, vma);
> > +
> > +	return ret;
> >  }
> >  
> >  extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 213cc569b192..6c8c036f423b 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2683,7 +2683,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
> >  
> >  extern unsigned long mmap_region(struct file *file, unsigned long addr,
> >  	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
> > -	struct list_head *uf);
> > +	struct list_head *uf, bool do_populate);
> 
> As I have said many times before, don't add random boolean flags to
> function arguments, as they provide no hint as to what they do at all.
> When you read the code, you then have to go back and look up the
> function definition here and see what exactly it means and the flow is
> broken.
> 
> Make function names mean something obvious, for this, if it really is a
> good idea to have this new flag (and I doubt it, but that's not my
> call), then make this a mmap_region_populate() call to make it obvious
> it is something different than the notmal mmap_region() call.

I can create:

* mmap_region_populate()
* call_mmap_populate()

This would localize the changes and leave out those boolean parameters.

> But as is, this is pretty horrid, don't you agree?

So can I conclude from this that in general having populate available for
device memory is something horrid, or just the implementation path?

That's the main reason why I made this RFC patch set, to get clear answer
to that question. I.e. if it is in general sense a bad idea, I'll just
create ioctl. If it is the implementation, I'll try to improve it.

Otherwise, I don't know whether or not it is good idea to include such
patch into the main SGX2 patch set. No means enforcibl tryy to push support
IO memory populate.

> thanks,
> 
> greg k-h

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 1/3] mm: Add f_ops->populate()
@ 2022-03-06 17:02       ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06 17:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, linux-kernel,
	linux-mm, Alexander Mikhalitsyn, Florian Fainelli, linux-unionfs,
	codalist, Matthew Auld, Vasily Averin, Thomas Hellström,
	intel-gfx, Shakeel Butt, Reinette Chatre, Chris Wilson,
	linux-sgx, Thomas Bogendoerfer, Nathaniel McCallum, linux-mips,
	linux-fsdevel, Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 11:01:36AM +0100, Greg Kroah-Hartman wrote:
> On Sun, Mar 06, 2022 at 07:32:05AM +0200, Jarkko Sakkinen wrote:
> > Sometimes you might want to use MAP_POPULATE to ask a device driver to
> > initialize the device memory in some specific manner. SGX driver can use
> > this to request more memory by issuing ENCLS[EAUG] x86 opcode for each
> > page in the address range.
> > 
> > Add f_ops->populate() with the same parameters as f_ops->mmap() and make
> > it conditionally called inside call_mmap(). Update call sites
> > accodingly.
> > ---
> > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > v3:
> > -       if (!ret && do_populate && file->f_op->populate)
> > +       if (!ret && do_populate && file->f_op->populate &&
> > +           !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > (reported by Matthew Wilcox)
> > v2:
> > -       if (!ret && do_populate)
> > +       if (!ret && do_populate && file->f_op->populate)
> > (reported by Jan Harkes)
> > ---
> >  arch/mips/kernel/vdso.c                    |  2 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  2 +-
> >  fs/coda/file.c                             |  2 +-
> >  fs/overlayfs/file.c                        |  2 +-
> >  include/linux/fs.h                         | 12 ++++++++++--
> >  include/linux/mm.h                         |  2 +-
> >  ipc/shm.c                                  |  2 +-
> >  mm/mmap.c                                  | 10 +++++-----
> >  mm/nommu.c                                 |  4 ++--
> >  9 files changed, 23 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
> > index 3d0cf471f2fe..89f3f3da9abd 100644
> > --- a/arch/mips/kernel/vdso.c
> > +++ b/arch/mips/kernel/vdso.c
> > @@ -102,7 +102,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
> >  		base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
> >  				VM_READ | VM_EXEC |
> >  				VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
> > -				0, NULL);
> > +				0, NULL, false);
> >  		if (IS_ERR_VALUE(base)) {
> >  			ret = base;
> >  			goto out;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > index 1b526039a60d..4c71f64d6a79 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > @@ -107,7 +107,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *
> >  	if (!obj->base.filp)
> >  		return -ENODEV;
> >  
> > -	ret = call_mmap(obj->base.filp, vma);
> > +	ret = call_mmap(obj->base.filp, vma, false);
> >  	if (ret)
> >  		return ret;
> >  
> > diff --git a/fs/coda/file.c b/fs/coda/file.c
> > index 29dd87be2fb8..e14f312fdbf8 100644
> > --- a/fs/coda/file.c
> > +++ b/fs/coda/file.c
> > @@ -173,7 +173,7 @@ coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
> >  	spin_unlock(&cii->c_lock);
> >  
> >  	vma->vm_file = get_file(host_file);
> > -	ret = call_mmap(vma->vm_file, vma);
> > +	ret = call_mmap(vma->vm_file, vma, false);
> >  
> >  	if (ret) {
> >  		/* if call_mmap fails, our caller will put host_file so we
> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > index fa125feed0ff..b963a9397e80 100644
> > --- a/fs/overlayfs/file.c
> > +++ b/fs/overlayfs/file.c
> > @@ -503,7 +503,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
> >  	vma_set_file(vma, realfile);
> >  
> >  	old_cred = ovl_override_creds(file_inode(file)->i_sb);
> > -	ret = call_mmap(vma->vm_file, vma);
> > +	ret = call_mmap(vma->vm_file, vma, false);
> >  	revert_creds(old_cred);
> >  	ovl_file_accessed(file);
> >  
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index e2d892b201b0..2909e2d14af8 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -42,6 +42,7 @@
> >  #include <linux/mount.h>
> >  #include <linux/cred.h>
> >  #include <linux/mnt_idmapping.h>
> > +#include <linux/mm.h>
> >  
> >  #include <asm/byteorder.h>
> >  #include <uapi/linux/fs.h>
> > @@ -1993,6 +1994,7 @@ struct file_operations {
> >  	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
> >  	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
> >  	int (*mmap) (struct file *, struct vm_area_struct *);
> > +	int (*populate)(struct file *, struct vm_area_struct *);
> >  	unsigned long mmap_supported_flags;
> >  	int (*open) (struct inode *, struct file *);
> >  	int (*flush) (struct file *, fl_owner_t id);
> > @@ -2074,9 +2076,15 @@ static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
> >  	return file->f_op->write_iter(kio, iter);
> >  }
> >  
> > -static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
> > +static inline int call_mmap(struct file *file, struct vm_area_struct *vma, bool do_populate)
> >  {
> > -	return file->f_op->mmap(file, vma);
> > +	int ret = file->f_op->mmap(file, vma);
> > +
> > +	if (!ret && do_populate && file->f_op->populate &&
> > +	    !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > +		ret = file->f_op->populate(file, vma);
> > +
> > +	return ret;
> >  }
> >  
> >  extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 213cc569b192..6c8c036f423b 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2683,7 +2683,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
> >  
> >  extern unsigned long mmap_region(struct file *file, unsigned long addr,
> >  	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
> > -	struct list_head *uf);
> > +	struct list_head *uf, bool do_populate);
> 
> As I have said many times before, don't add random boolean flags to
> function arguments, as they provide no hint as to what they do at all.
> When you read the code, you then have to go back and look up the
> function definition here and see what exactly it means and the flow is
> broken.
> 
> Make function names mean something obvious, for this, if it really is a
> good idea to have this new flag (and I doubt it, but that's not my
> call), then make this a mmap_region_populate() call to make it obvious
> it is something different than the notmal mmap_region() call.

I can create:

* mmap_region_populate()
* call_mmap_populate()

This would localize the changes and leave out those boolean parameters.

> But as is, this is pretty horrid, don't you agree?

So can I conclude from this that in general having populate available for
device memory is something horrid, or just the implementation path?

That's the main reason why I made this RFC patch set, to get clear answer
to that question. I.e. if it is in general sense a bad idea, I'll just
create ioctl. If it is the implementation, I'll try to improve it.

Otherwise, I don't know whether or not it is good idea to include such
patch into the main SGX2 patch set. No means enforcibl tryy to push support
IO memory populate.

> thanks,
> 
> greg k-h

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 1/3] mm: Add f_ops->populate()
  2022-03-06 17:02       ` Jarkko Sakkinen
  (?)
@ 2022-03-06 17:03         ` Jarkko Sakkinen
  -1 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06 17:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-mm, Dave Hansen, Nathaniel McCallum, Reinette Chatre,
	Andrew Morton, linux-sgx, linux-kernel, Florian Fainelli,
	Thomas Bogendoerfer, Matthew Auld, Thomas Hellström,
	Daniel Vetter, Jason Ekstrand, Chris Wilson, Maarten Lankhorst,
	Tvrtko Ursulin, Shakeel Butt, Vasily Averin, zhangyiru,
	Alexander Mikhalitsyn, Alexey Gladkov, linux-mips, intel-gfx,
	dri-devel, codalist, linux-unionfs, linux-fsdevel

On Sun, Mar 06, 2022 at 07:03:00PM +0200, Jarkko Sakkinen wrote:
> On Sun, Mar 06, 2022 at 11:01:36AM +0100, Greg Kroah-Hartman wrote:
> > On Sun, Mar 06, 2022 at 07:32:05AM +0200, Jarkko Sakkinen wrote:
> > > Sometimes you might want to use MAP_POPULATE to ask a device driver to
> > > initialize the device memory in some specific manner. SGX driver can use
> > > this to request more memory by issuing ENCLS[EAUG] x86 opcode for each
> > > page in the address range.
> > > 
> > > Add f_ops->populate() with the same parameters as f_ops->mmap() and make
> > > it conditionally called inside call_mmap(). Update call sites
> > > accodingly.
> > > ---
> > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > v3:
> > > -       if (!ret && do_populate && file->f_op->populate)
> > > +       if (!ret && do_populate && file->f_op->populate &&
> > > +           !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > > (reported by Matthew Wilcox)
> > > v2:
> > > -       if (!ret && do_populate)
> > > +       if (!ret && do_populate && file->f_op->populate)
> > > (reported by Jan Harkes)
> > > ---
> > >  arch/mips/kernel/vdso.c                    |  2 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  2 +-
> > >  fs/coda/file.c                             |  2 +-
> > >  fs/overlayfs/file.c                        |  2 +-
> > >  include/linux/fs.h                         | 12 ++++++++++--
> > >  include/linux/mm.h                         |  2 +-
> > >  ipc/shm.c                                  |  2 +-
> > >  mm/mmap.c                                  | 10 +++++-----
> > >  mm/nommu.c                                 |  4 ++--
> > >  9 files changed, 23 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
> > > index 3d0cf471f2fe..89f3f3da9abd 100644
> > > --- a/arch/mips/kernel/vdso.c
> > > +++ b/arch/mips/kernel/vdso.c
> > > @@ -102,7 +102,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
> > >  		base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
> > >  				VM_READ | VM_EXEC |
> > >  				VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
> > > -				0, NULL);
> > > +				0, NULL, false);
> > >  		if (IS_ERR_VALUE(base)) {
> > >  			ret = base;
> > >  			goto out;
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > index 1b526039a60d..4c71f64d6a79 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > @@ -107,7 +107,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *
> > >  	if (!obj->base.filp)
> > >  		return -ENODEV;
> > >  
> > > -	ret = call_mmap(obj->base.filp, vma);
> > > +	ret = call_mmap(obj->base.filp, vma, false);
> > >  	if (ret)
> > >  		return ret;
> > >  
> > > diff --git a/fs/coda/file.c b/fs/coda/file.c
> > > index 29dd87be2fb8..e14f312fdbf8 100644
> > > --- a/fs/coda/file.c
> > > +++ b/fs/coda/file.c
> > > @@ -173,7 +173,7 @@ coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
> > >  	spin_unlock(&cii->c_lock);
> > >  
> > >  	vma->vm_file = get_file(host_file);
> > > -	ret = call_mmap(vma->vm_file, vma);
> > > +	ret = call_mmap(vma->vm_file, vma, false);
> > >  
> > >  	if (ret) {
> > >  		/* if call_mmap fails, our caller will put host_file so we
> > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > index fa125feed0ff..b963a9397e80 100644
> > > --- a/fs/overlayfs/file.c
> > > +++ b/fs/overlayfs/file.c
> > > @@ -503,7 +503,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
> > >  	vma_set_file(vma, realfile);
> > >  
> > >  	old_cred = ovl_override_creds(file_inode(file)->i_sb);
> > > -	ret = call_mmap(vma->vm_file, vma);
> > > +	ret = call_mmap(vma->vm_file, vma, false);
> > >  	revert_creds(old_cred);
> > >  	ovl_file_accessed(file);
> > >  
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index e2d892b201b0..2909e2d14af8 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -42,6 +42,7 @@
> > >  #include <linux/mount.h>
> > >  #include <linux/cred.h>
> > >  #include <linux/mnt_idmapping.h>
> > > +#include <linux/mm.h>
> > >  
> > >  #include <asm/byteorder.h>
> > >  #include <uapi/linux/fs.h>
> > > @@ -1993,6 +1994,7 @@ struct file_operations {
> > >  	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
> > >  	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
> > >  	int (*mmap) (struct file *, struct vm_area_struct *);
> > > +	int (*populate)(struct file *, struct vm_area_struct *);
> > >  	unsigned long mmap_supported_flags;
> > >  	int (*open) (struct inode *, struct file *);
> > >  	int (*flush) (struct file *, fl_owner_t id);
> > > @@ -2074,9 +2076,15 @@ static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
> > >  	return file->f_op->write_iter(kio, iter);
> > >  }
> > >  
> > > -static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
> > > +static inline int call_mmap(struct file *file, struct vm_area_struct *vma, bool do_populate)
> > >  {
> > > -	return file->f_op->mmap(file, vma);
> > > +	int ret = file->f_op->mmap(file, vma);
> > > +
> > > +	if (!ret && do_populate && file->f_op->populate &&
> > > +	    !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > > +		ret = file->f_op->populate(file, vma);
> > > +
> > > +	return ret;
> > >  }
> > >  
> > >  extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 213cc569b192..6c8c036f423b 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -2683,7 +2683,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
> > >  
> > >  extern unsigned long mmap_region(struct file *file, unsigned long addr,
> > >  	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
> > > -	struct list_head *uf);
> > > +	struct list_head *uf, bool do_populate);
> > 
> > As I have said many times before, don't add random boolean flags to
> > function arguments, as they provide no hint as to what they do at all.
> > When you read the code, you then have to go back and look up the
> > function definition here and see what exactly it means and the flow is
> > broken.
> > 
> > Make function names mean something obvious, for this, if it really is a
> > good idea to have this new flag (and I doubt it, but that's not my
> > call), then make this a mmap_region_populate() call to make it obvious
> > it is something different than the notmal mmap_region() call.
> 
> I can create:
> 
> * mmap_region_populate()
> * call_mmap_populate()
> 
> This would localize the changes and leave out those boolean parameters.
> 
> > But as is, this is pretty horrid, don't you agree?
> 
> So can I conclude from this that in general having populate available for
> device memory is something horrid, or just the implementation path?
> 
> That's the main reason why I made this RFC patch set, to get clear answer
> to that question. I.e. if it is in general sense a bad idea, I'll just
> create ioctl. If it is the implementation, I'll try to improve it.
> 
> Otherwise, I don't know whether or not it is good idea to include such
> patch into the main SGX2 patch set. No means enforcibl tryy to push support
                                         ~~~~~
                                         intention

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 1/3] mm: Add f_ops->populate()
@ 2022-03-06 17:03         ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06 17:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, linux-kernel,
	linux-mm, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, Shakeel Butt,
	Reinette Chatre, Chris Wilson, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Tvrtko Ursulin, linux-mips, linux-fsdevel,
	Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 07:03:00PM +0200, Jarkko Sakkinen wrote:
> On Sun, Mar 06, 2022 at 11:01:36AM +0100, Greg Kroah-Hartman wrote:
> > On Sun, Mar 06, 2022 at 07:32:05AM +0200, Jarkko Sakkinen wrote:
> > > Sometimes you might want to use MAP_POPULATE to ask a device driver to
> > > initialize the device memory in some specific manner. SGX driver can use
> > > this to request more memory by issuing ENCLS[EAUG] x86 opcode for each
> > > page in the address range.
> > > 
> > > Add f_ops->populate() with the same parameters as f_ops->mmap() and make
> > > it conditionally called inside call_mmap(). Update call sites
> > > accodingly.
> > > ---
> > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > v3:
> > > -       if (!ret && do_populate && file->f_op->populate)
> > > +       if (!ret && do_populate && file->f_op->populate &&
> > > +           !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > > (reported by Matthew Wilcox)
> > > v2:
> > > -       if (!ret && do_populate)
> > > +       if (!ret && do_populate && file->f_op->populate)
> > > (reported by Jan Harkes)
> > > ---
> > >  arch/mips/kernel/vdso.c                    |  2 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  2 +-
> > >  fs/coda/file.c                             |  2 +-
> > >  fs/overlayfs/file.c                        |  2 +-
> > >  include/linux/fs.h                         | 12 ++++++++++--
> > >  include/linux/mm.h                         |  2 +-
> > >  ipc/shm.c                                  |  2 +-
> > >  mm/mmap.c                                  | 10 +++++-----
> > >  mm/nommu.c                                 |  4 ++--
> > >  9 files changed, 23 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
> > > index 3d0cf471f2fe..89f3f3da9abd 100644
> > > --- a/arch/mips/kernel/vdso.c
> > > +++ b/arch/mips/kernel/vdso.c
> > > @@ -102,7 +102,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
> > >  		base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
> > >  				VM_READ | VM_EXEC |
> > >  				VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
> > > -				0, NULL);
> > > +				0, NULL, false);
> > >  		if (IS_ERR_VALUE(base)) {
> > >  			ret = base;
> > >  			goto out;
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > index 1b526039a60d..4c71f64d6a79 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > @@ -107,7 +107,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *
> > >  	if (!obj->base.filp)
> > >  		return -ENODEV;
> > >  
> > > -	ret = call_mmap(obj->base.filp, vma);
> > > +	ret = call_mmap(obj->base.filp, vma, false);
> > >  	if (ret)
> > >  		return ret;
> > >  
> > > diff --git a/fs/coda/file.c b/fs/coda/file.c
> > > index 29dd87be2fb8..e14f312fdbf8 100644
> > > --- a/fs/coda/file.c
> > > +++ b/fs/coda/file.c
> > > @@ -173,7 +173,7 @@ coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
> > >  	spin_unlock(&cii->c_lock);
> > >  
> > >  	vma->vm_file = get_file(host_file);
> > > -	ret = call_mmap(vma->vm_file, vma);
> > > +	ret = call_mmap(vma->vm_file, vma, false);
> > >  
> > >  	if (ret) {
> > >  		/* if call_mmap fails, our caller will put host_file so we
> > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > index fa125feed0ff..b963a9397e80 100644
> > > --- a/fs/overlayfs/file.c
> > > +++ b/fs/overlayfs/file.c
> > > @@ -503,7 +503,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
> > >  	vma_set_file(vma, realfile);
> > >  
> > >  	old_cred = ovl_override_creds(file_inode(file)->i_sb);
> > > -	ret = call_mmap(vma->vm_file, vma);
> > > +	ret = call_mmap(vma->vm_file, vma, false);
> > >  	revert_creds(old_cred);
> > >  	ovl_file_accessed(file);
> > >  
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index e2d892b201b0..2909e2d14af8 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -42,6 +42,7 @@
> > >  #include <linux/mount.h>
> > >  #include <linux/cred.h>
> > >  #include <linux/mnt_idmapping.h>
> > > +#include <linux/mm.h>
> > >  
> > >  #include <asm/byteorder.h>
> > >  #include <uapi/linux/fs.h>
> > > @@ -1993,6 +1994,7 @@ struct file_operations {
> > >  	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
> > >  	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
> > >  	int (*mmap) (struct file *, struct vm_area_struct *);
> > > +	int (*populate)(struct file *, struct vm_area_struct *);
> > >  	unsigned long mmap_supported_flags;
> > >  	int (*open) (struct inode *, struct file *);
> > >  	int (*flush) (struct file *, fl_owner_t id);
> > > @@ -2074,9 +2076,15 @@ static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
> > >  	return file->f_op->write_iter(kio, iter);
> > >  }
> > >  
> > > -static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
> > > +static inline int call_mmap(struct file *file, struct vm_area_struct *vma, bool do_populate)
> > >  {
> > > -	return file->f_op->mmap(file, vma);
> > > +	int ret = file->f_op->mmap(file, vma);
> > > +
> > > +	if (!ret && do_populate && file->f_op->populate &&
> > > +	    !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > > +		ret = file->f_op->populate(file, vma);
> > > +
> > > +	return ret;
> > >  }
> > >  
> > >  extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 213cc569b192..6c8c036f423b 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -2683,7 +2683,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
> > >  
> > >  extern unsigned long mmap_region(struct file *file, unsigned long addr,
> > >  	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
> > > -	struct list_head *uf);
> > > +	struct list_head *uf, bool do_populate);
> > 
> > As I have said many times before, don't add random boolean flags to
> > function arguments, as they provide no hint as to what they do at all.
> > When you read the code, you then have to go back and look up the
> > function definition here and see what exactly it means and the flow is
> > broken.
> > 
> > Make function names mean something obvious, for this, if it really is a
> > good idea to have this new flag (and I doubt it, but that's not my
> > call), then make this a mmap_region_populate() call to make it obvious
> > it is something different than the notmal mmap_region() call.
> 
> I can create:
> 
> * mmap_region_populate()
> * call_mmap_populate()
> 
> This would localize the changes and leave out those boolean parameters.
> 
> > But as is, this is pretty horrid, don't you agree?
> 
> So can I conclude from this that in general having populate available for
> device memory is something horrid, or just the implementation path?
> 
> That's the main reason why I made this RFC patch set, to get clear answer
> to that question. I.e. if it is in general sense a bad idea, I'll just
> create ioctl. If it is the implementation, I'll try to improve it.
> 
> Otherwise, I don't know whether or not it is good idea to include such
> patch into the main SGX2 patch set. No means enforcibl tryy to push support
                                         ~~~~~
                                         intention

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 1/3] mm: Add f_ops->populate()
@ 2022-03-06 17:03         ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-06 17:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, linux-kernel,
	linux-mm, Alexander Mikhalitsyn, Florian Fainelli, linux-unionfs,
	codalist, Matthew Auld, Vasily Averin, Thomas Hellström,
	intel-gfx, Shakeel Butt, Reinette Chatre, Chris Wilson,
	linux-sgx, Thomas Bogendoerfer, Nathaniel McCallum, linux-mips,
	linux-fsdevel, Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 07:03:00PM +0200, Jarkko Sakkinen wrote:
> On Sun, Mar 06, 2022 at 11:01:36AM +0100, Greg Kroah-Hartman wrote:
> > On Sun, Mar 06, 2022 at 07:32:05AM +0200, Jarkko Sakkinen wrote:
> > > Sometimes you might want to use MAP_POPULATE to ask a device driver to
> > > initialize the device memory in some specific manner. SGX driver can use
> > > this to request more memory by issuing ENCLS[EAUG] x86 opcode for each
> > > page in the address range.
> > > 
> > > Add f_ops->populate() with the same parameters as f_ops->mmap() and make
> > > it conditionally called inside call_mmap(). Update call sites
> > > accodingly.
> > > ---
> > > Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
> > > v3:
> > > -       if (!ret && do_populate && file->f_op->populate)
> > > +       if (!ret && do_populate && file->f_op->populate &&
> > > +           !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > > (reported by Matthew Wilcox)
> > > v2:
> > > -       if (!ret && do_populate)
> > > +       if (!ret && do_populate && file->f_op->populate)
> > > (reported by Jan Harkes)
> > > ---
> > >  arch/mips/kernel/vdso.c                    |  2 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  2 +-
> > >  fs/coda/file.c                             |  2 +-
> > >  fs/overlayfs/file.c                        |  2 +-
> > >  include/linux/fs.h                         | 12 ++++++++++--
> > >  include/linux/mm.h                         |  2 +-
> > >  ipc/shm.c                                  |  2 +-
> > >  mm/mmap.c                                  | 10 +++++-----
> > >  mm/nommu.c                                 |  4 ++--
> > >  9 files changed, 23 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
> > > index 3d0cf471f2fe..89f3f3da9abd 100644
> > > --- a/arch/mips/kernel/vdso.c
> > > +++ b/arch/mips/kernel/vdso.c
> > > @@ -102,7 +102,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
> > >  		base = mmap_region(NULL, STACK_TOP, PAGE_SIZE,
> > >  				VM_READ | VM_EXEC |
> > >  				VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC,
> > > -				0, NULL);
> > > +				0, NULL, false);
> > >  		if (IS_ERR_VALUE(base)) {
> > >  			ret = base;
> > >  			goto out;
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > index 1b526039a60d..4c71f64d6a79 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > @@ -107,7 +107,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct *
> > >  	if (!obj->base.filp)
> > >  		return -ENODEV;
> > >  
> > > -	ret = call_mmap(obj->base.filp, vma);
> > > +	ret = call_mmap(obj->base.filp, vma, false);
> > >  	if (ret)
> > >  		return ret;
> > >  
> > > diff --git a/fs/coda/file.c b/fs/coda/file.c
> > > index 29dd87be2fb8..e14f312fdbf8 100644
> > > --- a/fs/coda/file.c
> > > +++ b/fs/coda/file.c
> > > @@ -173,7 +173,7 @@ coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
> > >  	spin_unlock(&cii->c_lock);
> > >  
> > >  	vma->vm_file = get_file(host_file);
> > > -	ret = call_mmap(vma->vm_file, vma);
> > > +	ret = call_mmap(vma->vm_file, vma, false);
> > >  
> > >  	if (ret) {
> > >  		/* if call_mmap fails, our caller will put host_file so we
> > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > index fa125feed0ff..b963a9397e80 100644
> > > --- a/fs/overlayfs/file.c
> > > +++ b/fs/overlayfs/file.c
> > > @@ -503,7 +503,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
> > >  	vma_set_file(vma, realfile);
> > >  
> > >  	old_cred = ovl_override_creds(file_inode(file)->i_sb);
> > > -	ret = call_mmap(vma->vm_file, vma);
> > > +	ret = call_mmap(vma->vm_file, vma, false);
> > >  	revert_creds(old_cred);
> > >  	ovl_file_accessed(file);
> > >  
> > > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > > index e2d892b201b0..2909e2d14af8 100644
> > > --- a/include/linux/fs.h
> > > +++ b/include/linux/fs.h
> > > @@ -42,6 +42,7 @@
> > >  #include <linux/mount.h>
> > >  #include <linux/cred.h>
> > >  #include <linux/mnt_idmapping.h>
> > > +#include <linux/mm.h>
> > >  
> > >  #include <asm/byteorder.h>
> > >  #include <uapi/linux/fs.h>
> > > @@ -1993,6 +1994,7 @@ struct file_operations {
> > >  	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
> > >  	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
> > >  	int (*mmap) (struct file *, struct vm_area_struct *);
> > > +	int (*populate)(struct file *, struct vm_area_struct *);
> > >  	unsigned long mmap_supported_flags;
> > >  	int (*open) (struct inode *, struct file *);
> > >  	int (*flush) (struct file *, fl_owner_t id);
> > > @@ -2074,9 +2076,15 @@ static inline ssize_t call_write_iter(struct file *file, struct kiocb *kio,
> > >  	return file->f_op->write_iter(kio, iter);
> > >  }
> > >  
> > > -static inline int call_mmap(struct file *file, struct vm_area_struct *vma)
> > > +static inline int call_mmap(struct file *file, struct vm_area_struct *vma, bool do_populate)
> > >  {
> > > -	return file->f_op->mmap(file, vma);
> > > +	int ret = file->f_op->mmap(file, vma);
> > > +
> > > +	if (!ret && do_populate && file->f_op->populate &&
> > > +	    !!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > > +		ret = file->f_op->populate(file, vma);
> > > +
> > > +	return ret;
> > >  }
> > >  
> > >  extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 213cc569b192..6c8c036f423b 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -2683,7 +2683,7 @@ extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned lo
> > >  
> > >  extern unsigned long mmap_region(struct file *file, unsigned long addr,
> > >  	unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
> > > -	struct list_head *uf);
> > > +	struct list_head *uf, bool do_populate);
> > 
> > As I have said many times before, don't add random boolean flags to
> > function arguments, as they provide no hint as to what they do at all.
> > When you read the code, you then have to go back and look up the
> > function definition here and see what exactly it means and the flow is
> > broken.
> > 
> > Make function names mean something obvious, for this, if it really is a
> > good idea to have this new flag (and I doubt it, but that's not my
> > call), then make this a mmap_region_populate() call to make it obvious
> > it is something different than the notmal mmap_region() call.
> 
> I can create:
> 
> * mmap_region_populate()
> * call_mmap_populate()
> 
> This would localize the changes and leave out those boolean parameters.
> 
> > But as is, this is pretty horrid, don't you agree?
> 
> So can I conclude from this that in general having populate available for
> device memory is something horrid, or just the implementation path?
> 
> That's the main reason why I made this RFC patch set, to get clear answer
> to that question. I.e. if it is in general sense a bad idea, I'll just
> create ioctl. If it is the implementation, I'll try to improve it.
> 
> Otherwise, I don't know whether or not it is good idea to include such
> patch into the main SGX2 patch set. No means enforcibl tryy to push support
                                         ~~~~~
                                         intention

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 1/3] mm: Add f_ops->populate()
  2022-03-06 17:02       ` Jarkko Sakkinen
  (?)
@ 2022-03-06 22:43         ` Matthew Wilcox
  -1 siblings, 0 replies; 68+ messages in thread
From: Matthew Wilcox @ 2022-03-06 22:43 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Greg Kroah-Hartman, linux-mm, Dave Hansen, Nathaniel McCallum,
	Reinette Chatre, Andrew Morton, linux-sgx, linux-kernel,
	Florian Fainelli, Thomas Bogendoerfer, Matthew Auld,
	Thomas Hellström, Daniel Vetter, Jason Ekstrand,
	Chris Wilson, Maarten Lankhorst, Tvrtko Ursulin, Shakeel Butt,
	Vasily Averin, zhangyiru, Alexander Mikhalitsyn, Alexey Gladkov,
	linux-mips, intel-gfx, dri-devel, codalist, linux-unionfs,
	linux-fsdevel

On Sun, Mar 06, 2022 at 07:02:57PM +0200, Jarkko Sakkinen wrote:
> So can I conclude from this that in general having populate available for
> device memory is something horrid, or just the implementation path?

You haven't even attempted to explain what the problem is you're trying
to solve.  You've shown up with some terrible code and said "Hey, is
this a good idea".  No, no, it's not.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 1/3] mm: Add f_ops->populate()
@ 2022-03-06 22:43         ` Matthew Wilcox
  0 siblings, 0 replies; 68+ messages in thread
From: Matthew Wilcox @ 2022-03-06 22:43 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, Chris Wilson,
	linux-mm, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, Tvrtko Ursulin,
	linux-kernel, linux-fsdevel, Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 07:02:57PM +0200, Jarkko Sakkinen wrote:
> So can I conclude from this that in general having populate available for
> device memory is something horrid, or just the implementation path?

You haven't even attempted to explain what the problem is you're trying
to solve.  You've shown up with some terrible code and said "Hey, is
this a good idea".  No, no, it's not.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 1/3] mm: Add f_ops->populate()
@ 2022-03-06 22:43         ` Matthew Wilcox
  0 siblings, 0 replies; 68+ messages in thread
From: Matthew Wilcox @ 2022-03-06 22:43 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, Chris Wilson,
	linux-mm, Alexander Mikhalitsyn, Florian Fainelli, linux-unionfs,
	codalist, Matthew Auld, Vasily Averin, Thomas Hellström,
	intel-gfx, linux-mips, Shakeel Butt, Reinette Chatre, linux-sgx,
	Thomas Bogendoerfer, Nathaniel McCallum, Greg Kroah-Hartman,
	linux-kernel, linux-fsdevel, Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 07:02:57PM +0200, Jarkko Sakkinen wrote:
> So can I conclude from this that in general having populate available for
> device memory is something horrid, or just the implementation path?

You haven't even attempted to explain what the problem is you're trying
to solve.  You've shown up with some terrible code and said "Hey, is
this a good idea".  No, no, it's not.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
  2022-03-06 11:33   ` Matthew Wilcox
@ 2022-03-07  7:48     ` Christoph Hellwig
  -1 siblings, 0 replies; 68+ messages in thread
From: Christoph Hellwig @ 2022-03-07  7:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jarkko Sakkinen, linux-mm, Dave Hansen, Nathaniel McCallum,
	Reinette Chatre, Andrew Morton, linux-sgx, linux-kernel,
	Florian Fainelli, Thomas Bogendoerfer, Matthew Auld,
	Thomas Hellström, Daniel Vetter, Jason Ekstrand,
	Chris Wilson, Maarten Lankhorst, Greg Kroah-Hartman,
	Tvrtko Ursulin, Vasily Averin, Shakeel Butt, Michal Hocko,
	zhangyiru, Alexey Gladkov, Alexander Mikhalitsyn, linux-mips,
	intel-gfx, dri-devel, codalist, linux-unionfs, linux-fsdevel

On Sun, Mar 06, 2022 at 11:33:02AM +0000, Matthew Wilcox wrote:
> On Sun, Mar 06, 2022 at 07:32:04AM +0200, Jarkko Sakkinen wrote:
> > For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> > to use that for initializing the device memory by providing a new callback
> > f_ops->populate() for the purpose.
> 
> As I said, NAK.

Agreed.  This is an amazingly bad interface.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07  7:48     ` Christoph Hellwig
  0 siblings, 0 replies; 68+ messages in thread
From: Christoph Hellwig @ 2022-03-07  7:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, codalist, Matthew Auld, Vasily Averin,
	Thomas Hellström, intel-gfx, linux-mips, Shakeel Butt,
	Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, linux-kernel,
	Jarkko Sakkinen, linux-fsdevel, Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 11:33:02AM +0000, Matthew Wilcox wrote:
> On Sun, Mar 06, 2022 at 07:32:04AM +0200, Jarkko Sakkinen wrote:
> > For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> > to use that for initializing the device memory by providing a new callback
> > f_ops->populate() for the purpose.
> 
> As I said, NAK.

Agreed.  This is an amazingly bad interface.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
  2022-03-06  5:32 ` Jarkko Sakkinen
  (?)
@ 2022-03-07 10:12   ` David Hildenbrand
  -1 siblings, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2022-03-07 10:12 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-mm
  Cc: Dave Hansen, Nathaniel McCallum, Reinette Chatre, Andrew Morton,
	linux-sgx, linux-kernel, Florian Fainelli, Thomas Bogendoerfer,
	Matthew Auld, Thomas Hellström, Daniel Vetter,
	Jason Ekstrand, Chris Wilson, Maarten Lankhorst,
	Greg Kroah-Hartman, Tvrtko Ursulin, Vasily Averin, Shakeel Butt,
	Michal Hocko, zhangyiru, Alexey Gladkov, Alexander Mikhalitsyn,
	linux-mips, intel-gfx, dri-devel, codalist, linux-unionfs,
	linux-fsdevel

On 06.03.22 06:32, Jarkko Sakkinen wrote:
> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> to use that for initializing the device memory by providing a new callback
> f_ops->populate() for the purpose.
> 
> SGX patches are provided to show the callback in context.
> 
> An obvious alternative is a ioctl but it is less elegant and requires
> two syscalls (mmap + ioctl) per memory range, instead of just one
> (mmap).

What about extending MADV_POPULATE_READ | MADV_POPULATE_WRITE to support
VM_IO | VM_PFNMAP (as well?) ?


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07 10:12   ` David Hildenbrand
  0 siblings, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2022-03-07 10:12 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-mm
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, Tvrtko Ursulin,
	linux-kernel, linux-fsdevel, Andrew Morton, Alexey Gladkov

On 06.03.22 06:32, Jarkko Sakkinen wrote:
> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> to use that for initializing the device memory by providing a new callback
> f_ops->populate() for the purpose.
> 
> SGX patches are provided to show the callback in context.
> 
> An obvious alternative is a ioctl but it is less elegant and requires
> two syscalls (mmap + ioctl) per memory range, instead of just one
> (mmap).

What about extending MADV_POPULATE_READ | MADV_POPULATE_WRITE to support
VM_IO | VM_PFNMAP (as well?) ?


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07 10:12   ` David Hildenbrand
  0 siblings, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2022-03-07 10:12 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-mm
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, codalist, Matthew Auld, Vasily Averin,
	Thomas Hellström, intel-gfx, linux-mips, Shakeel Butt,
	Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, linux-kernel,
	linux-fsdevel, Andrew Morton, Alexey Gladkov

On 06.03.22 06:32, Jarkko Sakkinen wrote:
> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> to use that for initializing the device memory by providing a new callback
> f_ops->populate() for the purpose.
> 
> SGX patches are provided to show the callback in context.
> 
> An obvious alternative is a ioctl but it is less elegant and requires
> two syscalls (mmap + ioctl) per memory range, instead of just one
> (mmap).

What about extending MADV_POPULATE_READ | MADV_POPULATE_WRITE to support
VM_IO | VM_PFNMAP (as well?) ?


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 1/3] mm: Add f_ops->populate()
  2022-03-06 22:43         ` Matthew Wilcox
  (?)
@ 2022-03-07 13:16           ` Jarkko Sakkinen
  -1 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 13:16 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Greg Kroah-Hartman, linux-mm, Dave Hansen, Nathaniel McCallum,
	Reinette Chatre, Andrew Morton, linux-sgx, linux-kernel,
	Florian Fainelli, Thomas Bogendoerfer, Matthew Auld,
	Thomas Hellström, Daniel Vetter, Jason Ekstrand,
	Chris Wilson, Maarten Lankhorst, Tvrtko Ursulin, Shakeel Butt,
	Vasily Averin, zhangyiru, Alexander Mikhalitsyn, Alexey Gladkov,
	linux-mips, intel-gfx, dri-devel, codalist, linux-unionfs,
	linux-fsdevel

On Sun, Mar 06, 2022 at 10:43:31PM +0000, Matthew Wilcox wrote:
> On Sun, Mar 06, 2022 at 07:02:57PM +0200, Jarkko Sakkinen wrote:
> > So can I conclude from this that in general having populate available for
> > device memory is something horrid, or just the implementation path?
> 
> You haven't even attempted to explain what the problem is you're trying
> to solve.  You've shown up with some terrible code and said "Hey, is
> this a good idea".  No, no, it's not.

The problem is that in order to include memory to enclave, which is
essentially a reserved address range processes virtual address space
there's two steps into it:

1. Host side (kernel) does ENCLS[EAUG] to request a new page to be
   added to the enclave.
2. Enclave accepts request with ENCLU[EACCEPT] or ENCLU[EACCEPTCOPY].

In the current SGX2 patch set this taken care by the page fault
handler. I.e. the enclave calls ENCLU[EACCEPT] for an empty address
and the #PF handler then does EAUG for a single page.

So if you want to process a batch of pages this generates O(n)
round-trips.

So if there was a way pre-do a batch of EAUG's, that would allow
to load data to the enclave without causing page faults happening
constantly.

One solution for this simply add ioctl:

https://lore.kernel.org/linux-sgx/YiLRBglTEbu8cHP9@iki.fi/T/#m195ec84bf85614a140abeee245c5118c22ace8f3

But in practice when you wanted to use it, you would setup the
parameters so that they match the mmap() range. So for pratical
user space API having mmap() take care of this would be much more
lean option.

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 1/3] mm: Add f_ops->populate()
@ 2022-03-07 13:16           ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 13:16 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, Chris Wilson,
	linux-mm, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, Tvrtko Ursulin,
	linux-kernel, linux-fsdevel, Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 10:43:31PM +0000, Matthew Wilcox wrote:
> On Sun, Mar 06, 2022 at 07:02:57PM +0200, Jarkko Sakkinen wrote:
> > So can I conclude from this that in general having populate available for
> > device memory is something horrid, or just the implementation path?
> 
> You haven't even attempted to explain what the problem is you're trying
> to solve.  You've shown up with some terrible code and said "Hey, is
> this a good idea".  No, no, it's not.

The problem is that in order to include memory to enclave, which is
essentially a reserved address range processes virtual address space
there's two steps into it:

1. Host side (kernel) does ENCLS[EAUG] to request a new page to be
   added to the enclave.
2. Enclave accepts request with ENCLU[EACCEPT] or ENCLU[EACCEPTCOPY].

In the current SGX2 patch set this taken care by the page fault
handler. I.e. the enclave calls ENCLU[EACCEPT] for an empty address
and the #PF handler then does EAUG for a single page.

So if you want to process a batch of pages this generates O(n)
round-trips.

So if there was a way pre-do a batch of EAUG's, that would allow
to load data to the enclave without causing page faults happening
constantly.

One solution for this simply add ioctl:

https://lore.kernel.org/linux-sgx/YiLRBglTEbu8cHP9@iki.fi/T/#m195ec84bf85614a140abeee245c5118c22ace8f3

But in practice when you wanted to use it, you would setup the
parameters so that they match the mmap() range. So for pratical
user space API having mmap() take care of this would be much more
lean option.

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 1/3] mm: Add f_ops->populate()
@ 2022-03-07 13:16           ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 13:16 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, Chris Wilson,
	linux-mm, Alexander Mikhalitsyn, Florian Fainelli, linux-unionfs,
	codalist, Matthew Auld, Vasily Averin, Thomas Hellström,
	intel-gfx, linux-mips, Shakeel Butt, Reinette Chatre, linux-sgx,
	Thomas Bogendoerfer, Nathaniel McCallum, Greg Kroah-Hartman,
	linux-kernel, linux-fsdevel, Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 10:43:31PM +0000, Matthew Wilcox wrote:
> On Sun, Mar 06, 2022 at 07:02:57PM +0200, Jarkko Sakkinen wrote:
> > So can I conclude from this that in general having populate available for
> > device memory is something horrid, or just the implementation path?
> 
> You haven't even attempted to explain what the problem is you're trying
> to solve.  You've shown up with some terrible code and said "Hey, is
> this a good idea".  No, no, it's not.

The problem is that in order to include memory to enclave, which is
essentially a reserved address range processes virtual address space
there's two steps into it:

1. Host side (kernel) does ENCLS[EAUG] to request a new page to be
   added to the enclave.
2. Enclave accepts request with ENCLU[EACCEPT] or ENCLU[EACCEPTCOPY].

In the current SGX2 patch set this taken care by the page fault
handler. I.e. the enclave calls ENCLU[EACCEPT] for an empty address
and the #PF handler then does EAUG for a single page.

So if you want to process a batch of pages this generates O(n)
round-trips.

So if there was a way pre-do a batch of EAUG's, that would allow
to load data to the enclave without causing page faults happening
constantly.

One solution for this simply add ioctl:

https://lore.kernel.org/linux-sgx/YiLRBglTEbu8cHP9@iki.fi/T/#m195ec84bf85614a140abeee245c5118c22ace8f3

But in practice when you wanted to use it, you would setup the
parameters so that they match the mmap() range. So for pratical
user space API having mmap() take care of this would be much more
lean option.

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 1/3] mm: Add f_ops->populate()
  2022-03-07 13:16           ` Jarkko Sakkinen
  (?)
@ 2022-03-07 13:26             ` Jarkko Sakkinen
  -1 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 13:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Greg Kroah-Hartman, linux-mm, Dave Hansen, Nathaniel McCallum,
	Reinette Chatre, Andrew Morton, linux-sgx, linux-kernel,
	Florian Fainelli, Thomas Bogendoerfer, Matthew Auld,
	Thomas Hellström, Daniel Vetter, Jason Ekstrand,
	Chris Wilson, Maarten Lankhorst, Tvrtko Ursulin, Shakeel Butt,
	Vasily Averin, zhangyiru, Alexander Mikhalitsyn, Alexey Gladkov,
	linux-mips, intel-gfx, dri-devel, codalist, linux-unionfs,
	linux-fsdevel

On Mon, Mar 07, 2022 at 03:16:57PM +0200, Jarkko Sakkinen wrote:
> On Sun, Mar 06, 2022 at 10:43:31PM +0000, Matthew Wilcox wrote:
> > On Sun, Mar 06, 2022 at 07:02:57PM +0200, Jarkko Sakkinen wrote:
> > > So can I conclude from this that in general having populate available for
> > > device memory is something horrid, or just the implementation path?
> > 
> > You haven't even attempted to explain what the problem is you're trying
> > to solve.  You've shown up with some terrible code and said "Hey, is
> > this a good idea".  No, no, it's not.
> 
> The problem is that in order to include memory to enclave, which is
> essentially a reserved address range processes virtual address space
> there's two steps into it:
> 
> 1. Host side (kernel) does ENCLS[EAUG] to request a new page to be
>    added to the enclave.
> 2. Enclave accepts request with ENCLU[EACCEPT] or ENCLU[EACCEPTCOPY].
> 
> In the current SGX2 patch set this taken care by the page fault
> handler. I.e. the enclave calls ENCLU[EACCEPT] for an empty address
> and the #PF handler then does EAUG for a single page.
> 
> So if you want to process a batch of pages this generates O(n)
> round-trips.
> 
> So if there was a way pre-do a batch of EAUG's, that would allow
> to load data to the enclave without causing page faults happening
> constantly.
> 
> One solution for this simply add ioctl:
> 
> https://lore.kernel.org/linux-sgx/YiLRBglTEbu8cHP9@iki.fi/T/#m195ec84bf85614a140abeee245c5118c22ace8f3
> 
> But in practice when you wanted to use it, you would setup the
> parameters so that they match the mmap() range. So for pratical
> user space API having mmap() take care of this would be much more
> lean option.

For something like Graphene [1] the lazy #PF based option is probably
a way to go. For wasm runtime that we're doing in Enarx [2] we get better
performance by having something like this. I.e. we most of the time take
as much as we use.

[1] https://github.com/gramineproject/graphene
[2] https://enarx.dev/

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 1/3] mm: Add f_ops->populate()
@ 2022-03-07 13:26             ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 13:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, Chris Wilson,
	linux-mm, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, Tvrtko Ursulin,
	linux-kernel, linux-fsdevel, Andrew Morton, Alexey Gladkov

On Mon, Mar 07, 2022 at 03:16:57PM +0200, Jarkko Sakkinen wrote:
> On Sun, Mar 06, 2022 at 10:43:31PM +0000, Matthew Wilcox wrote:
> > On Sun, Mar 06, 2022 at 07:02:57PM +0200, Jarkko Sakkinen wrote:
> > > So can I conclude from this that in general having populate available for
> > > device memory is something horrid, or just the implementation path?
> > 
> > You haven't even attempted to explain what the problem is you're trying
> > to solve.  You've shown up with some terrible code and said "Hey, is
> > this a good idea".  No, no, it's not.
> 
> The problem is that in order to include memory to enclave, which is
> essentially a reserved address range processes virtual address space
> there's two steps into it:
> 
> 1. Host side (kernel) does ENCLS[EAUG] to request a new page to be
>    added to the enclave.
> 2. Enclave accepts request with ENCLU[EACCEPT] or ENCLU[EACCEPTCOPY].
> 
> In the current SGX2 patch set this taken care by the page fault
> handler. I.e. the enclave calls ENCLU[EACCEPT] for an empty address
> and the #PF handler then does EAUG for a single page.
> 
> So if you want to process a batch of pages this generates O(n)
> round-trips.
> 
> So if there was a way pre-do a batch of EAUG's, that would allow
> to load data to the enclave without causing page faults happening
> constantly.
> 
> One solution for this simply add ioctl:
> 
> https://lore.kernel.org/linux-sgx/YiLRBglTEbu8cHP9@iki.fi/T/#m195ec84bf85614a140abeee245c5118c22ace8f3
> 
> But in practice when you wanted to use it, you would setup the
> parameters so that they match the mmap() range. So for pratical
> user space API having mmap() take care of this would be much more
> lean option.

For something like Graphene [1] the lazy #PF based option is probably
a way to go. For wasm runtime that we're doing in Enarx [2] we get better
performance by having something like this. I.e. we most of the time take
as much as we use.

[1] https://github.com/gramineproject/graphene
[2] https://enarx.dev/

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 1/3] mm: Add f_ops->populate()
@ 2022-03-07 13:26             ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 13:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: zhangyiru, Daniel Vetter, Dave Hansen, dri-devel, Chris Wilson,
	linux-mm, Alexander Mikhalitsyn, Florian Fainelli, linux-unionfs,
	codalist, Matthew Auld, Vasily Averin, Thomas Hellström,
	intel-gfx, linux-mips, Shakeel Butt, Reinette Chatre, linux-sgx,
	Thomas Bogendoerfer, Nathaniel McCallum, Greg Kroah-Hartman,
	linux-kernel, linux-fsdevel, Andrew Morton, Alexey Gladkov

On Mon, Mar 07, 2022 at 03:16:57PM +0200, Jarkko Sakkinen wrote:
> On Sun, Mar 06, 2022 at 10:43:31PM +0000, Matthew Wilcox wrote:
> > On Sun, Mar 06, 2022 at 07:02:57PM +0200, Jarkko Sakkinen wrote:
> > > So can I conclude from this that in general having populate available for
> > > device memory is something horrid, or just the implementation path?
> > 
> > You haven't even attempted to explain what the problem is you're trying
> > to solve.  You've shown up with some terrible code and said "Hey, is
> > this a good idea".  No, no, it's not.
> 
> The problem is that in order to include memory to enclave, which is
> essentially a reserved address range processes virtual address space
> there's two steps into it:
> 
> 1. Host side (kernel) does ENCLS[EAUG] to request a new page to be
>    added to the enclave.
> 2. Enclave accepts request with ENCLU[EACCEPT] or ENCLU[EACCEPTCOPY].
> 
> In the current SGX2 patch set this taken care by the page fault
> handler. I.e. the enclave calls ENCLU[EACCEPT] for an empty address
> and the #PF handler then does EAUG for a single page.
> 
> So if you want to process a batch of pages this generates O(n)
> round-trips.
> 
> So if there was a way pre-do a batch of EAUG's, that would allow
> to load data to the enclave without causing page faults happening
> constantly.
> 
> One solution for this simply add ioctl:
> 
> https://lore.kernel.org/linux-sgx/YiLRBglTEbu8cHP9@iki.fi/T/#m195ec84bf85614a140abeee245c5118c22ace8f3
> 
> But in practice when you wanted to use it, you would setup the
> parameters so that they match the mmap() range. So for pratical
> user space API having mmap() take care of this would be much more
> lean option.

For something like Graphene [1] the lazy #PF based option is probably
a way to go. For wasm runtime that we're doing in Enarx [2] we get better
performance by having something like this. I.e. we most of the time take
as much as we use.

[1] https://github.com/gramineproject/graphene
[2] https://enarx.dev/

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
  2022-03-07  7:48     ` [Intel-gfx] " Christoph Hellwig
  (?)
@ 2022-03-07 13:29       ` Jarkko Sakkinen
  -1 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 13:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Matthew Wilcox, linux-mm, Dave Hansen, Nathaniel McCallum,
	Reinette Chatre, Andrew Morton, linux-sgx, linux-kernel,
	Florian Fainelli, Thomas Bogendoerfer, Matthew Auld,
	Thomas Hellström, Daniel Vetter, Jason Ekstrand,
	Chris Wilson, G, Maarten Lankhorst, Greg Kroah-Hartman,
	Tvrtko Ursulin, Vasily Averin, Shakeel Butt, Michal Hocko,
	zhangyiru, Alexey Gladkov, Alexander Mikhalitsyn, linux-mips,
	intel-gfx, dri-devel, codalist, linux-unionfs, linux-fsdevel

On Sun, Mar 06, 2022 at 11:48:26PM -0800, Christoph Hellwig wrote:
> On Sun, Mar 06, 2022 at 11:33:02AM +0000, Matthew Wilcox wrote:
> > On Sun, Mar 06, 2022 at 07:32:04AM +0200, Jarkko Sakkinen wrote:
> > > For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> > > to use that for initializing the device memory by providing a new callback
> > > f_ops->populate() for the purpose.
> > 
> > As I said, NAK.
> 
> Agreed.  This is an amazingly bad interface.

So what would you suggest to sort out the issue? I'm happy to go with
ioctl if nothing else is acceptable.

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07 13:29       ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 13:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, Matthew Wilcox, codalist,
	Matthew Auld, Vasily Averin, Thomas Hellström, intel-gfx, G,
	linux-mips, Shakeel Butt, Reinette Chatre, linux-sgx,
	Thomas Bogendoerfer, Nathaniel McCallum, Greg Kroah-Hartman,
	Tvrtko Ursulin, linux-kernel, linux-fsdevel, Andrew Morton,
	Alexey Gladkov

On Sun, Mar 06, 2022 at 11:48:26PM -0800, Christoph Hellwig wrote:
> On Sun, Mar 06, 2022 at 11:33:02AM +0000, Matthew Wilcox wrote:
> > On Sun, Mar 06, 2022 at 07:32:04AM +0200, Jarkko Sakkinen wrote:
> > > For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> > > to use that for initializing the device memory by providing a new callback
> > > f_ops->populate() for the purpose.
> > 
> > As I said, NAK.
> 
> Agreed.  This is an amazingly bad interface.

So what would you suggest to sort out the issue? I'm happy to go with
ioctl if nothing else is acceptable.

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07 13:29       ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 13:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, Matthew Wilcox, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, G, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, linux-kernel,
	linux-fsdevel, Andrew Morton, Alexey Gladkov

On Sun, Mar 06, 2022 at 11:48:26PM -0800, Christoph Hellwig wrote:
> On Sun, Mar 06, 2022 at 11:33:02AM +0000, Matthew Wilcox wrote:
> > On Sun, Mar 06, 2022 at 07:32:04AM +0200, Jarkko Sakkinen wrote:
> > > For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> > > to use that for initializing the device memory by providing a new callback
> > > f_ops->populate() for the purpose.
> > 
> > As I said, NAK.
> 
> Agreed.  This is an amazingly bad interface.

So what would you suggest to sort out the issue? I'm happy to go with
ioctl if nothing else is acceptable.

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
  2022-03-07 10:12   ` David Hildenbrand
  (?)
@ 2022-03-07 14:22     ` Jarkko Sakkinen
  -1 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 14:22 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, Dave Hansen, Nathaniel McCallum, Reinette Chatre,
	Andrew Morton, linux-sgx, linux-kernel, Florian Fainelli,
	Thomas Bogendoerfer, Matthew Auld, Thomas Hellström,
	Daniel Vetter, Jason Ekstrand, Chris Wilson, Maarten Lankhorst,
	Greg Kroah-Hartman, Tvrtko Ursulin, Vasily Averin, Shakeel Butt,
	Michal Hocko, zhangyiru, Alexey Gladkov, Alexander Mikhalitsyn,
	linux-mips, intel-gfx, dri-devel, codalist, linux-unionfs,
	linux-fsdevel

On Mon, Mar 07, 2022 at 11:12:44AM +0100, David Hildenbrand wrote:
> On 06.03.22 06:32, Jarkko Sakkinen wrote:
> > For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> > to use that for initializing the device memory by providing a new callback
> > f_ops->populate() for the purpose.
> > 
> > SGX patches are provided to show the callback in context.
> > 
> > An obvious alternative is a ioctl but it is less elegant and requires
> > two syscalls (mmap + ioctl) per memory range, instead of just one
> > (mmap).
> 
> What about extending MADV_POPULATE_READ | MADV_POPULATE_WRITE to support
> VM_IO | VM_PFNMAP (as well?) ?

What would be a proper point to bind that behaviour? For mmap/mprotect it'd
be probably populate_vma_page_range() because that would span both mmap()
and mprotect() (Dave's suggestion in this thread).

For MAP_POPULATE I did not have hard proof to show that it would be used
by other drivers but for madvice() you can find at least a few ioctl
based implementations:

$ git grep -e madv --and \( -e ioc \)  drivers/
drivers/gpu/drm/i915/gem/i915_gem_ioctls.h:int i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
drivers/gpu/drm/i915/i915_driver.c:     DRM_IOCTL_DEF_DRV(I915_GEM_MADVISE, i915_gem_madvise_ioctl, DRM_RENDER_ALLOW),
drivers/gpu/drm/i915/i915_gem.c:i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
drivers/gpu/drm/msm/msm_drv.c:static int msm_ioctl_gem_madvise(struct drm_device *dev, void *data,
drivers/gpu/drm/msm/msm_drv.c:  DRM_IOCTL_DEF_DRV(MSM_GEM_MADVISE,  msm_ioctl_gem_madvise,  DRM_RENDER_ALLOW),
drivers/gpu/drm/panfrost/panfrost_drv.c:static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
drivers/gpu/drm/vc4/vc4_drv.c:  DRM_IOCTL_DEF_DRV(VC4_GEM_MADVISE, vc4_gem_madvise_ioctl, DRM_RENDER_ALLOW),
drivers/gpu/drm/vc4/vc4_drv.h:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,
drivers/gpu/drm/vc4/vc4_gem.c:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,

IMHO this also provides supportive claim for MAP_POPULATE, and yeah, I
agree that to be consistent implementation, both madvice() and MAP_POPULATE
should work.

> -- 
> Thanks,
> 
> David / dhildenb

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07 14:22     ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 14:22 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, Tvrtko Ursulin,
	linux-kernel, linux-fsdevel, Andrew Morton, Alexey Gladkov

On Mon, Mar 07, 2022 at 11:12:44AM +0100, David Hildenbrand wrote:
> On 06.03.22 06:32, Jarkko Sakkinen wrote:
> > For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> > to use that for initializing the device memory by providing a new callback
> > f_ops->populate() for the purpose.
> > 
> > SGX patches are provided to show the callback in context.
> > 
> > An obvious alternative is a ioctl but it is less elegant and requires
> > two syscalls (mmap + ioctl) per memory range, instead of just one
> > (mmap).
> 
> What about extending MADV_POPULATE_READ | MADV_POPULATE_WRITE to support
> VM_IO | VM_PFNMAP (as well?) ?

What would be a proper point to bind that behaviour? For mmap/mprotect it'd
be probably populate_vma_page_range() because that would span both mmap()
and mprotect() (Dave's suggestion in this thread).

For MAP_POPULATE I did not have hard proof to show that it would be used
by other drivers but for madvice() you can find at least a few ioctl
based implementations:

$ git grep -e madv --and \( -e ioc \)  drivers/
drivers/gpu/drm/i915/gem/i915_gem_ioctls.h:int i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
drivers/gpu/drm/i915/i915_driver.c:     DRM_IOCTL_DEF_DRV(I915_GEM_MADVISE, i915_gem_madvise_ioctl, DRM_RENDER_ALLOW),
drivers/gpu/drm/i915/i915_gem.c:i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
drivers/gpu/drm/msm/msm_drv.c:static int msm_ioctl_gem_madvise(struct drm_device *dev, void *data,
drivers/gpu/drm/msm/msm_drv.c:  DRM_IOCTL_DEF_DRV(MSM_GEM_MADVISE,  msm_ioctl_gem_madvise,  DRM_RENDER_ALLOW),
drivers/gpu/drm/panfrost/panfrost_drv.c:static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
drivers/gpu/drm/vc4/vc4_drv.c:  DRM_IOCTL_DEF_DRV(VC4_GEM_MADVISE, vc4_gem_madvise_ioctl, DRM_RENDER_ALLOW),
drivers/gpu/drm/vc4/vc4_drv.h:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,
drivers/gpu/drm/vc4/vc4_gem.c:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,

IMHO this also provides supportive claim for MAP_POPULATE, and yeah, I
agree that to be consistent implementation, both madvice() and MAP_POPULATE
should work.

> -- 
> Thanks,
> 
> David / dhildenb

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07 14:22     ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 14:22 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, codalist, Matthew Auld, Vasily Averin,
	Thomas Hellström, intel-gfx, linux-mips, Shakeel Butt,
	Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, linux-kernel,
	linux-fsdevel, Andrew Morton, Alexey Gladkov

On Mon, Mar 07, 2022 at 11:12:44AM +0100, David Hildenbrand wrote:
> On 06.03.22 06:32, Jarkko Sakkinen wrote:
> > For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> > to use that for initializing the device memory by providing a new callback
> > f_ops->populate() for the purpose.
> > 
> > SGX patches are provided to show the callback in context.
> > 
> > An obvious alternative is a ioctl but it is less elegant and requires
> > two syscalls (mmap + ioctl) per memory range, instead of just one
> > (mmap).
> 
> What about extending MADV_POPULATE_READ | MADV_POPULATE_WRITE to support
> VM_IO | VM_PFNMAP (as well?) ?

What would be a proper point to bind that behaviour? For mmap/mprotect it'd
be probably populate_vma_page_range() because that would span both mmap()
and mprotect() (Dave's suggestion in this thread).

For MAP_POPULATE I did not have hard proof to show that it would be used
by other drivers but for madvice() you can find at least a few ioctl
based implementations:

$ git grep -e madv --and \( -e ioc \)  drivers/
drivers/gpu/drm/i915/gem/i915_gem_ioctls.h:int i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
drivers/gpu/drm/i915/i915_driver.c:     DRM_IOCTL_DEF_DRV(I915_GEM_MADVISE, i915_gem_madvise_ioctl, DRM_RENDER_ALLOW),
drivers/gpu/drm/i915/i915_gem.c:i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
drivers/gpu/drm/msm/msm_drv.c:static int msm_ioctl_gem_madvise(struct drm_device *dev, void *data,
drivers/gpu/drm/msm/msm_drv.c:  DRM_IOCTL_DEF_DRV(MSM_GEM_MADVISE,  msm_ioctl_gem_madvise,  DRM_RENDER_ALLOW),
drivers/gpu/drm/panfrost/panfrost_drv.c:static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
drivers/gpu/drm/vc4/vc4_drv.c:  DRM_IOCTL_DEF_DRV(VC4_GEM_MADVISE, vc4_gem_madvise_ioctl, DRM_RENDER_ALLOW),
drivers/gpu/drm/vc4/vc4_drv.h:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,
drivers/gpu/drm/vc4/vc4_gem.c:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,

IMHO this also provides supportive claim for MAP_POPULATE, and yeah, I
agree that to be consistent implementation, both madvice() and MAP_POPULATE
should work.

> -- 
> Thanks,
> 
> David / dhildenb

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for MAP_POPULATE for device memory
  2022-03-06  5:32 ` Jarkko Sakkinen
                   ` (7 preceding siblings ...)
  (?)
@ 2022-03-07 14:23 ` Patchwork
  -1 siblings, 0 replies; 68+ messages in thread
From: Patchwork @ 2022-03-07 14:23 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: intel-gfx

== Series Details ==

Series: MAP_POPULATE for device memory
URL   : https://patchwork.freedesktop.org/series/101099/
State : failure

== Summary ==

Applying: mm: Add f_ops->populate()
Applying: x86/sgx: Export sgx_encl_page_alloc()
error: sha1 information is lacking or useless (arch/x86/kernel/cpu/sgx/encl.c).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 x86/sgx: Export sgx_encl_page_alloc()
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
  2022-03-07 14:22     ` Jarkko Sakkinen
  (?)
@ 2022-03-07 14:33       ` David Hildenbrand
  -1 siblings, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2022-03-07 14:33 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-mm, Dave Hansen, Nathaniel McCallum, Reinette Chatre,
	Andrew Morton, linux-sgx, linux-kernel, Florian Fainelli,
	Thomas Bogendoerfer, Matthew Auld, Thomas Hellström,
	Daniel Vetter, Jason Ekstrand, Chris Wilson, Maarten Lankhorst,
	Greg Kroah-Hartman, Tvrtko Ursulin, Vasily Averin, Shakeel Butt,
	Michal Hocko, zhangyiru, Alexey Gladkov, Alexander Mikhalitsyn,
	linux-mips, intel-gfx, dri-devel, codalist, linux-unionfs,
	linux-fsdevel

On 07.03.22 15:22, Jarkko Sakkinen wrote:
> On Mon, Mar 07, 2022 at 11:12:44AM +0100, David Hildenbrand wrote:
>> On 06.03.22 06:32, Jarkko Sakkinen wrote:
>>> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
>>> to use that for initializing the device memory by providing a new callback
>>> f_ops->populate() for the purpose.
>>>
>>> SGX patches are provided to show the callback in context.
>>>
>>> An obvious alternative is a ioctl but it is less elegant and requires
>>> two syscalls (mmap + ioctl) per memory range, instead of just one
>>> (mmap).
>>
>> What about extending MADV_POPULATE_READ | MADV_POPULATE_WRITE to support
>> VM_IO | VM_PFNMAP (as well?) ?
> 
> What would be a proper point to bind that behaviour? For mmap/mprotect it'd
> be probably populate_vma_page_range() because that would span both mmap()
> and mprotect() (Dave's suggestion in this thread).

MADV_POPULATE_* ends up in faultin_vma_page_range(), right next to
populate_vma_page_range(). So it might require a similar way to hook
into the driver I guess.

> 
> For MAP_POPULATE I did not have hard proof to show that it would be used
> by other drivers but for madvice() you can find at least a few ioctl
> based implementations:
> 
> $ git grep -e madv --and \( -e ioc \)  drivers/
> drivers/gpu/drm/i915/gem/i915_gem_ioctls.h:int i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> drivers/gpu/drm/i915/i915_driver.c:     DRM_IOCTL_DEF_DRV(I915_GEM_MADVISE, i915_gem_madvise_ioctl, DRM_RENDER_ALLOW),
> drivers/gpu/drm/i915/i915_gem.c:i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> drivers/gpu/drm/msm/msm_drv.c:static int msm_ioctl_gem_madvise(struct drm_device *dev, void *data,
> drivers/gpu/drm/msm/msm_drv.c:  DRM_IOCTL_DEF_DRV(MSM_GEM_MADVISE,  msm_ioctl_gem_madvise,  DRM_RENDER_ALLOW),
> drivers/gpu/drm/panfrost/panfrost_drv.c:static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
> drivers/gpu/drm/vc4/vc4_drv.c:  DRM_IOCTL_DEF_DRV(VC4_GEM_MADVISE, vc4_gem_madvise_ioctl, DRM_RENDER_ALLOW),
> drivers/gpu/drm/vc4/vc4_drv.h:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,
> drivers/gpu/drm/vc4/vc4_gem.c:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,
> 
> IMHO this also provides supportive claim for MAP_POPULATE, and yeah, I
> agree that to be consistent implementation, both madvice() and MAP_POPULATE
> should work.

MADV_POPULATE_WRITE + MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE is one way to
dynamically manage memory consumption inside a sparse memory mapping
(preallocate/populate via MADV_POPULATE_WRITE, discard via
MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE).  Extending that whole mechanism to
deal with VM_IO | VM_PFNMAP mappings as well could be interesting.

At least I herd about some ideas where we might want to dynamically
expose memory to a VM (via virtio-mem) inside a sparse memory mapping,
and the memory in that sparse memory mapping is provided from a
dedicated memory pool managed by a device driver -- not just using
ordinary anonymous/file/hugetlb memory as we do right now.

Now, this is certainly stuff for the future, I just wanted to mention it.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07 14:33       ` David Hildenbrand
  0 siblings, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2022-03-07 14:33 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, Tvrtko Ursulin,
	linux-kernel, linux-fsdevel, Andrew Morton, Alexey Gladkov

On 07.03.22 15:22, Jarkko Sakkinen wrote:
> On Mon, Mar 07, 2022 at 11:12:44AM +0100, David Hildenbrand wrote:
>> On 06.03.22 06:32, Jarkko Sakkinen wrote:
>>> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
>>> to use that for initializing the device memory by providing a new callback
>>> f_ops->populate() for the purpose.
>>>
>>> SGX patches are provided to show the callback in context.
>>>
>>> An obvious alternative is a ioctl but it is less elegant and requires
>>> two syscalls (mmap + ioctl) per memory range, instead of just one
>>> (mmap).
>>
>> What about extending MADV_POPULATE_READ | MADV_POPULATE_WRITE to support
>> VM_IO | VM_PFNMAP (as well?) ?
> 
> What would be a proper point to bind that behaviour? For mmap/mprotect it'd
> be probably populate_vma_page_range() because that would span both mmap()
> and mprotect() (Dave's suggestion in this thread).

MADV_POPULATE_* ends up in faultin_vma_page_range(), right next to
populate_vma_page_range(). So it might require a similar way to hook
into the driver I guess.

> 
> For MAP_POPULATE I did not have hard proof to show that it would be used
> by other drivers but for madvice() you can find at least a few ioctl
> based implementations:
> 
> $ git grep -e madv --and \( -e ioc \)  drivers/
> drivers/gpu/drm/i915/gem/i915_gem_ioctls.h:int i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> drivers/gpu/drm/i915/i915_driver.c:     DRM_IOCTL_DEF_DRV(I915_GEM_MADVISE, i915_gem_madvise_ioctl, DRM_RENDER_ALLOW),
> drivers/gpu/drm/i915/i915_gem.c:i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> drivers/gpu/drm/msm/msm_drv.c:static int msm_ioctl_gem_madvise(struct drm_device *dev, void *data,
> drivers/gpu/drm/msm/msm_drv.c:  DRM_IOCTL_DEF_DRV(MSM_GEM_MADVISE,  msm_ioctl_gem_madvise,  DRM_RENDER_ALLOW),
> drivers/gpu/drm/panfrost/panfrost_drv.c:static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
> drivers/gpu/drm/vc4/vc4_drv.c:  DRM_IOCTL_DEF_DRV(VC4_GEM_MADVISE, vc4_gem_madvise_ioctl, DRM_RENDER_ALLOW),
> drivers/gpu/drm/vc4/vc4_drv.h:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,
> drivers/gpu/drm/vc4/vc4_gem.c:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,
> 
> IMHO this also provides supportive claim for MAP_POPULATE, and yeah, I
> agree that to be consistent implementation, both madvice() and MAP_POPULATE
> should work.

MADV_POPULATE_WRITE + MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE is one way to
dynamically manage memory consumption inside a sparse memory mapping
(preallocate/populate via MADV_POPULATE_WRITE, discard via
MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE).  Extending that whole mechanism to
deal with VM_IO | VM_PFNMAP mappings as well could be interesting.

At least I herd about some ideas where we might want to dynamically
expose memory to a VM (via virtio-mem) inside a sparse memory mapping,
and the memory in that sparse memory mapping is provided from a
dedicated memory pool managed by a device driver -- not just using
ordinary anonymous/file/hugetlb memory as we do right now.

Now, this is certainly stuff for the future, I just wanted to mention it.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07 14:33       ` David Hildenbrand
  0 siblings, 0 replies; 68+ messages in thread
From: David Hildenbrand @ 2022-03-07 14:33 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, codalist, Matthew Auld, Vasily Averin,
	Thomas Hellström, intel-gfx, linux-mips, Shakeel Butt,
	Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, linux-kernel,
	linux-fsdevel, Andrew Morton, Alexey Gladkov

On 07.03.22 15:22, Jarkko Sakkinen wrote:
> On Mon, Mar 07, 2022 at 11:12:44AM +0100, David Hildenbrand wrote:
>> On 06.03.22 06:32, Jarkko Sakkinen wrote:
>>> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
>>> to use that for initializing the device memory by providing a new callback
>>> f_ops->populate() for the purpose.
>>>
>>> SGX patches are provided to show the callback in context.
>>>
>>> An obvious alternative is a ioctl but it is less elegant and requires
>>> two syscalls (mmap + ioctl) per memory range, instead of just one
>>> (mmap).
>>
>> What about extending MADV_POPULATE_READ | MADV_POPULATE_WRITE to support
>> VM_IO | VM_PFNMAP (as well?) ?
> 
> What would be a proper point to bind that behaviour? For mmap/mprotect it'd
> be probably populate_vma_page_range() because that would span both mmap()
> and mprotect() (Dave's suggestion in this thread).

MADV_POPULATE_* ends up in faultin_vma_page_range(), right next to
populate_vma_page_range(). So it might require a similar way to hook
into the driver I guess.

> 
> For MAP_POPULATE I did not have hard proof to show that it would be used
> by other drivers but for madvice() you can find at least a few ioctl
> based implementations:
> 
> $ git grep -e madv --and \( -e ioc \)  drivers/
> drivers/gpu/drm/i915/gem/i915_gem_ioctls.h:int i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> drivers/gpu/drm/i915/i915_driver.c:     DRM_IOCTL_DEF_DRV(I915_GEM_MADVISE, i915_gem_madvise_ioctl, DRM_RENDER_ALLOW),
> drivers/gpu/drm/i915/i915_gem.c:i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> drivers/gpu/drm/msm/msm_drv.c:static int msm_ioctl_gem_madvise(struct drm_device *dev, void *data,
> drivers/gpu/drm/msm/msm_drv.c:  DRM_IOCTL_DEF_DRV(MSM_GEM_MADVISE,  msm_ioctl_gem_madvise,  DRM_RENDER_ALLOW),
> drivers/gpu/drm/panfrost/panfrost_drv.c:static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
> drivers/gpu/drm/vc4/vc4_drv.c:  DRM_IOCTL_DEF_DRV(VC4_GEM_MADVISE, vc4_gem_madvise_ioctl, DRM_RENDER_ALLOW),
> drivers/gpu/drm/vc4/vc4_drv.h:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,
> drivers/gpu/drm/vc4/vc4_gem.c:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,
> 
> IMHO this also provides supportive claim for MAP_POPULATE, and yeah, I
> agree that to be consistent implementation, both madvice() and MAP_POPULATE
> should work.

MADV_POPULATE_WRITE + MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE is one way to
dynamically manage memory consumption inside a sparse memory mapping
(preallocate/populate via MADV_POPULATE_WRITE, discard via
MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE).  Extending that whole mechanism to
deal with VM_IO | VM_PFNMAP mappings as well could be interesting.

At least I herd about some ideas where we might want to dynamically
expose memory to a VM (via virtio-mem) inside a sparse memory mapping,
and the memory in that sparse memory mapping is provided from a
dedicated memory pool managed by a device driver -- not just using
ordinary anonymous/file/hugetlb memory as we do right now.

Now, this is certainly stuff for the future, I just wanted to mention it.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
  2022-03-07 14:33       ` David Hildenbrand
  (?)
@ 2022-03-07 15:49         ` Jarkko Sakkinen
  -1 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 15:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, Dave Hansen, Nathaniel McCallum, Reinette Chatre,
	Andrew Morton, linux-sgx, linux-kernel, Florian Fainelli,
	Thomas Bogendoerfer, Matthew Auld, Thomas Hellström,
	Daniel Vetter, Jason Ekstrand, Chris Wilson, Maarten Lankhorst,
	Greg Kroah-Hartman, Tvrtko Ursulin, Vasily Averin, Shakeel Butt,
	Michal Hocko, zhangyiru, Alexey Gladkov, Alexander Mikhalitsyn,
	linux-mips, intel-gfx, dri-devel, codalist, linux-unionfs,
	linux-fsdevel

On Mon, Mar 07, 2022 at 03:33:52PM +0100, David Hildenbrand wrote:
> On 07.03.22 15:22, Jarkko Sakkinen wrote:
> > On Mon, Mar 07, 2022 at 11:12:44AM +0100, David Hildenbrand wrote:
> >> On 06.03.22 06:32, Jarkko Sakkinen wrote:
> >>> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> >>> to use that for initializing the device memory by providing a new callback
> >>> f_ops->populate() for the purpose.
> >>>
> >>> SGX patches are provided to show the callback in context.
> >>>
> >>> An obvious alternative is a ioctl but it is less elegant and requires
> >>> two syscalls (mmap + ioctl) per memory range, instead of just one
> >>> (mmap).
> >>
> >> What about extending MADV_POPULATE_READ | MADV_POPULATE_WRITE to support
> >> VM_IO | VM_PFNMAP (as well?) ?
> > 
> > What would be a proper point to bind that behaviour? For mmap/mprotect it'd
> > be probably populate_vma_page_range() because that would span both mmap()
> > and mprotect() (Dave's suggestion in this thread).
> 
> MADV_POPULATE_* ends up in faultin_vma_page_range(), right next to
> populate_vma_page_range(). So it might require a similar way to hook
> into the driver I guess.
> 
> > 
> > For MAP_POPULATE I did not have hard proof to show that it would be used
> > by other drivers but for madvice() you can find at least a few ioctl
> > based implementations:
> > 
> > $ git grep -e madv --and \( -e ioc \)  drivers/
> > drivers/gpu/drm/i915/gem/i915_gem_ioctls.h:int i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> > drivers/gpu/drm/i915/i915_driver.c:     DRM_IOCTL_DEF_DRV(I915_GEM_MADVISE, i915_gem_madvise_ioctl, DRM_RENDER_ALLOW),
> > drivers/gpu/drm/i915/i915_gem.c:i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> > drivers/gpu/drm/msm/msm_drv.c:static int msm_ioctl_gem_madvise(struct drm_device *dev, void *data,
> > drivers/gpu/drm/msm/msm_drv.c:  DRM_IOCTL_DEF_DRV(MSM_GEM_MADVISE,  msm_ioctl_gem_madvise,  DRM_RENDER_ALLOW),
> > drivers/gpu/drm/panfrost/panfrost_drv.c:static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
> > drivers/gpu/drm/vc4/vc4_drv.c:  DRM_IOCTL_DEF_DRV(VC4_GEM_MADVISE, vc4_gem_madvise_ioctl, DRM_RENDER_ALLOW),
> > drivers/gpu/drm/vc4/vc4_drv.h:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,
> > drivers/gpu/drm/vc4/vc4_gem.c:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,
> > 
> > IMHO this also provides supportive claim for MAP_POPULATE, and yeah, I
> > agree that to be consistent implementation, both madvice() and MAP_POPULATE
> > should work.
> 
> MADV_POPULATE_WRITE + MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE is one way to
> dynamically manage memory consumption inside a sparse memory mapping
> (preallocate/populate via MADV_POPULATE_WRITE, discard via
> MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE).  Extending that whole mechanism to
> deal with VM_IO | VM_PFNMAP mappings as well could be interesting.
> 
> At least I herd about some ideas where we might want to dynamically
> expose memory to a VM (via virtio-mem) inside a sparse memory mapping,
> and the memory in that sparse memory mapping is provided from a
> dedicated memory pool managed by a device driver -- not just using
> ordinary anonymous/file/hugetlb memory as we do right now.
> 
> Now, this is certainly stuff for the future, I just wanted to mention it.

For SGX purposes I'm now studying the possibly to use ra_state to get
idea where do "prefetching" (EAUG's) in batches, as it is something
that would not require any intrusive changes to mm but thank you for
sharing this. Looking into implementing this properly is the 2nd option,
if that does not work out.

> -- 
> Thanks,
> 
> David / dhildenb

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07 15:49         ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 15:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, Tvrtko Ursulin,
	linux-kernel, linux-fsdevel, Andrew Morton, Alexey Gladkov

On Mon, Mar 07, 2022 at 03:33:52PM +0100, David Hildenbrand wrote:
> On 07.03.22 15:22, Jarkko Sakkinen wrote:
> > On Mon, Mar 07, 2022 at 11:12:44AM +0100, David Hildenbrand wrote:
> >> On 06.03.22 06:32, Jarkko Sakkinen wrote:
> >>> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> >>> to use that for initializing the device memory by providing a new callback
> >>> f_ops->populate() for the purpose.
> >>>
> >>> SGX patches are provided to show the callback in context.
> >>>
> >>> An obvious alternative is a ioctl but it is less elegant and requires
> >>> two syscalls (mmap + ioctl) per memory range, instead of just one
> >>> (mmap).
> >>
> >> What about extending MADV_POPULATE_READ | MADV_POPULATE_WRITE to support
> >> VM_IO | VM_PFNMAP (as well?) ?
> > 
> > What would be a proper point to bind that behaviour? For mmap/mprotect it'd
> > be probably populate_vma_page_range() because that would span both mmap()
> > and mprotect() (Dave's suggestion in this thread).
> 
> MADV_POPULATE_* ends up in faultin_vma_page_range(), right next to
> populate_vma_page_range(). So it might require a similar way to hook
> into the driver I guess.
> 
> > 
> > For MAP_POPULATE I did not have hard proof to show that it would be used
> > by other drivers but for madvice() you can find at least a few ioctl
> > based implementations:
> > 
> > $ git grep -e madv --and \( -e ioc \)  drivers/
> > drivers/gpu/drm/i915/gem/i915_gem_ioctls.h:int i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> > drivers/gpu/drm/i915/i915_driver.c:     DRM_IOCTL_DEF_DRV(I915_GEM_MADVISE, i915_gem_madvise_ioctl, DRM_RENDER_ALLOW),
> > drivers/gpu/drm/i915/i915_gem.c:i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> > drivers/gpu/drm/msm/msm_drv.c:static int msm_ioctl_gem_madvise(struct drm_device *dev, void *data,
> > drivers/gpu/drm/msm/msm_drv.c:  DRM_IOCTL_DEF_DRV(MSM_GEM_MADVISE,  msm_ioctl_gem_madvise,  DRM_RENDER_ALLOW),
> > drivers/gpu/drm/panfrost/panfrost_drv.c:static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
> > drivers/gpu/drm/vc4/vc4_drv.c:  DRM_IOCTL_DEF_DRV(VC4_GEM_MADVISE, vc4_gem_madvise_ioctl, DRM_RENDER_ALLOW),
> > drivers/gpu/drm/vc4/vc4_drv.h:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,
> > drivers/gpu/drm/vc4/vc4_gem.c:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,
> > 
> > IMHO this also provides supportive claim for MAP_POPULATE, and yeah, I
> > agree that to be consistent implementation, both madvice() and MAP_POPULATE
> > should work.
> 
> MADV_POPULATE_WRITE + MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE is one way to
> dynamically manage memory consumption inside a sparse memory mapping
> (preallocate/populate via MADV_POPULATE_WRITE, discard via
> MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE).  Extending that whole mechanism to
> deal with VM_IO | VM_PFNMAP mappings as well could be interesting.
> 
> At least I herd about some ideas where we might want to dynamically
> expose memory to a VM (via virtio-mem) inside a sparse memory mapping,
> and the memory in that sparse memory mapping is provided from a
> dedicated memory pool managed by a device driver -- not just using
> ordinary anonymous/file/hugetlb memory as we do right now.
> 
> Now, this is certainly stuff for the future, I just wanted to mention it.

For SGX purposes I'm now studying the possibly to use ra_state to get
idea where do "prefetching" (EAUG's) in batches, as it is something
that would not require any intrusive changes to mm but thank you for
sharing this. Looking into implementing this properly is the 2nd option,
if that does not work out.

> -- 
> Thanks,
> 
> David / dhildenb

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07 15:49         ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 15:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, codalist, Matthew Auld, Vasily Averin,
	Thomas Hellström, intel-gfx, linux-mips, Shakeel Butt,
	Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, linux-kernel,
	linux-fsdevel, Andrew Morton, Alexey Gladkov

On Mon, Mar 07, 2022 at 03:33:52PM +0100, David Hildenbrand wrote:
> On 07.03.22 15:22, Jarkko Sakkinen wrote:
> > On Mon, Mar 07, 2022 at 11:12:44AM +0100, David Hildenbrand wrote:
> >> On 06.03.22 06:32, Jarkko Sakkinen wrote:
> >>> For device memory (aka VM_IO | VM_PFNMAP) MAP_POPULATE does nothing. Allow
> >>> to use that for initializing the device memory by providing a new callback
> >>> f_ops->populate() for the purpose.
> >>>
> >>> SGX patches are provided to show the callback in context.
> >>>
> >>> An obvious alternative is a ioctl but it is less elegant and requires
> >>> two syscalls (mmap + ioctl) per memory range, instead of just one
> >>> (mmap).
> >>
> >> What about extending MADV_POPULATE_READ | MADV_POPULATE_WRITE to support
> >> VM_IO | VM_PFNMAP (as well?) ?
> > 
> > What would be a proper point to bind that behaviour? For mmap/mprotect it'd
> > be probably populate_vma_page_range() because that would span both mmap()
> > and mprotect() (Dave's suggestion in this thread).
> 
> MADV_POPULATE_* ends up in faultin_vma_page_range(), right next to
> populate_vma_page_range(). So it might require a similar way to hook
> into the driver I guess.
> 
> > 
> > For MAP_POPULATE I did not have hard proof to show that it would be used
> > by other drivers but for madvice() you can find at least a few ioctl
> > based implementations:
> > 
> > $ git grep -e madv --and \( -e ioc \)  drivers/
> > drivers/gpu/drm/i915/gem/i915_gem_ioctls.h:int i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> > drivers/gpu/drm/i915/i915_driver.c:     DRM_IOCTL_DEF_DRV(I915_GEM_MADVISE, i915_gem_madvise_ioctl, DRM_RENDER_ALLOW),
> > drivers/gpu/drm/i915/i915_gem.c:i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> > drivers/gpu/drm/msm/msm_drv.c:static int msm_ioctl_gem_madvise(struct drm_device *dev, void *data,
> > drivers/gpu/drm/msm/msm_drv.c:  DRM_IOCTL_DEF_DRV(MSM_GEM_MADVISE,  msm_ioctl_gem_madvise,  DRM_RENDER_ALLOW),
> > drivers/gpu/drm/panfrost/panfrost_drv.c:static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
> > drivers/gpu/drm/vc4/vc4_drv.c:  DRM_IOCTL_DEF_DRV(VC4_GEM_MADVISE, vc4_gem_madvise_ioctl, DRM_RENDER_ALLOW),
> > drivers/gpu/drm/vc4/vc4_drv.h:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,
> > drivers/gpu/drm/vc4/vc4_gem.c:int vc4_gem_madvise_ioctl(struct drm_device *dev, void *data,
> > 
> > IMHO this also provides supportive claim for MAP_POPULATE, and yeah, I
> > agree that to be consistent implementation, both madvice() and MAP_POPULATE
> > should work.
> 
> MADV_POPULATE_WRITE + MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE is one way to
> dynamically manage memory consumption inside a sparse memory mapping
> (preallocate/populate via MADV_POPULATE_WRITE, discard via
> MADV_DONTNEED/FALLOC_FL_PUNCH_HOLE).  Extending that whole mechanism to
> deal with VM_IO | VM_PFNMAP mappings as well could be interesting.
> 
> At least I herd about some ideas where we might want to dynamically
> expose memory to a VM (via virtio-mem) inside a sparse memory mapping,
> and the memory in that sparse memory mapping is provided from a
> dedicated memory pool managed by a device driver -- not just using
> ordinary anonymous/file/hugetlb memory as we do right now.
> 
> Now, this is certainly stuff for the future, I just wanted to mention it.

For SGX purposes I'm now studying the possibly to use ra_state to get
idea where do "prefetching" (EAUG's) in batches, as it is something
that would not require any intrusive changes to mm but thank you for
sharing this. Looking into implementing this properly is the 2nd option,
if that does not work out.

> -- 
> Thanks,
> 
> David / dhildenb

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
  2022-03-07 13:29       ` Jarkko Sakkinen
@ 2022-03-07 15:56         ` Christoph Hellwig
  -1 siblings, 0 replies; 68+ messages in thread
From: Christoph Hellwig @ 2022-03-07 15:56 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Christoph Hellwig, Matthew Wilcox, linux-mm, Dave Hansen,
	Nathaniel McCallum, Reinette Chatre, Andrew Morton, linux-sgx,
	linux-kernel, Florian Fainelli, Thomas Bogendoerfer,
	Matthew Auld, Thomas Hellström, Daniel Vetter,
	Jason Ekstrand, Chris Wilson, G, Maarten Lankhorst,
	Greg Kroah-Hartman, Tvrtko Ursulin, Vasily Averin, Shakeel Butt,
	Michal Hocko, zhangyiru, Alexey Gladkov, Alexander Mikhalitsyn,
	linux-mips, intel-gfx, dri-devel, codalist, linux-unionfs,
	linux-fsdevel

On Mon, Mar 07, 2022 at 03:29:35PM +0200, Jarkko Sakkinen wrote:
> So what would you suggest to sort out the issue? I'm happy to go with
> ioctl if nothing else is acceptable.

PLenty of drivers treat all mmaps as if MAP_POPULATE was specified,
typically by using (io_)remap_pfn_range.  If there any reason to only
optionally have the pre-fault semantics for sgx?  If not this should
be really simple.  And if we have a real need for it to be optional
we'll just need to find a sane way to pass that information to ->mmap.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07 15:56         ` Christoph Hellwig
  0 siblings, 0 replies; 68+ messages in thread
From: Christoph Hellwig @ 2022-03-07 15:56 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, Matthew Wilcox, codalist, Christoph Hellwig,
	Matthew Auld, Vasily Averin, Thomas Hellström, intel-gfx, G,
	linux-mips, Shakeel Butt, Reinette Chatre, linux-sgx,
	Thomas Bogendoerfer, Nathaniel McCallum, Greg Kroah-Hartman,
	linux-kernel, linux-fsdevel, Andrew Morton, Alexey Gladkov

On Mon, Mar 07, 2022 at 03:29:35PM +0200, Jarkko Sakkinen wrote:
> So what would you suggest to sort out the issue? I'm happy to go with
> ioctl if nothing else is acceptable.

PLenty of drivers treat all mmaps as if MAP_POPULATE was specified,
typically by using (io_)remap_pfn_range.  If there any reason to only
optionally have the pre-fault semantics for sgx?  If not this should
be really simple.  And if we have a real need for it to be optional
we'll just need to find a sane way to pass that information to ->mmap.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
  2022-03-07 15:56         ` [Intel-gfx] " Christoph Hellwig
  (?)
@ 2022-03-07 15:58           ` Jarkko Sakkinen
  -1 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 15:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Matthew Wilcox, linux-mm, Dave Hansen, Nathaniel McCallum,
	Reinette Chatre, Andrew Morton, linux-sgx, linux-kernel,
	Florian Fainelli, Thomas Bogendoerfer, Matthew Auld,
	Thomas Hellström, Daniel Vetter, Jason Ekstrand,
	Chris Wilson, G, Maarten Lankhorst, Greg Kroah-Hartman,
	Tvrtko Ursulin, Vasily Averin, Shakeel Butt, Michal Hocko,
	zhangyiru, Alexey Gladkov, Alexander Mikhalitsyn, linux-mips,
	intel-gfx, dri-devel, codalist, linux-unionfs, linux-fsdevel

On Mon, Mar 07, 2022 at 07:56:53AM -0800, Christoph Hellwig wrote:
> On Mon, Mar 07, 2022 at 03:29:35PM +0200, Jarkko Sakkinen wrote:
> > So what would you suggest to sort out the issue? I'm happy to go with
> > ioctl if nothing else is acceptable.
> 
> PLenty of drivers treat all mmaps as if MAP_POPULATE was specified,
> typically by using (io_)remap_pfn_range.  If there any reason to only
> optionally have the pre-fault semantics for sgx?  If not this should
> be really simple.  And if we have a real need for it to be optional
> we'll just need to find a sane way to pass that information to ->mmap.

Dave, what if mmap() would just unconditionally EAUG after initialization?

It's an option, yes.

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07 15:58           ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 15:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, Matthew Wilcox, codalist,
	Matthew Auld, Vasily Averin, Thomas Hellström, intel-gfx, G,
	linux-mips, Shakeel Butt, Reinette Chatre, linux-sgx,
	Thomas Bogendoerfer, Nathaniel McCallum, Greg Kroah-Hartman,
	Tvrtko Ursulin, linux-kernel, linux-fsdevel, Andrew Morton,
	Alexey Gladkov

On Mon, Mar 07, 2022 at 07:56:53AM -0800, Christoph Hellwig wrote:
> On Mon, Mar 07, 2022 at 03:29:35PM +0200, Jarkko Sakkinen wrote:
> > So what would you suggest to sort out the issue? I'm happy to go with
> > ioctl if nothing else is acceptable.
> 
> PLenty of drivers treat all mmaps as if MAP_POPULATE was specified,
> typically by using (io_)remap_pfn_range.  If there any reason to only
> optionally have the pre-fault semantics for sgx?  If not this should
> be really simple.  And if we have a real need for it to be optional
> we'll just need to find a sane way to pass that information to ->mmap.

Dave, what if mmap() would just unconditionally EAUG after initialization?

It's an option, yes.

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07 15:58           ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-07 15:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, Matthew Wilcox, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, G, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, linux-kernel,
	linux-fsdevel, Andrew Morton, Alexey Gladkov

On Mon, Mar 07, 2022 at 07:56:53AM -0800, Christoph Hellwig wrote:
> On Mon, Mar 07, 2022 at 03:29:35PM +0200, Jarkko Sakkinen wrote:
> > So what would you suggest to sort out the issue? I'm happy to go with
> > ioctl if nothing else is acceptable.
> 
> PLenty of drivers treat all mmaps as if MAP_POPULATE was specified,
> typically by using (io_)remap_pfn_range.  If there any reason to only
> optionally have the pre-fault semantics for sgx?  If not this should
> be really simple.  And if we have a real need for it to be optional
> we'll just need to find a sane way to pass that information to ->mmap.

Dave, what if mmap() would just unconditionally EAUG after initialization?

It's an option, yes.

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* RE: [PATCH RFC 0/3] MAP_POPULATE for device memory
  2022-03-07 15:56         ` [Intel-gfx] " Christoph Hellwig
  (?)
@ 2022-03-07 22:11           ` David Laight
  -1 siblings, 0 replies; 68+ messages in thread
From: David Laight @ 2022-03-07 22:11 UTC (permalink / raw)
  To: 'Christoph Hellwig', Jarkko Sakkinen
  Cc: Matthew Wilcox, linux-mm, Dave Hansen, Nathaniel McCallum,
	Reinette Chatre, Andrew Morton, linux-sgx, linux-kernel,
	Florian Fainelli, Thomas Bogendoerfer, Matthew Auld,
	Thomas Hellström, Daniel Vetter, Jason Ekstrand,
	Chris Wilson, G, Maarten Lankhorst, Greg Kroah-Hartman,
	Tvrtko Ursulin, Vasily Averin, Shakeel Butt, Michal Hocko,
	zhangyiru, Alexey Gladkov, Alexander Mikhalitsyn, linux-mips,
	intel-gfx, dri-devel, codalist, linux-unionfs, linux-fsdevel

From: Christoph Hellwig
> Sent: 07 March 2022 15:57
> 
> On Mon, Mar 07, 2022 at 03:29:35PM +0200, Jarkko Sakkinen wrote:
> > So what would you suggest to sort out the issue? I'm happy to go with
> > ioctl if nothing else is acceptable.
> 
> PLenty of drivers treat all mmaps as if MAP_POPULATE was specified,
> typically by using (io_)remap_pfn_range.  If there any reason to only
> optionally have the pre-fault semantics for sgx?  If not this should
> be really simple.  And if we have a real need for it to be optional
> we'll just need to find a sane way to pass that information to ->mmap.

Is there any space in vma->vm_flags ?

That would be better than an extra argument or function.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 68+ messages in thread

* RE: [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07 22:11           ` David Laight
  0 siblings, 0 replies; 68+ messages in thread
From: David Laight @ 2022-03-07 22:11 UTC (permalink / raw)
  To: 'Christoph Hellwig', Jarkko Sakkinen
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, Matthew Wilcox, codalist,
	Matthew Auld, Vasily Averin, Thomas Hellström, intel-gfx, G,
	linux-mips, Shakeel Butt, Reinette Chatre, linux-sgx,
	Thomas Bogendoerfer, Nathaniel McCallum, Greg Kroah-Hartman,
	Tvrtko Ursulin, linux-kernel, linux-fsdevel, Andrew Morton,
	Alexey Gladkov

From: Christoph Hellwig
> Sent: 07 March 2022 15:57
> 
> On Mon, Mar 07, 2022 at 03:29:35PM +0200, Jarkko Sakkinen wrote:
> > So what would you suggest to sort out the issue? I'm happy to go with
> > ioctl if nothing else is acceptable.
> 
> PLenty of drivers treat all mmaps as if MAP_POPULATE was specified,
> typically by using (io_)remap_pfn_range.  If there any reason to only
> optionally have the pre-fault semantics for sgx?  If not this should
> be really simple.  And if we have a real need for it to be optional
> we'll just need to find a sane way to pass that information to ->mmap.

Is there any space in vma->vm_flags ?

That would be better than an extra argument or function.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-07 22:11           ` David Laight
  0 siblings, 0 replies; 68+ messages in thread
From: David Laight @ 2022-03-07 22:11 UTC (permalink / raw)
  To: 'Christoph Hellwig', Jarkko Sakkinen
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, Matthew Wilcox, codalist, Matthew Auld,
	Vasily Averin, Thomas Hellström, intel-gfx, G, linux-mips,
	Shakeel Butt, Reinette Chatre, linux-sgx, Thomas Bogendoerfer,
	Nathaniel McCallum, Greg Kroah-Hartman, linux-kernel,
	linux-fsdevel, Andrew Morton, Alexey Gladkov

From: Christoph Hellwig
> Sent: 07 March 2022 15:57
> 
> On Mon, Mar 07, 2022 at 03:29:35PM +0200, Jarkko Sakkinen wrote:
> > So what would you suggest to sort out the issue? I'm happy to go with
> > ioctl if nothing else is acceptable.
> 
> PLenty of drivers treat all mmaps as if MAP_POPULATE was specified,
> typically by using (io_)remap_pfn_range.  If there any reason to only
> optionally have the pre-fault semantics for sgx?  If not this should
> be really simple.  And if we have a real need for it to be optional
> we'll just need to find a sane way to pass that information to ->mmap.

Is there any space in vma->vm_flags ?

That would be better than an extra argument or function.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
  2022-03-07 22:11           ` David Laight
  (?)
@ 2022-03-08 10:10             ` Jarkko Sakkinen
  -1 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-08 10:10 UTC (permalink / raw)
  To: David Laight
  Cc: 'Christoph Hellwig',
	Matthew Wilcox, linux-mm, Dave Hansen, Nathaniel McCallum,
	Reinette Chatre, Andrew Morton, linux-sgx, linux-kernel,
	Florian Fainelli, Thomas Bogendoerfer, Matthew Auld,
	Thomas Hellström, Daniel Vetter, Jason Ekstrand,
	Chris Wilson, G, Maarten Lankhorst, Greg Kroah-Hartman,
	Tvrtko Ursulin, Vasily Averin, Shakeel Butt, Michal Hocko,
	zhangyiru, Alexey Gladkov, Alexander Mikhalitsyn, linux-mips,
	intel-gfx, dri-devel, codalist, linux-unionfs, linux-fsdevel

On Mon, Mar 07, 2022 at 10:11:19PM +0000, David Laight wrote:
> From: Christoph Hellwig
> > Sent: 07 March 2022 15:57
> > 
> > On Mon, Mar 07, 2022 at 03:29:35PM +0200, Jarkko Sakkinen wrote:
> > > So what would you suggest to sort out the issue? I'm happy to go with
> > > ioctl if nothing else is acceptable.
> > 
> > PLenty of drivers treat all mmaps as if MAP_POPULATE was specified,
> > typically by using (io_)remap_pfn_range.  If there any reason to only
> > optionally have the pre-fault semantics for sgx?  If not this should
> > be really simple.  And if we have a real need for it to be optional
> > we'll just need to find a sane way to pass that information to ->mmap.
> 
> Is there any space in vma->vm_flags ?
> 
> That would be better than an extra argument or function.

It's very dense but I'll give a shot for callback route based on Dave's
comments in this thread. I.e. use it as filter inside __mm_populate() and
populate_vma_page_range().

For Enarx, which we are implementing being able to use MAP_POPULATE and get
the full range EAUG'd would be best way to optimize the performance of wasm
JIT (Enarx is a wasm run-time capable of running inside an SGX enclave, AMD
SEV-SNP VM etc.). More so than any predictor (ra_state, madvice etc.) inside
#PF handler, which have been suggested in this thread.

After some research on how we implement user space, I'd rather keep the #PF
handler working on a single page (EAUG a single page) and have either ioctl
or MAP_POPULATE to do the batch fill.

We can still "not trust the user space" i.e. the populate does not have to
guarantee to do the full length since the #PF handler will then fill the
holes. This was one concern in this thread but it is not hard to address.

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-08 10:10             ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-08 10:10 UTC (permalink / raw)
  To: David Laight
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Jason Ekstrand, Alexander Mikhalitsyn,
	Florian Fainelli, linux-unionfs, Matthew Wilcox, codalist,
	'Christoph Hellwig',
	Matthew Auld, Vasily Averin, Thomas Hellström, intel-gfx, G,
	linux-mips, Shakeel Butt, Reinette Chatre, linux-sgx,
	Thomas Bogendoerfer, Nathaniel McCallum, Greg Kroah-Hartman,
	Tvrtko Ursulin, linux-kernel, linux-fsdevel, Andrew Morton,
	Alexey Gladkov

On Mon, Mar 07, 2022 at 10:11:19PM +0000, David Laight wrote:
> From: Christoph Hellwig
> > Sent: 07 March 2022 15:57
> > 
> > On Mon, Mar 07, 2022 at 03:29:35PM +0200, Jarkko Sakkinen wrote:
> > > So what would you suggest to sort out the issue? I'm happy to go with
> > > ioctl if nothing else is acceptable.
> > 
> > PLenty of drivers treat all mmaps as if MAP_POPULATE was specified,
> > typically by using (io_)remap_pfn_range.  If there any reason to only
> > optionally have the pre-fault semantics for sgx?  If not this should
> > be really simple.  And if we have a real need for it to be optional
> > we'll just need to find a sane way to pass that information to ->mmap.
> 
> Is there any space in vma->vm_flags ?
> 
> That would be better than an extra argument or function.

It's very dense but I'll give a shot for callback route based on Dave's
comments in this thread. I.e. use it as filter inside __mm_populate() and
populate_vma_page_range().

For Enarx, which we are implementing being able to use MAP_POPULATE and get
the full range EAUG'd would be best way to optimize the performance of wasm
JIT (Enarx is a wasm run-time capable of running inside an SGX enclave, AMD
SEV-SNP VM etc.). More so than any predictor (ra_state, madvice etc.) inside
#PF handler, which have been suggested in this thread.

After some research on how we implement user space, I'd rather keep the #PF
handler working on a single page (EAUG a single page) and have either ioctl
or MAP_POPULATE to do the batch fill.

We can still "not trust the user space" i.e. the populate does not have to
guarantee to do the full length since the #PF handler will then fill the
holes. This was one concern in this thread but it is not hard to address.

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [Intel-gfx] [PATCH RFC 0/3] MAP_POPULATE for device memory
@ 2022-03-08 10:10             ` Jarkko Sakkinen
  0 siblings, 0 replies; 68+ messages in thread
From: Jarkko Sakkinen @ 2022-03-08 10:10 UTC (permalink / raw)
  To: David Laight
  Cc: Michal Hocko, zhangyiru, Daniel Vetter, Dave Hansen, dri-devel,
	Chris Wilson, linux-mm, Alexander Mikhalitsyn, Florian Fainelli,
	linux-unionfs, Matthew Wilcox, codalist,
	'Christoph Hellwig',
	Matthew Auld, Vasily Averin, Thomas Hellström, intel-gfx, G,
	linux-mips, Shakeel Butt, Reinette Chatre, linux-sgx,
	Thomas Bogendoerfer, Nathaniel McCallum, Greg Kroah-Hartman,
	linux-kernel, linux-fsdevel, Andrew Morton, Alexey Gladkov

On Mon, Mar 07, 2022 at 10:11:19PM +0000, David Laight wrote:
> From: Christoph Hellwig
> > Sent: 07 March 2022 15:57
> > 
> > On Mon, Mar 07, 2022 at 03:29:35PM +0200, Jarkko Sakkinen wrote:
> > > So what would you suggest to sort out the issue? I'm happy to go with
> > > ioctl if nothing else is acceptable.
> > 
> > PLenty of drivers treat all mmaps as if MAP_POPULATE was specified,
> > typically by using (io_)remap_pfn_range.  If there any reason to only
> > optionally have the pre-fault semantics for sgx?  If not this should
> > be really simple.  And if we have a real need for it to be optional
> > we'll just need to find a sane way to pass that information to ->mmap.
> 
> Is there any space in vma->vm_flags ?
> 
> That would be better than an extra argument or function.

It's very dense but I'll give a shot for callback route based on Dave's
comments in this thread. I.e. use it as filter inside __mm_populate() and
populate_vma_page_range().

For Enarx, which we are implementing being able to use MAP_POPULATE and get
the full range EAUG'd would be best way to optimize the performance of wasm
JIT (Enarx is a wasm run-time capable of running inside an SGX enclave, AMD
SEV-SNP VM etc.). More so than any predictor (ra_state, madvice etc.) inside
#PF handler, which have been suggested in this thread.

After some research on how we implement user space, I'd rather keep the #PF
handler working on a single page (EAUG a single page) and have either ioctl
or MAP_POPULATE to do the batch fill.

We can still "not trust the user space" i.e. the populate does not have to
guarantee to do the full length since the #PF handler will then fill the
holes. This was one concern in this thread but it is not hard to address.

BR, Jarkko

^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2022-03-08 12:51 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-06  5:32 [PATCH RFC 0/3] MAP_POPULATE for device memory Jarkko Sakkinen
2022-03-06  5:32 ` [Intel-gfx] " Jarkko Sakkinen
2022-03-06  5:32 ` Jarkko Sakkinen
2022-03-06  5:32 ` [PATCH RFC 1/3] mm: Add f_ops->populate() Jarkko Sakkinen
2022-03-06  5:32   ` [Intel-gfx] " Jarkko Sakkinen
2022-03-06  5:32   ` Jarkko Sakkinen
2022-03-06 10:01   ` Greg Kroah-Hartman
2022-03-06 10:01     ` [Intel-gfx] " Greg Kroah-Hartman
2022-03-06 10:01     ` Greg Kroah-Hartman
2022-03-06 17:02     ` Jarkko Sakkinen
2022-03-06 17:02       ` [Intel-gfx] " Jarkko Sakkinen
2022-03-06 17:02       ` Jarkko Sakkinen
2022-03-06 17:03       ` Jarkko Sakkinen
2022-03-06 17:03         ` [Intel-gfx] " Jarkko Sakkinen
2022-03-06 17:03         ` Jarkko Sakkinen
2022-03-06 22:43       ` Matthew Wilcox
2022-03-06 22:43         ` [Intel-gfx] " Matthew Wilcox
2022-03-06 22:43         ` Matthew Wilcox
2022-03-07 13:16         ` Jarkko Sakkinen
2022-03-07 13:16           ` [Intel-gfx] " Jarkko Sakkinen
2022-03-07 13:16           ` Jarkko Sakkinen
2022-03-07 13:26           ` Jarkko Sakkinen
2022-03-07 13:26             ` [Intel-gfx] " Jarkko Sakkinen
2022-03-07 13:26             ` Jarkko Sakkinen
2022-03-06  5:32 ` [PATCH RFC 2/3] x86/sgx: Export sgx_encl_page_alloc() Jarkko Sakkinen
2022-03-06  5:32   ` [Intel-gfx] " Jarkko Sakkinen
2022-03-06  5:32   ` Jarkko Sakkinen
2022-03-06  5:32 ` [PATCH RFC 3/3] x86/sgx: Implement EAUG population with MAP_POPULATE Jarkko Sakkinen
2022-03-06  5:32   ` [Intel-gfx] " Jarkko Sakkinen
2022-03-06  5:32   ` Jarkko Sakkinen
2022-03-06  8:30 ` [PATCH RFC 0/3] MAP_POPULATE for device memory David Laight
2022-03-06  8:30   ` [Intel-gfx] " David Laight
2022-03-06  8:30   ` David Laight
2022-03-06 16:52   ` 'Jarkko Sakkinen'
2022-03-06 16:52     ` [Intel-gfx] " 'Jarkko Sakkinen'
2022-03-06 16:52     ` 'Jarkko Sakkinen'
2022-03-06 11:33 ` Matthew Wilcox
2022-03-06 11:33   ` [Intel-gfx] " Matthew Wilcox
2022-03-06 11:33   ` Matthew Wilcox
2022-03-07  7:48   ` Christoph Hellwig
2022-03-07  7:48     ` [Intel-gfx] " Christoph Hellwig
2022-03-07 13:29     ` Jarkko Sakkinen
2022-03-07 13:29       ` [Intel-gfx] " Jarkko Sakkinen
2022-03-07 13:29       ` Jarkko Sakkinen
2022-03-07 15:56       ` Christoph Hellwig
2022-03-07 15:56         ` [Intel-gfx] " Christoph Hellwig
2022-03-07 15:58         ` Jarkko Sakkinen
2022-03-07 15:58           ` [Intel-gfx] " Jarkko Sakkinen
2022-03-07 15:58           ` Jarkko Sakkinen
2022-03-07 22:11         ` David Laight
2022-03-07 22:11           ` [Intel-gfx] " David Laight
2022-03-07 22:11           ` David Laight
2022-03-08 10:10           ` Jarkko Sakkinen
2022-03-08 10:10             ` [Intel-gfx] " Jarkko Sakkinen
2022-03-08 10:10             ` Jarkko Sakkinen
2022-03-07 10:12 ` David Hildenbrand
2022-03-07 10:12   ` [Intel-gfx] " David Hildenbrand
2022-03-07 10:12   ` David Hildenbrand
2022-03-07 14:22   ` Jarkko Sakkinen
2022-03-07 14:22     ` [Intel-gfx] " Jarkko Sakkinen
2022-03-07 14:22     ` Jarkko Sakkinen
2022-03-07 14:33     ` David Hildenbrand
2022-03-07 14:33       ` [Intel-gfx] " David Hildenbrand
2022-03-07 14:33       ` David Hildenbrand
2022-03-07 15:49       ` Jarkko Sakkinen
2022-03-07 15:49         ` [Intel-gfx] " Jarkko Sakkinen
2022-03-07 15:49         ` Jarkko Sakkinen
2022-03-07 14:23 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.