* [PATCH 0/8] Huge page-table entries for TTM
@ 2019-12-03 13:22 Thomas Hellström (VMware)
  2019-12-03 13:22 ` [PATCH 1/8] mm: Introduce vma_is_special_huge Thomas Hellström (VMware)
                   ` (8 more replies)
  0 siblings, 9 replies; 23+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-03 13:22 UTC (permalink / raw)
  To: linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellström,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
	Christian König

In order to save TLB space and CPU usage, this patchset enables huge and giant
page-table entries for TTM and TTM-enabled graphics drivers.

Patch 1 introduces a vma_is_special_huge() function to make the mm code
take the same path as DAX when splitting huge and giant page-table entries
(which currently means zapping the page-table entry and relying on re-faulting).

Patch 2 makes the mm code split existing huge page-table entries
on huge_fault fallbacks, typically on COW or on buffer objects that want
write-notify. COW and write-notification are always done on the lowest
page-table level. See the patch log message for additional considerations.

Patch 3 introduces functions that allow graphics drivers to manipulate
the caching and encryption flags of huge page-table entries without ugly
hacks.

Patch 4 implements the huge_fault handler in TTM.
This enables huge page-table entries, provided that the kernel is configured
to support transhuge pages, either by default or using madvise().
However, they are unlikely to be inserted unless the kernel buffer object
pfns and user-space addresses align perfectly. There are various options
here, but since buffer objects that reside in system pages typically start
at huge page boundaries if they are backed by huge pages, we try to enforce
that buffer-object starting pfns and user-space addresses are huge-page-size
aligned if their size exceeds a huge page size. If PUD-size transhuge
("giant") pages are enabled by the arch, the same holds for those.

Patch 5 implements a specialized huge_fault handler for vmwgfx.
The vmwgfx driver may perform dirty-tracking and needs some special code
to handle that correctly.

Patch 6 implements a drm helper to align user-space addresses according
to the above scheme, if possible.

Patch 7 implements a TTM range manager that does the same for graphics IO
memory.

Patch 8 finally hooks up the helpers of patch 6 and 7 to the vmwgfx driver.
A similar change is needed for graphics drivers that want a reasonable
likelihood of actually using huge page-table entries.

Finally, if a buffer object's size is not huge-page or giant-page aligned,
its size will NOT be inflated by this patchset. This means that the buffer
object tail will use smaller-size page-table entries and thus no memory
overhead occurs. Drivers that want to pay the memory-overhead price need to
implement their own scheme to inflate buffer-object sizes.

PMD-size huge page-table entries have been tested with vmwgfx and found to
work well with both system-memory-backed and IO-memory-backed buffer objects.

PUD-size giant page-table entries have seen limited (fault and COW) testing
using a modified kernel and a fake vmwgfx TTM memory type. The vmwgfx driver
does not otherwise support 1GB-size IO memory resources.

Comments and suggestions welcome.
Thomas

Changes since RFC:
* Check for buffer objects present in contiguous IO memory (Christian König)
* Rebased on the vmwgfx emulated coherent memory functionality. That rebase
  adds patch 5.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Christian König" <christian.koenig@amd.com>




^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/8] mm: Introduce vma_is_special_huge
  2019-12-03 13:22 [PATCH 0/8] Huge page-table entries for TTM Thomas Hellström (VMware)
@ 2019-12-03 13:22 ` Thomas Hellström (VMware)
  2020-03-01  4:04   ` Andrew Morton
  2019-12-03 13:22 ` [PATCH 2/8] mm: Split huge pages on write-notify or COW Thomas Hellström (VMware)
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-03 13:22 UTC (permalink / raw)
  To: linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
	Christian König

From: Thomas Hellstrom <thellstrom@vmware.com>

For VM_PFNMAP and VM_MIXEDMAP vmas that want to support transhuge pages
and huge page-table entries, introduce vma_is_special_huge(), which takes
the same code paths as vma_is_dax().

The use of "special" follows the definition in memory.c, vm_normal_page():
"Special" mappings do not wish to be associated with a "struct page"
(either it doesn't exist, or it exists but they don't want to touch it)

For PAGE_SIZE pages, "special" is determined per page table entry to be
able to deal with COW pages. But since we don't have huge COW pages,
we can classify a vma as either "special huge" or "normal huge".

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 include/linux/mm.h | 6 ++++++
 mm/huge_memory.c   | 6 +++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0133542d69c9..886a1f899887 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2822,6 +2822,12 @@ extern long copy_huge_page_from_user(struct page *dst_page,
 				const void __user *usr_src,
 				unsigned int pages_per_huge_page,
 				bool allow_pagefault);
+static inline bool vma_is_special_huge(struct vm_area_struct *vma)
+{
+	return vma_is_dax(vma) || (vma->vm_file &&
+				   (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)));
+}
+
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 41a0fbddc96b..f8d24fc3f4df 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1789,7 +1789,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	orig_pmd = pmdp_huge_get_and_clear_full(tlb->mm, addr, pmd,
 			tlb->fullmm);
 	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
-	if (vma_is_dax(vma)) {
+	if (vma_is_special_huge(vma)) {
 		if (arch_needs_pgtable_deposit())
 			zap_deposited_table(tlb->mm, pmd);
 		spin_unlock(ptl);
@@ -2053,7 +2053,7 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	 */
 	pudp_huge_get_and_clear_full(tlb->mm, addr, pud, tlb->fullmm);
 	tlb_remove_pud_tlb_entry(tlb, pud, addr);
-	if (vma_is_dax(vma)) {
+	if (vma_is_special_huge(vma)) {
 		spin_unlock(ptl);
 		/* No zero page support yet */
 	} else {
@@ -2162,7 +2162,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		 */
 		if (arch_needs_pgtable_deposit())
 			zap_deposited_table(mm, pmd);
-		if (vma_is_dax(vma))
+		if (vma_is_special_huge(vma))
 			return;
 		page = pmd_page(_pmd);
 		if (!PageDirty(page) && pmd_dirty(_pmd))
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/8] mm: Split huge pages on write-notify or COW
  2019-12-03 13:22 [PATCH 0/8] Huge page-table entries for TTM Thomas Hellström (VMware)
  2019-12-03 13:22 ` [PATCH 1/8] mm: Introduce vma_is_special_huge Thomas Hellström (VMware)
@ 2019-12-03 13:22 ` Thomas Hellström (VMware)
  2020-03-01  4:04   ` Andrew Morton
  2019-12-03 13:22 ` [PATCH 3/8] mm: Add vmf_insert_pfn_xxx_prot() for huge page-table entries Thomas Hellström (VMware)
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-03 13:22 UTC (permalink / raw)
  To: linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
	Christian König

From: Thomas Hellstrom <thellstrom@vmware.com>

We currently only do COW and write-notify on the PTE level, so if the
huge_fault() handler returns VM_FAULT_FALLBACK on wp faults,
split the huge pages and page-table entries. Also do this for huge PUDs
if there is no huge_fault() handler and the vma is not anonymous, similar
to how it's done for PMDs.

Note that fs/dax.c does the splitting in the huge_fault() handler, but as
huge_fault() is implemented by modules we need to consider whether to
export the splitting functions for use in the modules or whether to try
to keep calls in the core. Opt for the latter. A follow-up patch can
remove the dax.c split_huge_pmd() if needed.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 mm/memory.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 78517a5bb8ac..7ede2456a76f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3851,11 +3851,14 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf, pmd_t orig_pmd)
 {
 	if (vma_is_anonymous(vmf->vma))
 		return do_huge_pmd_wp_page(vmf, orig_pmd);
-	if (vmf->vma->vm_ops->huge_fault)
-		return vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PMD);
+	if (vmf->vma->vm_ops->huge_fault) {
+		vm_fault_t ret = vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PMD);
 
-	/* COW handled on pte level: split pmd */
-	VM_BUG_ON_VMA(vmf->vma->vm_flags & VM_SHARED, vmf->vma);
+		if (!(ret & VM_FAULT_FALLBACK))
+			return ret;
+	}
+
+	/* COW or write-notify handled on pte level: split pmd. */
 	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
 
 	return VM_FAULT_FALLBACK;
@@ -3883,9 +3886,16 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	/* No support for anonymous transparent PUD pages yet */
 	if (vma_is_anonymous(vmf->vma))
-		return VM_FAULT_FALLBACK;
-	if (vmf->vma->vm_ops->huge_fault)
-		return vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PUD);
+		goto split;
+	if (vmf->vma->vm_ops->huge_fault) {
+		vm_fault_t ret = vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PUD);
+
+		if (!(ret & VM_FAULT_FALLBACK))
+			return ret;
+	}
+split:
+	/* COW or write-notify not handled on PUD level: split pud.*/
+	__split_huge_pud(vmf->vma, vmf->pud, vmf->address);
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 	return VM_FAULT_FALLBACK;
 }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/8] mm: Add vmf_insert_pfn_xxx_prot() for huge page-table entries
  2019-12-03 13:22 [PATCH 0/8] Huge page-table entries for TTM Thomas Hellström (VMware)
  2019-12-03 13:22 ` [PATCH 1/8] mm: Introduce vma_is_special_huge Thomas Hellström (VMware)
  2019-12-03 13:22 ` [PATCH 2/8] mm: Split huge pages on write-notify or COW Thomas Hellström (VMware)
@ 2019-12-03 13:22 ` Thomas Hellström (VMware)
  2019-12-03 13:22 ` [PATCH 4/8] drm/ttm, drm/vmwgfx: Support huge TTM pagefaults Thomas Hellström (VMware)
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-03 13:22 UTC (permalink / raw)
  To: linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
	Christian König

From: Thomas Hellstrom <thellstrom@vmware.com>

For graphics drivers that need to modify the page protection, add
huge page-table entry counterparts to vmf_insert_pfn_prot().
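
For illustration, a hypothetical driver-side fragment (not code from this
patch, and "pfn" is an assumed variable holding the PMD-aligned backing pfn)
could use the new helper to insert a PMD entry with write-combined caching,
which the plain vmf_insert_pfn_pmd() cannot express since it always uses
vma->vm_page_prot:

	/* Hypothetical fragment of a driver huge_fault() handler. */
	pgprot_t prot = pgprot_writecombine(vmf->vma->vm_page_prot);
	pfn_t pfnt = __pfn_to_pfn_t(pfn, PFN_DEV);

	return vmf_insert_pfn_pmd_prot(vmf, pfnt, prot,
				       vmf->flags & FAULT_FLAG_WRITE);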

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 include/linux/huge_mm.h | 17 +++++++++++++++--
 mm/huge_memory.c        | 12 ++++++------
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 0b84e13e88e2..b721d42d0adc 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -47,8 +47,21 @@ extern bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 extern int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			unsigned long addr, pgprot_t newprot,
 			int prot_numa);
-vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write);
-vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write);
+vm_fault_t vmf_insert_pfn_pmd_prot(struct vm_fault *vmf, pfn_t pfn,
+				   pgprot_t pgprot, bool write);
+static inline vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn,
+					    bool write)
+{
+	return vmf_insert_pfn_pmd_prot(vmf, pfn, vmf->vma->vm_page_prot, write);
+}
+vm_fault_t vmf_insert_pfn_pud_prot(struct vm_fault *vmf, pfn_t pfn,
+				   pgprot_t pgprot, bool write);
+static inline vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn,
+					    bool write)
+{
+	return vmf_insert_pfn_pud_prot(vmf, pfn, vmf->vma->vm_page_prot, write);
+}
+
 enum transparent_hugepage_flag {
 	TRANSPARENT_HUGEPAGE_FLAG,
 	TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f8d24fc3f4df..b472313709ed 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -811,11 +811,11 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
 		pte_free(mm, pgtable);
 }
 
-vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write)
+vm_fault_t vmf_insert_pfn_pmd_prot(struct vm_fault *vmf, pfn_t pfn,
+				   pgprot_t pgprot, bool write)
 {
 	unsigned long addr = vmf->address & PMD_MASK;
 	struct vm_area_struct *vma = vmf->vma;
-	pgprot_t pgprot = vma->vm_page_prot;
 	pgtable_t pgtable = NULL;
 
 	/*
@@ -843,7 +843,7 @@ vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write)
 	insert_pfn_pmd(vma, addr, vmf->pmd, pfn, pgprot, write, pgtable);
 	return VM_FAULT_NOPAGE;
 }
-EXPORT_SYMBOL_GPL(vmf_insert_pfn_pmd);
+EXPORT_SYMBOL_GPL(vmf_insert_pfn_pmd_prot);
 
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 static pud_t maybe_pud_mkwrite(pud_t pud, struct vm_area_struct *vma)
@@ -889,11 +889,11 @@ static void insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
 	spin_unlock(ptl);
 }
 
-vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write)
+vm_fault_t vmf_insert_pfn_pud_prot(struct vm_fault *vmf, pfn_t pfn,
+				   pgprot_t pgprot, bool write)
 {
 	unsigned long addr = vmf->address & PUD_MASK;
 	struct vm_area_struct *vma = vmf->vma;
-	pgprot_t pgprot = vma->vm_page_prot;
 
 	/*
 	 * If we had pud_special, we could avoid all these restrictions,
@@ -914,7 +914,7 @@ vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write)
 	insert_pfn_pud(vma, addr, vmf->pud, pfn, pgprot, write);
 	return VM_FAULT_NOPAGE;
 }
-EXPORT_SYMBOL_GPL(vmf_insert_pfn_pud);
+EXPORT_SYMBOL_GPL(vmf_insert_pfn_pud_prot);
 #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
 
 static void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/8] drm/ttm, drm/vmwgfx: Support huge TTM pagefaults
  2019-12-03 13:22 [PATCH 0/8] Huge page-table entries for TTM Thomas Hellström (VMware)
                   ` (2 preceding siblings ...)
  2019-12-03 13:22 ` [PATCH 3/8] mm: Add vmf_insert_pfn_xxx_prot() for huge page-table entries Thomas Hellström (VMware)
@ 2019-12-03 13:22 ` Thomas Hellström (VMware)
  2019-12-03 13:22 ` [PATCH 5/8] drm/vmwgfx: Support huge page faults Thomas Hellström (VMware)
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-03 13:22 UTC (permalink / raw)
  To: linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
	Christian König

From: Thomas Hellstrom <thellstrom@vmware.com>

Support huge (PMD-size and PUD-size) page-table entries by providing a
huge_fault() callback.
We still support private mappings and write-notify by splitting the huge
page-table entries on write-access.

Note that for huge page-faults to occur, either the kernel needs to be
compiled with transhuge pages always enabled, or the kernel needs to be
compiled with transhuge pages enabled using madvise, in which case the
user-space app also needs to call madvise() to enable transhuge pages on a
per-mapping basis.
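
A minimal user-space sketch of the madvise case (illustrative only;
drm_fd, bo_mmap_offset and bo_size are placeholders for values obtained
from the driver):

	#include <sys/mman.h>

	/* Map the buffer object, then request transhuge pages for it. */
	void *ptr = mmap(NULL, bo_size, PROT_READ | PROT_WRITE,
			 MAP_SHARED, drm_fd, bo_mmap_offset);
	if (ptr != MAP_FAILED)
		(void) madvise(ptr, bo_size, MADV_HUGEPAGE);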

Furthermore, huge page-faults will not complete unless buffer objects and
user-space addresses are aligned on huge-page-size boundaries.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c            | 144 ++++++++++++++++++++-
 drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c |   2 +-
 include/drm/ttm/ttm_bo_api.h               |   3 +-
 3 files changed, 144 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 4fdedbba266c..0be4a84e166d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -158,6 +158,89 @@ vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
 }
 EXPORT_SYMBOL(ttm_bo_vm_reserve);
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+/**
+ * ttm_bo_vm_insert_huge - Insert a pfn for PUD or PMD faults
+ * @vmf: Fault data
+ * @bo: The buffer object
+ * @page_offset: Page offset from bo start
+ * @fault_page_size: The size of the fault in pages.
+ * @pgprot: The page protections.
+ * Does additional checking whether it's possible to insert a PUD or PMD
+ * pfn and performs the insertion.
+ *
+ * Return: VM_FAULT_NOPAGE on successful insertion, VM_FAULT_FALLBACK if
+ * a huge fault was not possible, and a VM_FAULT_ERROR code otherwise.
+ */
+static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault *vmf,
+					struct ttm_buffer_object *bo,
+					pgoff_t page_offset,
+					pgoff_t fault_page_size,
+					pgprot_t pgprot)
+{
+	pgoff_t i;
+	vm_fault_t ret;
+	unsigned long pfn;
+	pfn_t pfnt;
+	struct ttm_tt *ttm = bo->ttm;
+	bool write = vmf->flags & FAULT_FLAG_WRITE;
+
+	/* Fault should not cross bo boundary. */
+	page_offset &= ~(fault_page_size - 1);
+	if (page_offset + fault_page_size > bo->num_pages)
+		goto out_fallback;
+
+	if (bo->mem.bus.is_iomem)
+		pfn = ttm_bo_io_mem_pfn(bo, page_offset);
+	else
+		pfn = page_to_pfn(ttm->pages[page_offset]);
+
+	/* pfn must be fault_page_size aligned. */
+	if ((pfn & (fault_page_size - 1)) != 0)
+		goto out_fallback;
+
+	/* Check that memory is contigous. */
+	if (!bo->mem.bus.is_iomem)
+		for (i = 1; i < fault_page_size; ++i) {
+			if (page_to_pfn(ttm->pages[page_offset + i]) != pfn + i)
+				goto out_fallback;
+		}
+	/* IO mem without the io_mem_pfn callback is always contigous. */
+	else if (bo->bdev->driver->io_mem_pfn)
+		for (i = 1; i < fault_page_size; ++i) {
+			if (ttm_bo_io_mem_pfn(bo, page_offset + i) != pfn + i)
+				goto out_fallback;
+		}
+
+	pfnt = __pfn_to_pfn_t(pfn, PFN_DEV);
+	if (fault_page_size == (HPAGE_PMD_SIZE >> PAGE_SHIFT))
+		ret = vmf_insert_pfn_pmd_prot(vmf, pfnt, pgprot, write);
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+	else if (fault_page_size == (HPAGE_PUD_SIZE >> PAGE_SHIFT))
+		ret = vmf_insert_pfn_pud_prot(vmf, pfnt, pgprot, write);
+#endif
+	else
+		WARN_ON_ONCE(ret = VM_FAULT_FALLBACK);
+
+	if (ret != VM_FAULT_NOPAGE)
+		goto out_fallback;
+
+	return VM_FAULT_NOPAGE;
+out_fallback:
+	count_vm_event(THP_FAULT_FALLBACK);
+	return VM_FAULT_FALLBACK;
+}
+#else
+static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault *vmf,
+					struct ttm_buffer_object *bo,
+					pgoff_t page_offset,
+					pgoff_t fault_page_size,
+					pgprot_t pgprot)
+{
+	return VM_FAULT_NOPAGE;
+}
+#endif
+
 /**
  * ttm_bo_vm_fault_reserved - TTM fault helper
  * @vmf: The struct vm_fault given as argument to the fault callback
@@ -178,7 +261,8 @@ EXPORT_SYMBOL(ttm_bo_vm_reserve);
  */
 vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 				    pgprot_t prot,
-				    pgoff_t num_prefault)
+				    pgoff_t num_prefault,
+				    pgoff_t fault_page_size)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct ttm_buffer_object *bo = vma->vm_private_data;
@@ -270,6 +354,13 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 		prot = pgprot_decrypted(prot);
 	}
 
+	/* We don't prefault on huge faults. Yet. */
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && fault_page_size != 1) {
+		ret = ttm_bo_vm_insert_huge(vmf, bo, page_offset,
+					    fault_page_size, prot);
+		goto out_io_unlock;
+	}
+
 	/*
 	 * Speculatively prefault a number of pages. Only error on
 	 * first page.
@@ -328,7 +419,50 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 		return ret;
 
 	prot = vma->vm_page_prot;
-	ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
+	ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1);
+	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
+		return ret;
+
+	dma_resv_unlock(bo->base.resv);
+
+	return ret;
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static vm_fault_t ttm_bo_vm_huge_fault(struct vm_fault *vmf,
+				       enum page_entry_size pe_size)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	pgprot_t prot;
+	struct ttm_buffer_object *bo = vma->vm_private_data;
+	vm_fault_t ret;
+	pgoff_t fault_page_size = 0;
+	bool write = vmf->flags & FAULT_FLAG_WRITE;
+
+	switch (pe_size) {
+	case PE_SIZE_PMD:
+		fault_page_size = HPAGE_PMD_SIZE >> PAGE_SHIFT;
+		break;
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+	case PE_SIZE_PUD:
+		fault_page_size = HPAGE_PUD_SIZE >> PAGE_SHIFT;
+		break;
+#endif
+	default:
+		WARN_ON_ONCE(1);
+		return VM_FAULT_FALLBACK;
+	}
+
+	/* Fallback on write dirty-tracking or COW */
+	if (write && !(pgprot_val(vmf->vma->vm_page_prot) & _PAGE_RW))
+		return VM_FAULT_FALLBACK;
+
+	ret = ttm_bo_vm_reserve(bo, vmf);
+	if (ret)
+		return ret;
+
+	prot = vm_get_page_prot(vma->vm_flags);
+	ret = ttm_bo_vm_fault_reserved(vmf, prot, 1, fault_page_size);
 	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
 		return ret;
 
@@ -336,6 +470,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 
 	return ret;
 }
+#endif
 
 void ttm_bo_vm_open(struct vm_area_struct *vma)
 {
@@ -437,7 +572,10 @@ static const struct vm_operations_struct ttm_bo_vm_ops = {
 	.fault = ttm_bo_vm_fault,
 	.open = ttm_bo_vm_open,
 	.close = ttm_bo_vm_close,
-	.access = ttm_bo_vm_access
+	.access = ttm_bo_vm_access,
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	.huge_fault = ttm_bo_vm_huge_fault,
+#endif
 };
 
 static struct ttm_buffer_object *ttm_bo_vm_lookup(struct ttm_bo_device *bdev,
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c b/drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c
index f07aa857587c..17a5dca7b921 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c
@@ -477,7 +477,7 @@ vm_fault_t vmw_bo_vm_fault(struct vm_fault *vmf)
 	else
 		prot = vm_get_page_prot(vma->vm_flags);
 
-	ret = ttm_bo_vm_fault_reserved(vmf, prot, num_prefault);
+	ret = ttm_bo_vm_fault_reserved(vmf, prot, num_prefault, 1);
 	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
 		return ret;
 
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index 65e399d280f7..d800fc756b59 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -736,7 +736,8 @@ vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
 
 vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 				    pgprot_t prot,
-				    pgoff_t num_prefault);
+				    pgoff_t num_prefault,
+				    pgoff_t fault_page_size);
 
 void ttm_bo_vm_open(struct vm_area_struct *vma);
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/8] drm/vmwgfx: Support huge page faults
  2019-12-03 13:22 [PATCH 0/8] Huge page-table entries for TTM Thomas Hellström (VMware)
                   ` (3 preceding siblings ...)
  2019-12-03 13:22 ` [PATCH 4/8] drm/ttm, drm/vmwgfx: Support huge TTM pagefaults Thomas Hellström (VMware)
@ 2019-12-03 13:22 ` Thomas Hellström (VMware)
  2019-12-03 13:22 ` [PATCH 6/8] drm: Add a drm_get_unmapped_area() helper Thomas Hellström (VMware)
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-03 13:22 UTC (permalink / raw)
  To: linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
	Christian König

From: Thomas Hellstrom <thellstrom@vmware.com>

With vmwgfx dirty-tracking we need a specialized huge_fault
callback. Implement and hook it up.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h        |  2 +
 drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c | 66 +++++++++++++++++++++-
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c   |  1 +
 3 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index a31e726d6d71..8656a97448c3 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -1428,6 +1428,8 @@ void vmw_bo_dirty_unmap(struct vmw_buffer_object *vbo,
 			pgoff_t start, pgoff_t end);
 vm_fault_t vmw_bo_vm_fault(struct vm_fault *vmf);
 vm_fault_t vmw_bo_vm_mkwrite(struct vm_fault *vmf);
+vm_fault_t vmw_bo_vm_huge_fault(struct vm_fault *vmf,
+				enum page_entry_size pe_size);
 
 /**
  * VMW_DEBUG_KMS - Debug output for kernel mode-setting
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c b/drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c
index 17a5dca7b921..6f76a97ad969 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c
@@ -473,7 +473,7 @@ vm_fault_t vmw_bo_vm_fault(struct vm_fault *vmf)
 	 * a lot of unnecessary write faults.
 	 */
 	if (vbo->dirty && vbo->dirty->method == VMW_BO_DIRTY_MKWRITE)
-		prot = vma->vm_page_prot;
+		prot = vm_get_page_prot(vma->vm_flags & ~VM_SHARED);
 	else
 		prot = vm_get_page_prot(vma->vm_flags);
 
@@ -486,3 +486,67 @@ vm_fault_t vmw_bo_vm_fault(struct vm_fault *vmf)
 
 	return ret;
 }
+
+vm_fault_t vmw_bo_vm_huge_fault(struct vm_fault *vmf,
+				enum page_entry_size pe_size)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct ttm_buffer_object *bo = (struct ttm_buffer_object *)
+	    vma->vm_private_data;
+	struct vmw_buffer_object *vbo =
+		container_of(bo, struct vmw_buffer_object, base);
+	pgprot_t prot;
+	vm_fault_t ret;
+	pgoff_t fault_page_size;
+
+	switch (pe_size) {
+	case PE_SIZE_PMD:
+		fault_page_size = HPAGE_PMD_SIZE >> PAGE_SHIFT;
+		break;
+	case PE_SIZE_PUD:
+		fault_page_size = HPAGE_PUD_SIZE >> PAGE_SHIFT;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return VM_FAULT_FALLBACK;
+	}
+
+	/* Always do write dirty-tracking on PTE level. */
+	if (READ_ONCE(vbo->dirty) && (vmf->flags & FAULT_FLAG_WRITE))
+		return VM_FAULT_FALLBACK;
+
+	ret = ttm_bo_vm_reserve(bo, vmf);
+	if (ret)
+		return ret;
+
+	if (vbo->dirty) {
+		pgoff_t allowed_prefault;
+		unsigned long page_offset;
+
+		page_offset = vmf->pgoff -
+			drm_vma_node_start(&bo->base.vma_node);
+		if (page_offset >= bo->num_pages ||
+		    vmw_resources_clean(vbo, page_offset,
+					page_offset + PAGE_SIZE,
+					&allowed_prefault)) {
+			ret = VM_FAULT_SIGBUS;
+			goto out_unlock;
+		}
+
+		/*
+		 * Write protect, so we get a new fault on write, and can
+		 * split.
+		 */
+		prot = vm_get_page_prot(vma->vm_flags & ~VM_SHARED);
+	} else {
+		prot = vm_get_page_prot(vma->vm_flags);
+	}
+	ret = ttm_bo_vm_fault_reserved(vmf, prot, 1, fault_page_size);
+	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
+		return ret;
+
+out_unlock:
+	dma_resv_unlock(bo->base.resv);
+
+	return ret;
+}
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
index ce288756531b..de838ba88a97 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
@@ -33,6 +33,7 @@ int vmw_mmap(struct file *filp, struct vm_area_struct *vma)
 		.pfn_mkwrite = vmw_bo_vm_mkwrite,
 		.page_mkwrite = vmw_bo_vm_mkwrite,
 		.fault = vmw_bo_vm_fault,
+		.huge_fault = vmw_bo_vm_huge_fault,
 		.open = ttm_bo_vm_open,
 		.close = ttm_bo_vm_close
 	};
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/8] drm: Add a drm_get_unmapped_area() helper
  2019-12-03 13:22 [PATCH 0/8] Huge page-table entries for TTM Thomas Hellström (VMware)
                   ` (4 preceding siblings ...)
  2019-12-03 13:22 ` [PATCH 5/8] drm/vmwgfx: Support huge page faults Thomas Hellström (VMware)
@ 2019-12-03 13:22 ` Thomas Hellström (VMware)
  2019-12-04 11:11   ` Christian König
  2019-12-03 13:22 ` [PATCH 7/8] drm/ttm: Introduce a huge page aligning TTM range manager Thomas Hellström (VMware)
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-03 13:22 UTC (permalink / raw)
  To: linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
	Christian König

From: Thomas Hellstrom <thellstrom@vmware.com>

This helper is used to align user-space buffer object addresses to
huge page boundaries, minimizing the chance of alignment mismatch
between user-space addresses and physical addresses.
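
As a sketch of how a driver would hook this up (the vmwgfx wiring lands
later in this series; my_device and to_my_device() below are illustrative
names, not part of this patch):

	static unsigned long my_get_unmapped_area(struct file *file,
						  unsigned long uaddr,
						  unsigned long len,
						  unsigned long pgoff,
						  unsigned long flags)
	{
		struct drm_file *file_priv = file->private_data;
		struct my_device *mdev = to_my_device(file_priv->minor->dev);

		return drm_get_unmapped_area(file, uaddr, len, pgoff, flags,
					     &mdev->vma_manager);
	}

	/* In the driver's struct file_operations:
	 *	.get_unmapped_area = my_get_unmapped_area,
	 */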

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 drivers/gpu/drm/drm_file.c | 130 +++++++++++++++++++++++++++++++++++++
 include/drm/drm_file.h     |   5 ++
 2 files changed, 135 insertions(+)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index ea34bc991858..e5b4024cd397 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -31,6 +31,8 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  */
 
+#include <uapi/asm/mman.h>
+
 #include <linux/dma-fence.h>
 #include <linux/module.h>
 #include <linux/pci.h>
@@ -41,6 +43,7 @@
 #include <drm/drm_drv.h>
 #include <drm/drm_file.h>
 #include <drm/drm_print.h>
+#include <drm/drm_vma_manager.h>
 
 #include "drm_crtc_internal.h"
 #include "drm_internal.h"
@@ -754,3 +757,130 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
 	spin_unlock_irqrestore(&dev->event_lock, irqflags);
 }
 EXPORT_SYMBOL(drm_send_event);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+/*
+ * drm_addr_inflate() attempts to construct an aligned area by inflating
+ * the area size and skipping the unaligned start of the area.
+ * adapted from shmem_get_unmapped_area()
+ */
+static unsigned long drm_addr_inflate(unsigned long addr,
+				      unsigned long len,
+				      unsigned long pgoff,
+				      unsigned long flags,
+				      unsigned long huge_size)
+{
+	unsigned long offset, inflated_len;
+	unsigned long inflated_addr;
+	unsigned long inflated_offset;
+
+	offset = (pgoff << PAGE_SHIFT) & (huge_size - 1);
+	if (offset && offset + len < 2 * huge_size)
+		return addr;
+	if ((addr & (huge_size - 1)) == offset)
+		return addr;
+
+	inflated_len = len + huge_size - PAGE_SIZE;
+	if (inflated_len > TASK_SIZE)
+		return addr;
+	if (inflated_len < len)
+		return addr;
+
+	inflated_addr = current->mm->get_unmapped_area(NULL, 0, inflated_len,
+						       0, flags);
+	if (IS_ERR_VALUE(inflated_addr))
+		return addr;
+	if (inflated_addr & ~PAGE_MASK)
+		return addr;
+
+	inflated_offset = inflated_addr & (huge_size - 1);
+	inflated_addr += offset - inflated_offset;
+	if (inflated_offset > offset)
+		inflated_addr += huge_size;
+
+	if (inflated_addr > TASK_SIZE - len)
+		return addr;
+
+	return inflated_addr;
+}
+
+/**
+ * drm_get_unmapped_area() - Get an unused user-space virtual memory area
+ * suitable for huge page table entries.
+ * @file: The struct file representing the address space being mmap()'d.
+ * @uaddr: Start address suggested by user-space.
+ * @len: Length of the area.
+ * @pgoff: The page offset into the address space.
+ * @flags: mmap flags
+ * @mgr: The address space manager used by the drm driver. This argument can
+ * probably be removed at some point when all drivers use the same
+ * address space manager.
+ *
+ * This function attempts to find an unused user-space virtual memory area
+ * that can accommodate the size we want to map, and that is properly
+ * aligned to facilitate huge page table entries matching actual
+ * huge pages or huge page aligned memory in buffer objects. Buffer objects
+ * are assumed to start at huge page boundary pfns (io memory) or be
+ * populated by huge pages aligned to the start of the buffer object
+ * (system- or coherent memory). Adapted from shmem_get_unmapped_area.
+ *
+ * Return: aligned user-space address.
+ */
+unsigned long drm_get_unmapped_area(struct file *file,
+				    unsigned long uaddr, unsigned long len,
+				    unsigned long pgoff, unsigned long flags,
+				    struct drm_vma_offset_manager *mgr)
+{
+	unsigned long addr;
+	unsigned long inflated_addr;
+	struct drm_vma_offset_node *node;
+
+	if (len > TASK_SIZE)
+		return -ENOMEM;
+
+	/* Adjust mapping offset to be zero at bo start */
+	drm_vma_offset_lock_lookup(mgr);
+	node = drm_vma_offset_lookup_locked(mgr, pgoff, 1);
+	if (node)
+		pgoff -= node->vm_node.start;
+	drm_vma_offset_unlock_lookup(mgr);
+
+	addr = current->mm->get_unmapped_area(file, uaddr, len, pgoff, flags);
+	if (IS_ERR_VALUE(addr))
+		return addr;
+	if (addr & ~PAGE_MASK)
+		return addr;
+	if (addr > TASK_SIZE - len)
+		return addr;
+
+	if (len < HPAGE_PMD_SIZE)
+		return addr;
+	if (flags & MAP_FIXED)
+		return addr;
+	/*
+	 * Our priority is to support MAP_SHARED mapped hugely;
+	 * and support MAP_PRIVATE mapped hugely too, until it is COWed.
+	 * But if caller specified an address hint, respect that as before.
+	 */
+	if (uaddr)
+		return addr;
+
+	inflated_addr = drm_addr_inflate(addr, len, pgoff, flags,
+					 HPAGE_PMD_SIZE);
+
+	if (IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) &&
+	    len >= HPAGE_PUD_SIZE)
+		inflated_addr = drm_addr_inflate(inflated_addr, len, pgoff,
+						 flags, HPAGE_PUD_SIZE);
+	return inflated_addr;
+}
+#else /* CONFIG_TRANSPARENT_HUGEPAGE */
+unsigned long drm_get_unmapped_area(struct file *file,
+				    unsigned long uaddr, unsigned long len,
+				    unsigned long pgoff, unsigned long flags,
+				    struct drm_vma_offset_manager *mgr)
+{
+	return current->mm->get_unmapped_area(file, uaddr, len, pgoff, flags);
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+EXPORT_SYMBOL_GPL(drm_get_unmapped_area);
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 67af60bb527a..4719cc80d547 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -386,5 +386,10 @@ void drm_event_cancel_free(struct drm_device *dev,
 			   struct drm_pending_event *p);
 void drm_send_event_locked(struct drm_device *dev, struct drm_pending_event *e);
 void drm_send_event(struct drm_device *dev, struct drm_pending_event *e);
+struct drm_vma_offset_manager;
+unsigned long drm_get_unmapped_area(struct file *file,
+				    unsigned long uaddr, unsigned long len,
+				    unsigned long pgoff, unsigned long flags,
+				    struct drm_vma_offset_manager *mgr);
 
 #endif /* _DRM_FILE_H_ */
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 7/8] drm/ttm: Introduce a huge page aligning TTM range manager.
  2019-12-03 13:22 [PATCH 0/8] Huge page-table entries for TTM Thomas Hellström (VMware)
                   ` (5 preceding siblings ...)
  2019-12-03 13:22 ` [PATCH 6/8] drm: Add a drm_get_unmapped_area() helper Thomas Hellström (VMware)
@ 2019-12-03 13:22 ` Thomas Hellström (VMware)
  2019-12-04 11:13   ` Christian König
  2019-12-03 13:22 ` [PATCH 8/8] drm/vmwgfx: Hook up the helpers to align buffer objects Thomas Hellström (VMware)
  2020-03-01  4:04 ` [PATCH 0/8] Huge page-table entries for TTM Andrew Morton
  8 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-03 13:22 UTC (permalink / raw)
  To: linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
	Christian König

From: Thomas Hellstrom <thellstrom@vmware.com>

Using huge page-table entries requires that the start of a buffer object
is huge-page-size aligned. So introduce a ttm_bo_man_get_node_huge()
function that attempts to accomplish this for allocations that are larger
than the huge page size, and provide a new range-manager instance that
uses that function.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 drivers/gpu/drm/ttm/ttm_bo_manager.c | 92 ++++++++++++++++++++++++++++
 include/drm/ttm/ttm_bo_driver.h      |  1 +
 2 files changed, 93 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index 18d3debcc949..26aa1a2ae7f1 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -89,6 +89,89 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 	return 0;
 }
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static int ttm_bo_insert_aligned(struct drm_mm *mm, struct drm_mm_node *node,
+				 unsigned long align_pages,
+				 const struct ttm_place *place,
+				 struct ttm_mem_reg *mem,
+				 unsigned long lpfn,
+				 enum drm_mm_insert_mode mode)
+{
+	if (align_pages >= mem->page_alignment &&
+	    (!mem->page_alignment || align_pages % mem->page_alignment == 0)) {
+		return drm_mm_insert_node_in_range(mm, node,
+						   mem->num_pages,
+						   align_pages, 0,
+						   place->fpfn, lpfn, mode);
+	}
+
+	return -ENOSPC;
+}
+
+static int ttm_bo_man_get_node_huge(struct ttm_mem_type_manager *man,
+				    struct ttm_buffer_object *bo,
+				    const struct ttm_place *place,
+				    struct ttm_mem_reg *mem)
+{
+	struct ttm_range_manager *rman = (struct ttm_range_manager *) man->priv;
+	struct drm_mm *mm = &rman->mm;
+	struct drm_mm_node *node;
+	unsigned long align_pages;
+	unsigned long lpfn;
+	enum drm_mm_insert_mode mode = DRM_MM_INSERT_BEST;
+	int ret;
+
+	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return -ENOMEM;
+
+	lpfn = place->lpfn;
+	if (!lpfn)
+		lpfn = man->size;
+
+	mode = DRM_MM_INSERT_BEST;
+	if (place->flags & TTM_PL_FLAG_TOPDOWN)
+		mode = DRM_MM_INSERT_HIGH;
+
+	spin_lock(&rman->lock);
+	if (IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)) {
+		align_pages = (HPAGE_PUD_SIZE >> PAGE_SHIFT);
+		if (mem->num_pages >= align_pages) {
+			ret = ttm_bo_insert_aligned(mm, node, align_pages,
+						    place, mem, lpfn, mode);
+			if (!ret)
+				goto found_unlock;
+		}
+	}
+
+	align_pages = (HPAGE_PMD_SIZE >> PAGE_SHIFT);
+	if (mem->num_pages >= align_pages) {
+		ret = ttm_bo_insert_aligned(mm, node, align_pages, place, mem,
+					    lpfn, mode);
+		if (!ret)
+			goto found_unlock;
+	}
+
+	ret = drm_mm_insert_node_in_range(mm, node, mem->num_pages,
+					  mem->page_alignment, 0,
+					  place->fpfn, lpfn, mode);
+found_unlock:
+	spin_unlock(&rman->lock);
+
+	if (unlikely(ret)) {
+		kfree(node);
+	} else {
+		mem->mm_node = node;
+		mem->start = node->start;
+	}
+
+	return 0;
+}
+#else
+#define ttm_bo_man_get_node_huge ttm_bo_man_get_node
+#endif
+
+
 static void ttm_bo_man_put_node(struct ttm_mem_type_manager *man,
 				struct ttm_mem_reg *mem)
 {
@@ -154,3 +237,12 @@ const struct ttm_mem_type_manager_func ttm_bo_manager_func = {
 	.debug = ttm_bo_man_debug
 };
 EXPORT_SYMBOL(ttm_bo_manager_func);
+
+const struct ttm_mem_type_manager_func ttm_bo_manager_huge_func = {
+	.init = ttm_bo_man_init,
+	.takedown = ttm_bo_man_takedown,
+	.get_node = ttm_bo_man_get_node_huge,
+	.put_node = ttm_bo_man_put_node,
+	.debug = ttm_bo_man_debug
+};
+EXPORT_SYMBOL(ttm_bo_manager_huge_func);
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index cac7a8a0825a..868bd0d4be6a 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -888,5 +888,6 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo);
 pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t tmp);
 
 extern const struct ttm_mem_type_manager_func ttm_bo_manager_func;
+extern const struct ttm_mem_type_manager_func ttm_bo_manager_huge_func;
 
 #endif
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 8/8] drm/vmwgfx: Hook up the helpers to align buffer objects
  2019-12-03 13:22 [PATCH 0/8] Huge page-table entries for TTM Thomas Hellström (VMware)
                   ` (6 preceding siblings ...)
  2019-12-03 13:22 ` [PATCH 7/8] drm/ttm: Introduce a huge page aligning TTM range manager Thomas Hellström (VMware)
@ 2019-12-03 13:22 ` Thomas Hellström (VMware)
  2020-03-01  4:04 ` [PATCH 0/8] Huge page-table entries for TTM Andrew Morton
  8 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-03 13:22 UTC (permalink / raw)
  To: linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
	Christian König

From: Thomas Hellstrom <thellstrom@vmware.com>

Start using the helpers that align buffer-object user-space addresses and
buffer-object vram addresses to huge page boundaries.
This improves the chances of being able to use huge page-table entries.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Christian König" <christian.koenig@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c        | 13 +++++++++++++
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h        |  1 +
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c |  2 +-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index e962048f65d2..5452cabb4a2e 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -1215,6 +1215,18 @@ static void vmw_remove(struct pci_dev *pdev)
 	drm_put_dev(dev);
 }
 
+static unsigned long
+vmw_get_unmapped_area(struct file *file, unsigned long uaddr,
+		      unsigned long len, unsigned long pgoff,
+		      unsigned long flags)
+{
+	struct drm_file *file_priv = file->private_data;
+	struct vmw_private *dev_priv = vmw_priv(file_priv->minor->dev);
+
+	return drm_get_unmapped_area(file, uaddr, len, pgoff, flags,
+				     &dev_priv->vma_manager);
+}
+
 static int vmwgfx_pm_notifier(struct notifier_block *nb, unsigned long val,
 			      void *ptr)
 {
@@ -1386,6 +1398,7 @@ static const struct file_operations vmwgfx_driver_fops = {
 	.compat_ioctl = vmw_compat_ioctl,
 #endif
 	.llseek = noop_llseek,
+	.get_unmapped_area = vmw_get_unmapped_area,
 };
 
 static struct drm_driver driver = {
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 8656a97448c3..9b3a5d940024 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -929,6 +929,7 @@ extern int vmw_mmap(struct file *filp, struct vm_area_struct *vma);
 
 extern void vmw_validation_mem_init_ttm(struct vmw_private *dev_priv,
 					size_t gran);
+
 /**
  * TTM buffer object driver - vmwgfx_ttm_buffer.c
  */
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
index d8ea3dd10af0..c319fe0ca912 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
@@ -754,7 +754,7 @@ static int vmw_init_mem_type(struct ttm_bo_device *bdev, uint32_t type,
 		break;
 	case TTM_PL_VRAM:
 		/* "On-card" video ram */
-		man->func = &ttm_bo_manager_func;
+		man->func = &ttm_bo_manager_huge_func;
 		man->gpu_offset = 0;
 		man->flags = TTM_MEMTYPE_FLAG_FIXED | TTM_MEMTYPE_FLAG_MAPPABLE;
 		man->available_caching = TTM_PL_FLAG_CACHED;
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/8] drm: Add a drm_get_unmapped_area() helper
  2019-12-03 13:22 ` [PATCH 6/8] drm: Add a drm_get_unmapped_area() helper Thomas Hellström (VMware)
@ 2019-12-04 11:11   ` Christian König
  2019-12-04 11:36     ` Thomas Hellström (VMware)
  0 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2019-12-04 11:11 UTC (permalink / raw)
  To: Thomas Hellström (VMware), linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse

Am 03.12.19 um 14:22 schrieb Thomas Hellström (VMware):
> From: Thomas Hellstrom <thellstrom@vmware.com>
>
> This helper is used to align user-space buffer object addresses to
> huge page boundaries, minimizing the chance of alignment mismatch
> between user-space addresses and physical addresses.

Mhm, I'm wondering if that is really such a good idea.

Wouldn't it be sufficient if userspace uses MAP_HUGETLB?

Regards,
Christian.

>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
> ---
>   drivers/gpu/drm/drm_file.c | 130 +++++++++++++++++++++++++++++++++++++
>   include/drm/drm_file.h     |   5 ++
>   2 files changed, 135 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index ea34bc991858..e5b4024cd397 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -31,6 +31,8 @@
>    * OTHER DEALINGS IN THE SOFTWARE.
>    */
>   
> +#include <uapi/asm/mman.h>
> +
>   #include <linux/dma-fence.h>
>   #include <linux/module.h>
>   #include <linux/pci.h>
> @@ -41,6 +43,7 @@
>   #include <drm/drm_drv.h>
>   #include <drm/drm_file.h>
>   #include <drm/drm_print.h>
> +#include <drm/drm_vma_manager.h>
>   
>   #include "drm_crtc_internal.h"
>   #include "drm_internal.h"
> @@ -754,3 +757,130 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
>   	spin_unlock_irqrestore(&dev->event_lock, irqflags);
>   }
>   EXPORT_SYMBOL(drm_send_event);
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +/*
> + * drm_addr_inflate() attempts to construct an aligned area by inflating
> + * the area size and skipping the unaligned start of the area.
> + * adapted from shmem_get_unmapped_area()
> + */
> +static unsigned long drm_addr_inflate(unsigned long addr,
> +				      unsigned long len,
> +				      unsigned long pgoff,
> +				      unsigned long flags,
> +				      unsigned long huge_size)
> +{
> +	unsigned long offset, inflated_len;
> +	unsigned long inflated_addr;
> +	unsigned long inflated_offset;
> +
> +	offset = (pgoff << PAGE_SHIFT) & (huge_size - 1);
> +	if (offset && offset + len < 2 * huge_size)
> +		return addr;
> +	if ((addr & (huge_size - 1)) == offset)
> +		return addr;
> +
> +	inflated_len = len + huge_size - PAGE_SIZE;
> +	if (inflated_len > TASK_SIZE)
> +		return addr;
> +	if (inflated_len < len)
> +		return addr;
> +
> +	inflated_addr = current->mm->get_unmapped_area(NULL, 0, inflated_len,
> +						       0, flags);
> +	if (IS_ERR_VALUE(inflated_addr))
> +		return addr;
> +	if (inflated_addr & ~PAGE_MASK)
> +		return addr;
> +
> +	inflated_offset = inflated_addr & (huge_size - 1);
> +	inflated_addr += offset - inflated_offset;
> +	if (inflated_offset > offset)
> +		inflated_addr += huge_size;
> +
> +	if (inflated_addr > TASK_SIZE - len)
> +		return addr;
> +
> +	return inflated_addr;
> +}
> +
> +/**
> + * drm_get_unmapped_area() - Get an unused user-space virtual memory area
> + * suitable for huge page table entries.
> + * @file: The struct file representing the address space being mmap()'d.
> + * @uaddr: Start address suggested by user-space.
> + * @len: Length of the area.
> + * @pgoff: The page offset into the address space.
> + * @flags: mmap flags
> + * @mgr: The address space manager used by the drm driver. This argument can
> + * probably be removed at some point when all drivers use the same
> + * address space manager.
> + *
> + * This function attempts to find an unused user-space virtual memory area
> + * that can accommodate the size we want to map, and that is properly
> + * aligned to facilitate huge page table entries matching actual
> + * huge pages or huge page aligned memory in buffer objects. Buffer objects
> + * are assumed to start at huge page boundary pfns (io memory) or be
> + * populated by huge pages aligned to the start of the buffer object
> + * (system- or coherent memory). Adapted from shmem_get_unmapped_area.
> + *
> + * Return: aligned user-space address.
> + */
> +unsigned long drm_get_unmapped_area(struct file *file,
> +				    unsigned long uaddr, unsigned long len,
> +				    unsigned long pgoff, unsigned long flags,
> +				    struct drm_vma_offset_manager *mgr)
> +{
> +	unsigned long addr;
> +	unsigned long inflated_addr;
> +	struct drm_vma_offset_node *node;
> +
> +	if (len > TASK_SIZE)
> +		return -ENOMEM;
> +
> +	/* Adjust mapping offset to be zero at bo start */
> +	drm_vma_offset_lock_lookup(mgr);
> +	node = drm_vma_offset_lookup_locked(mgr, pgoff, 1);
> +	if (node)
> +		pgoff -= node->vm_node.start;
> +	drm_vma_offset_unlock_lookup(mgr);
> +
> +	addr = current->mm->get_unmapped_area(file, uaddr, len, pgoff, flags);
> +	if (IS_ERR_VALUE(addr))
> +		return addr;
> +	if (addr & ~PAGE_MASK)
> +		return addr;
> +	if (addr > TASK_SIZE - len)
> +		return addr;
> +
> +	if (len < HPAGE_PMD_SIZE)
> +		return addr;
> +	if (flags & MAP_FIXED)
> +		return addr;
> +	/*
> +	 * Our priority is to support MAP_SHARED mapped hugely;
> +	 * and support MAP_PRIVATE mapped hugely too, until it is COWed.
> +	 * But if caller specified an address hint, respect that as before.
> +	 */
> +	if (uaddr)
> +		return addr;
> +
> +	inflated_addr = drm_addr_inflate(addr, len, pgoff, flags,
> +					 HPAGE_PMD_SIZE);
> +
> +	if (IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) &&
> +	    len >= HPAGE_PUD_SIZE)
> +		inflated_addr = drm_addr_inflate(inflated_addr, len, pgoff,
> +						 flags, HPAGE_PUD_SIZE);
> +	return inflated_addr;
> +}
> +#else /* CONFIG_TRANSPARENT_HUGEPAGE */
> +unsigned long drm_get_unmapped_area(struct file *file,
> +				    unsigned long uaddr, unsigned long len,
> +				    unsigned long pgoff, unsigned long flags,
> +				    struct drm_vma_offset_manager *mgr)
> +{
> +	return current->mm->get_unmapped_area(file, uaddr, len, pgoff, flags);
> +}
> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> +EXPORT_SYMBOL_GPL(drm_get_unmapped_area);
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index 67af60bb527a..4719cc80d547 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -386,5 +386,10 @@ void drm_event_cancel_free(struct drm_device *dev,
>   			   struct drm_pending_event *p);
>   void drm_send_event_locked(struct drm_device *dev, struct drm_pending_event *e);
>   void drm_send_event(struct drm_device *dev, struct drm_pending_event *e);
> +struct drm_vma_offset_manager;
> +unsigned long drm_get_unmapped_area(struct file *file,
> +				    unsigned long uaddr, unsigned long len,
> +				    unsigned long pgoff, unsigned long flags,
> +				    struct drm_vma_offset_manager *mgr);
>   
>   #endif /* _DRM_FILE_H_ */



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/8] drm/ttm: Introduce a huge page aligning TTM range manager.
  2019-12-03 13:22 ` [PATCH 7/8] drm/ttm: Introduce a huge page aligning TTM range manager Thomas Hellström (VMware)
@ 2019-12-04 11:13   ` Christian König
  2019-12-04 11:45     ` Thomas Hellström (VMware)
  0 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2019-12-04 11:13 UTC (permalink / raw)
  To: Thomas Hellström (VMware), linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse

Am 03.12.19 um 14:22 schrieb Thomas Hellström (VMware):
> From: Thomas Hellstrom <thellstrom@vmware.com>
>
> Using huge page-table entries require that the start of a buffer object
> is huge page size aligned. So introduce a ttm_bo_man_get_node_huge()
> function that attempts to accomplish this for allocations that are larger
> than the huge page size, and provide a new range-manager instance that
> uses that function.

I still don't think that this is a good idea.

The driver/userspace should just use a proper alignment if it wants to 
use huge pages.

Regards,
Christian.

>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo_manager.c | 92 ++++++++++++++++++++++++++++
>   include/drm/ttm/ttm_bo_driver.h      |  1 +
>   2 files changed, 93 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
> index 18d3debcc949..26aa1a2ae7f1 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
> @@ -89,6 +89,89 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
>   	return 0;
>   }
>   
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +static int ttm_bo_insert_aligned(struct drm_mm *mm, struct drm_mm_node *node,
> +				 unsigned long align_pages,
> +				 const struct ttm_place *place,
> +				 struct ttm_mem_reg *mem,
> +				 unsigned long lpfn,
> +				 enum drm_mm_insert_mode mode)
> +{
> +	if (align_pages >= mem->page_alignment &&
> +	    (!mem->page_alignment || align_pages % mem->page_alignment == 0)) {
> +		return drm_mm_insert_node_in_range(mm, node,
> +						   mem->num_pages,
> +						   align_pages, 0,
> +						   place->fpfn, lpfn, mode);
> +	}
> +
> +	return -ENOSPC;
> +}
> +
> +static int ttm_bo_man_get_node_huge(struct ttm_mem_type_manager *man,
> +				    struct ttm_buffer_object *bo,
> +				    const struct ttm_place *place,
> +				    struct ttm_mem_reg *mem)
> +{
> +	struct ttm_range_manager *rman = (struct ttm_range_manager *) man->priv;
> +	struct drm_mm *mm = &rman->mm;
> +	struct drm_mm_node *node;
> +	unsigned long align_pages;
> +	unsigned long lpfn;
> +	enum drm_mm_insert_mode mode = DRM_MM_INSERT_BEST;
> +	int ret;
> +
> +	node = kzalloc(sizeof(*node), GFP_KERNEL);
> +	if (!node)
> +		return -ENOMEM;
> +
> +	lpfn = place->lpfn;
> +	if (!lpfn)
> +		lpfn = man->size;
> +
> +	mode = DRM_MM_INSERT_BEST;
> +	if (place->flags & TTM_PL_FLAG_TOPDOWN)
> +		mode = DRM_MM_INSERT_HIGH;
> +
> +	spin_lock(&rman->lock);
> +	if (IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)) {
> +		align_pages = (HPAGE_PUD_SIZE >> PAGE_SHIFT);
> +		if (mem->num_pages >= align_pages) {
> +			ret = ttm_bo_insert_aligned(mm, node, align_pages,
> +						    place, mem, lpfn, mode);
> +			if (!ret)
> +				goto found_unlock;
> +		}
> +	}
> +
> +	align_pages = (HPAGE_PMD_SIZE >> PAGE_SHIFT);
> +	if (mem->num_pages >= align_pages) {
> +		ret = ttm_bo_insert_aligned(mm, node, align_pages, place, mem,
> +					    lpfn, mode);
> +		if (!ret)
> +			goto found_unlock;
> +	}
> +
> +	ret = drm_mm_insert_node_in_range(mm, node, mem->num_pages,
> +					  mem->page_alignment, 0,
> +					  place->fpfn, lpfn, mode);
> +found_unlock:
> +	spin_unlock(&rman->lock);
> +
> +	if (unlikely(ret)) {
> +		kfree(node);
> +	} else {
> +		mem->mm_node = node;
> +		mem->start = node->start;
> +	}
> +
> +	return 0;
> +}
> +#else
> +#define ttm_bo_man_get_node_huge ttm_bo_man_get_node
> +#endif
> +
> +
>   static void ttm_bo_man_put_node(struct ttm_mem_type_manager *man,
>   				struct ttm_mem_reg *mem)
>   {
> @@ -154,3 +237,12 @@ const struct ttm_mem_type_manager_func ttm_bo_manager_func = {
>   	.debug = ttm_bo_man_debug
>   };
>   EXPORT_SYMBOL(ttm_bo_manager_func);
> +
> +const struct ttm_mem_type_manager_func ttm_bo_manager_huge_func = {
> +	.init = ttm_bo_man_init,
> +	.takedown = ttm_bo_man_takedown,
> +	.get_node = ttm_bo_man_get_node_huge,
> +	.put_node = ttm_bo_man_put_node,
> +	.debug = ttm_bo_man_debug
> +};
> +EXPORT_SYMBOL(ttm_bo_manager_huge_func);
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index cac7a8a0825a..868bd0d4be6a 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -888,5 +888,6 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo);
>   pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t tmp);
>   
>   extern const struct ttm_mem_type_manager_func ttm_bo_manager_func;
> +extern const struct ttm_mem_type_manager_func ttm_bo_manager_huge_func;
>   
>   #endif




* Re: [PATCH 6/8] drm: Add a drm_get_unmapped_area() helper
  2019-12-04 11:11   ` Christian König
@ 2019-12-04 11:36     ` Thomas Hellström (VMware)
  2019-12-04 12:08       ` Christian König
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-04 11:36 UTC (permalink / raw)
  To: Christian König, linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse

On 12/4/19 12:11 PM, Christian König wrote:
> Am 03.12.19 um 14:22 schrieb Thomas Hellström (VMware):
>> From: Thomas Hellstrom <thellstrom@vmware.com>
>>
>> This helper is used to align user-space buffer object addresses to
>> huge page boundaries, minimizing the chance of alignment mismatch
>> between user-space addresses and physical addresses.
>
> Mhm, I'm wondering if that is really such a good idea.

Could you elaborate? What drawbacks do you see?
Note that this is the way other subsystems are doing it. Take a look at 
shmem_get_unmapped_area() for instance.

>
> Wouldn't it be sufficient if userspace uses MAP_HUGETLB?

MAP_HUGETLB is something different and appears to be tied to the kernel 
persistent huge page mechanism, whereas the TTM huge pages are tied to 
the THP functionality (although skipped by khugepaged).
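
A minimal userspace sketch of the difference (the device path, sizes and
the mmap offset are made-up placeholders, not part of this patchset):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>

int main(void)
{
	/* MAP_HUGETLB: explicitly allocates from the persistent hugetlbfs
	 * pool and only applies to anonymous / hugetlbfs-backed memory. */
	void *a = mmap(NULL, 2UL << 20, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	/* A TTM buffer object is mapped through the DRM device node with no
	 * special flag at all; huge page-table entries are inserted
	 * transparently by the driver's huge_fault() handler when the
	 * alignment works out. The path and offset are placeholders; the
	 * real offset comes from a driver-specific mmap-offset ioctl. */
	int fd = open("/dev/dri/renderD128", O_RDWR);
	off_t bo_offset = 0;
	void *b = mmap(NULL, 16UL << 20, PROT_READ | PROT_WRITE,
		       MAP_SHARED, fd, bo_offset);

	(void)a; (void)b;
	return 0;
}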

Thanks,

Thomas



>
> Regards,
> Christian.
>
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>> Cc: Ralph Campbell <rcampbell@nvidia.com>
>> Cc: "Jérôme Glisse" <jglisse@redhat.com>
>> Cc: "Christian König" <christian.koenig@amd.com>
>> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
>> ---
>>   drivers/gpu/drm/drm_file.c | 130 +++++++++++++++++++++++++++++++++++++
>>   include/drm/drm_file.h     |   5 ++
>>   2 files changed, 135 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>> index ea34bc991858..e5b4024cd397 100644
>> --- a/drivers/gpu/drm/drm_file.c
>> +++ b/drivers/gpu/drm/drm_file.c
>> @@ -31,6 +31,8 @@
>>    * OTHER DEALINGS IN THE SOFTWARE.
>>    */
>>   +#include <uapi/asm/mman.h>
>> +
>>   #include <linux/dma-fence.h>
>>   #include <linux/module.h>
>>   #include <linux/pci.h>
>> @@ -41,6 +43,7 @@
>>   #include <drm/drm_drv.h>
>>   #include <drm/drm_file.h>
>>   #include <drm/drm_print.h>
>> +#include <drm/drm_vma_manager.h>
>>     #include "drm_crtc_internal.h"
>>   #include "drm_internal.h"
>> @@ -754,3 +757,130 @@ void drm_send_event(struct drm_device *dev, 
>> struct drm_pending_event *e)
>>       spin_unlock_irqrestore(&dev->event_lock, irqflags);
>>   }
>>   EXPORT_SYMBOL(drm_send_event);
>> +
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +/*
>> + * drm_addr_inflate() attempts to construct an aligned area by 
>> inflating
>> + * the area size and skipping the unaligned start of the area.
>> + * adapted from shmem_get_unmapped_area()
>> + */
>> +static unsigned long drm_addr_inflate(unsigned long addr,
>> +                      unsigned long len,
>> +                      unsigned long pgoff,
>> +                      unsigned long flags,
>> +                      unsigned long huge_size)
>> +{
>> +    unsigned long offset, inflated_len;
>> +    unsigned long inflated_addr;
>> +    unsigned long inflated_offset;
>> +
>> +    offset = (pgoff << PAGE_SHIFT) & (huge_size - 1);
>> +    if (offset && offset + len < 2 * huge_size)
>> +        return addr;
>> +    if ((addr & (huge_size - 1)) == offset)
>> +        return addr;
>> +
>> +    inflated_len = len + huge_size - PAGE_SIZE;
>> +    if (inflated_len > TASK_SIZE)
>> +        return addr;
>> +    if (inflated_len < len)
>> +        return addr;
>> +
>> +    inflated_addr = current->mm->get_unmapped_area(NULL, 0, 
>> inflated_len,
>> +                               0, flags);
>> +    if (IS_ERR_VALUE(inflated_addr))
>> +        return addr;
>> +    if (inflated_addr & ~PAGE_MASK)
>> +        return addr;
>> +
>> +    inflated_offset = inflated_addr & (huge_size - 1);
>> +    inflated_addr += offset - inflated_offset;
>> +    if (inflated_offset > offset)
>> +        inflated_addr += huge_size;
>> +
>> +    if (inflated_addr > TASK_SIZE - len)
>> +        return addr;
>> +
>> +    return inflated_addr;
>> +}
>> +
>> +/**
>> + * drm_get_unmapped_area() - Get an unused user-space virtual memory 
>> area
>> + * suitable for huge page table entries.
>> + * @file: The struct file representing the address space being 
>> mmap()'d.
>> + * @uaddr: Start address suggested by user-space.
>> + * @len: Length of the area.
>> + * @pgoff: The page offset into the address space.
>> + * @flags: mmap flags
>> + * @mgr: The address space manager used by the drm driver. This 
>> argument can
>> + * probably be removed at some point when all drivers use the same
>> + * address space manager.
>> + *
>> + * This function attempts to find an unused user-space virtual 
>> memory area
>> + * that can accommodate the size we want to map, and that is properly
>> + * aligned to facilitate huge page table entries matching actual
>> + * huge pages or huge page aligned memory in buffer objects. Buffer 
>> objects
>> + * are assumed to start at huge page boundary pfns (io memory) or be
>> + * populated by huge pages aligned to the start of the buffer object
>> + * (system- or coherent memory). Adapted from shmem_get_unmapped_area.
>> + *
>> + * Return: aligned user-space address.
>> + */
>> +unsigned long drm_get_unmapped_area(struct file *file,
>> +                    unsigned long uaddr, unsigned long len,
>> +                    unsigned long pgoff, unsigned long flags,
>> +                    struct drm_vma_offset_manager *mgr)
>> +{
>> +    unsigned long addr;
>> +    unsigned long inflated_addr;
>> +    struct drm_vma_offset_node *node;
>> +
>> +    if (len > TASK_SIZE)
>> +        return -ENOMEM;
>> +
>> +    /* Adjust mapping offset to be zero at bo start */
>> +    drm_vma_offset_lock_lookup(mgr);
>> +    node = drm_vma_offset_lookup_locked(mgr, pgoff, 1);
>> +    if (node)
>> +        pgoff -= node->vm_node.start;
>> +    drm_vma_offset_unlock_lookup(mgr);
>> +
>> +    addr = current->mm->get_unmapped_area(file, uaddr, len, pgoff, 
>> flags);
>> +    if (IS_ERR_VALUE(addr))
>> +        return addr;
>> +    if (addr & ~PAGE_MASK)
>> +        return addr;
>> +    if (addr > TASK_SIZE - len)
>> +        return addr;
>> +
>> +    if (len < HPAGE_PMD_SIZE)
>> +        return addr;
>> +    if (flags & MAP_FIXED)
>> +        return addr;
>> +    /*
>> +     * Our priority is to support MAP_SHARED mapped hugely;
>> +     * and support MAP_PRIVATE mapped hugely too, until it is COWed.
>> +     * But if caller specified an address hint, respect that as before.
>> +     */
>> +    if (uaddr)
>> +        return addr;
>> +
>> +    inflated_addr = drm_addr_inflate(addr, len, pgoff, flags,
>> +                     HPAGE_PMD_SIZE);
>> +
>> +    if (IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) &&
>> +        len >= HPAGE_PUD_SIZE)
>> +        inflated_addr = drm_addr_inflate(inflated_addr, len, pgoff,
>> +                         flags, HPAGE_PUD_SIZE);
>> +    return inflated_addr;
>> +}
>> +#else /* CONFIG_TRANSPARENT_HUGEPAGE */
>> +unsigned long drm_get_unmapped_area(struct file *file,
>> +                    unsigned long uaddr, unsigned long len,
>> +                    unsigned long pgoff, unsigned long flags,
>> +                    struct drm_vma_offset_manager *mgr)
>> +{
>> +    return current->mm->get_unmapped_area(file, uaddr, len, pgoff, 
>> flags);
>> +}
>> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>> +EXPORT_SYMBOL_GPL(drm_get_unmapped_area);
>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>> index 67af60bb527a..4719cc80d547 100644
>> --- a/include/drm/drm_file.h
>> +++ b/include/drm/drm_file.h
>> @@ -386,5 +386,10 @@ void drm_event_cancel_free(struct drm_device *dev,
>>                  struct drm_pending_event *p);
>>   void drm_send_event_locked(struct drm_device *dev, struct 
>> drm_pending_event *e);
>>   void drm_send_event(struct drm_device *dev, struct 
>> drm_pending_event *e);
>> +struct drm_vma_offset_manager;
>> +unsigned long drm_get_unmapped_area(struct file *file,
>> +                    unsigned long uaddr, unsigned long len,
>> +                    unsigned long pgoff, unsigned long flags,
>> +                    struct drm_vma_offset_manager *mgr);
>>     #endif /* _DRM_FILE_H_ */





* Re: [PATCH 7/8] drm/ttm: Introduce a huge page aligning TTM range manager.
  2019-12-04 11:13   ` Christian König
@ 2019-12-04 11:45     ` Thomas Hellström (VMware)
  2019-12-04 12:16       ` Christian König
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-04 11:45 UTC (permalink / raw)
  To: Christian König, linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse

On 12/4/19 12:13 PM, Christian König wrote:
> Am 03.12.19 um 14:22 schrieb Thomas Hellström (VMware):
>> From: Thomas Hellstrom <thellstrom@vmware.com>
>>
>> Using huge page-table entries require that the start of a buffer object
>> is huge page size aligned. So introduce a ttm_bo_man_get_node_huge()
>> function that attempts to accomplish this for allocations that are 
>> larger
>> than the huge page size, and provide a new range-manager instance that
>> uses that function.
>
> I still don't think that this is a good idea.

Again, can you elaborate with some specific concerns?

>
> The driver/userspace should just use a proper alignment if it wants to 
> use huge pages.

There are drawbacks with this approach. The TTM alignment is a hard 
constraint. Assume that you want to fit a 1GB buffer object into limited 
VRAM space, and _if possible_ use PUD size huge pages. Let's say there 
is 1GB available, but not 1GB aligned. The proper alignment approach 
would fail and possibly start to evict stuff from VRAM just to be able 
to accomodate the PUD alignment. That's bad. The approach I suggest 
would instead fall back to PMD alignment and use 2MB page table entries 
if possible, and as a last resort use 4K page table entries.

Or do I misunderstand how you envision enforcing the alignment?

Thanks,

Thomas






>
> Regards,
> Christian.
>
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>> Cc: Ralph Campbell <rcampbell@nvidia.com>
>> Cc: "Jérôme Glisse" <jglisse@redhat.com>
>> Cc: "Christian König" <christian.koenig@amd.com>
>> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo_manager.c | 92 ++++++++++++++++++++++++++++
>>   include/drm/ttm/ttm_bo_driver.h      |  1 +
>>   2 files changed, 93 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c 
>> b/drivers/gpu/drm/ttm/ttm_bo_manager.c
>> index 18d3debcc949..26aa1a2ae7f1 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
>> @@ -89,6 +89,89 @@ static int ttm_bo_man_get_node(struct 
>> ttm_mem_type_manager *man,
>>       return 0;
>>   }
>>   +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +static int ttm_bo_insert_aligned(struct drm_mm *mm, struct 
>> drm_mm_node *node,
>> +                 unsigned long align_pages,
>> +                 const struct ttm_place *place,
>> +                 struct ttm_mem_reg *mem,
>> +                 unsigned long lpfn,
>> +                 enum drm_mm_insert_mode mode)
>> +{
>> +    if (align_pages >= mem->page_alignment &&
>> +        (!mem->page_alignment || align_pages % mem->page_alignment 
>> == 0)) {
>> +        return drm_mm_insert_node_in_range(mm, node,
>> +                           mem->num_pages,
>> +                           align_pages, 0,
>> +                           place->fpfn, lpfn, mode);
>> +    }
>> +
>> +    return -ENOSPC;
>> +}
>> +
>> +static int ttm_bo_man_get_node_huge(struct ttm_mem_type_manager *man,
>> +                    struct ttm_buffer_object *bo,
>> +                    const struct ttm_place *place,
>> +                    struct ttm_mem_reg *mem)
>> +{
>> +    struct ttm_range_manager *rman = (struct ttm_range_manager *) 
>> man->priv;
>> +    struct drm_mm *mm = &rman->mm;
>> +    struct drm_mm_node *node;
>> +    unsigned long align_pages;
>> +    unsigned long lpfn;
>> +    enum drm_mm_insert_mode mode = DRM_MM_INSERT_BEST;
>> +    int ret;
>> +
>> +    node = kzalloc(sizeof(*node), GFP_KERNEL);
>> +    if (!node)
>> +        return -ENOMEM;
>> +
>> +    lpfn = place->lpfn;
>> +    if (!lpfn)
>> +        lpfn = man->size;
>> +
>> +    mode = DRM_MM_INSERT_BEST;
>> +    if (place->flags & TTM_PL_FLAG_TOPDOWN)
>> +        mode = DRM_MM_INSERT_HIGH;
>> +
>> +    spin_lock(&rman->lock);
>> +    if (IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)) {
>> +        align_pages = (HPAGE_PUD_SIZE >> PAGE_SHIFT);
>> +        if (mem->num_pages >= align_pages) {
>> +            ret = ttm_bo_insert_aligned(mm, node, align_pages,
>> +                            place, mem, lpfn, mode);
>> +            if (!ret)
>> +                goto found_unlock;
>> +        }
>> +    }
>> +
>> +    align_pages = (HPAGE_PMD_SIZE >> PAGE_SHIFT);
>> +    if (mem->num_pages >= align_pages) {
>> +        ret = ttm_bo_insert_aligned(mm, node, align_pages, place, mem,
>> +                        lpfn, mode);
>> +        if (!ret)
>> +            goto found_unlock;
>> +    }
>> +
>> +    ret = drm_mm_insert_node_in_range(mm, node, mem->num_pages,
>> +                      mem->page_alignment, 0,
>> +                      place->fpfn, lpfn, mode);
>> +found_unlock:
>> +    spin_unlock(&rman->lock);
>> +
>> +    if (unlikely(ret)) {
>> +        kfree(node);
>> +    } else {
>> +        mem->mm_node = node;
>> +        mem->start = node->start;
>> +    }
>> +
>> +    return 0;
>> +}
>> +#else
>> +#define ttm_bo_man_get_node_huge ttm_bo_man_get_node
>> +#endif
>> +
>> +
>>   static void ttm_bo_man_put_node(struct ttm_mem_type_manager *man,
>>                   struct ttm_mem_reg *mem)
>>   {
>> @@ -154,3 +237,12 @@ const struct ttm_mem_type_manager_func 
>> ttm_bo_manager_func = {
>>       .debug = ttm_bo_man_debug
>>   };
>>   EXPORT_SYMBOL(ttm_bo_manager_func);
>> +
>> +const struct ttm_mem_type_manager_func ttm_bo_manager_huge_func = {
>> +    .init = ttm_bo_man_init,
>> +    .takedown = ttm_bo_man_takedown,
>> +    .get_node = ttm_bo_man_get_node_huge,
>> +    .put_node = ttm_bo_man_put_node,
>> +    .debug = ttm_bo_man_debug
>> +};
>> +EXPORT_SYMBOL(ttm_bo_manager_huge_func);
>> diff --git a/include/drm/ttm/ttm_bo_driver.h 
>> b/include/drm/ttm/ttm_bo_driver.h
>> index cac7a8a0825a..868bd0d4be6a 100644
>> --- a/include/drm/ttm/ttm_bo_driver.h
>> +++ b/include/drm/ttm/ttm_bo_driver.h
>> @@ -888,5 +888,6 @@ int ttm_bo_pipeline_gutting(struct 
>> ttm_buffer_object *bo);
>>   pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t tmp);
>>     extern const struct ttm_mem_type_manager_func ttm_bo_manager_func;
>> +extern const struct ttm_mem_type_manager_func ttm_bo_manager_huge_func;
>>     #endif





* Re: [PATCH 6/8] drm: Add a drm_get_unmapped_area() helper
  2019-12-04 11:36     ` Thomas Hellström (VMware)
@ 2019-12-04 12:08       ` Christian König
  2019-12-04 12:32         ` Thomas Hellström (VMware)
  0 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2019-12-04 12:08 UTC (permalink / raw)
  To: Thomas Hellström (VMware), linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse

Am 04.12.19 um 12:36 schrieb Thomas Hellström (VMware):
> On 12/4/19 12:11 PM, Christian König wrote:
>> Am 03.12.19 um 14:22 schrieb Thomas Hellström (VMware):
>>> From: Thomas Hellstrom <thellstrom@vmware.com>
>>>
>>> This helper is used to align user-space buffer object addresses to
>>> huge page boundaries, minimizing the chance of alignment mismatch
>>> between user-space addresses and physical addresses.
>>
>> Mhm, I'm wondering if that is really such a good idea.
>
> Could you elaborate? What drawbacks do you see?

The main problem for me seems to be that I don't fully understand what the 
get_unmapped_area callback is doing.

For example why do we need to use drm_vma_offset_lookup_locked() to 
adjust the pgoff?

The mapped offset should be completely irrelevant for finding some piece 
of userspace address space, or am I totally off here?

> Note that this is the way other subsystems are doing it. Take a look 
> at shmem_get_unmapped_area() for instance.
>
>>
>> Wouldn't it be sufficient if userspace uses MAP_HUGETLB?
>
> MAP_HUGETLB is something different and appears to be tied to the 
> kernel persistent huge page mechanism, whereas the TTM huge pages is 
> tided to the THP functionality (although skipped by khugepaged).

Ok, that makes sense. Overall we want to be transparent here.

Regards,
Christian.

>
> Thanks,
>
> Thomas
>
>
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Michal Hocko <mhocko@suse.com>
>>> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>>> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>> Cc: Ralph Campbell <rcampbell@nvidia.com>
>>> Cc: "Jérôme Glisse" <jglisse@redhat.com>
>>> Cc: "Christian König" <christian.koenig@amd.com>
>>> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
>>> ---
>>>   drivers/gpu/drm/drm_file.c | 130 
>>> +++++++++++++++++++++++++++++++++++++
>>>   include/drm/drm_file.h     |   5 ++
>>>   2 files changed, 135 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>> index ea34bc991858..e5b4024cd397 100644
>>> --- a/drivers/gpu/drm/drm_file.c
>>> +++ b/drivers/gpu/drm/drm_file.c
>>> @@ -31,6 +31,8 @@
>>>    * OTHER DEALINGS IN THE SOFTWARE.
>>>    */
>>>   +#include <uapi/asm/mman.h>
>>> +
>>>   #include <linux/dma-fence.h>
>>>   #include <linux/module.h>
>>>   #include <linux/pci.h>
>>> @@ -41,6 +43,7 @@
>>>   #include <drm/drm_drv.h>
>>>   #include <drm/drm_file.h>
>>>   #include <drm/drm_print.h>
>>> +#include <drm/drm_vma_manager.h>
>>>     #include "drm_crtc_internal.h"
>>>   #include "drm_internal.h"
>>> @@ -754,3 +757,130 @@ void drm_send_event(struct drm_device *dev, 
>>> struct drm_pending_event *e)
>>>       spin_unlock_irqrestore(&dev->event_lock, irqflags);
>>>   }
>>>   EXPORT_SYMBOL(drm_send_event);
>>> +
>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> +/*
>>> + * drm_addr_inflate() attempts to construct an aligned area by 
>>> inflating
>>> + * the area size and skipping the unaligned start of the area.
>>> + * adapted from shmem_get_unmapped_area()
>>> + */
>>> +static unsigned long drm_addr_inflate(unsigned long addr,
>>> +                      unsigned long len,
>>> +                      unsigned long pgoff,
>>> +                      unsigned long flags,
>>> +                      unsigned long huge_size)
>>> +{
>>> +    unsigned long offset, inflated_len;
>>> +    unsigned long inflated_addr;
>>> +    unsigned long inflated_offset;
>>> +
>>> +    offset = (pgoff << PAGE_SHIFT) & (huge_size - 1);
>>> +    if (offset && offset + len < 2 * huge_size)
>>> +        return addr;
>>> +    if ((addr & (huge_size - 1)) == offset)
>>> +        return addr;
>>> +
>>> +    inflated_len = len + huge_size - PAGE_SIZE;
>>> +    if (inflated_len > TASK_SIZE)
>>> +        return addr;
>>> +    if (inflated_len < len)
>>> +        return addr;
>>> +
>>> +    inflated_addr = current->mm->get_unmapped_area(NULL, 0, 
>>> inflated_len,
>>> +                               0, flags);
>>> +    if (IS_ERR_VALUE(inflated_addr))
>>> +        return addr;
>>> +    if (inflated_addr & ~PAGE_MASK)
>>> +        return addr;
>>> +
>>> +    inflated_offset = inflated_addr & (huge_size - 1);
>>> +    inflated_addr += offset - inflated_offset;
>>> +    if (inflated_offset > offset)
>>> +        inflated_addr += huge_size;
>>> +
>>> +    if (inflated_addr > TASK_SIZE - len)
>>> +        return addr;
>>> +
>>> +    return inflated_addr;
>>> +}
>>> +
>>> +/**
>>> + * drm_get_unmapped_area() - Get an unused user-space virtual 
>>> memory area
>>> + * suitable for huge page table entries.
>>> + * @file: The struct file representing the address space being 
>>> mmap()'d.
>>> + * @uaddr: Start address suggested by user-space.
>>> + * @len: Length of the area.
>>> + * @pgoff: The page offset into the address space.
>>> + * @flags: mmap flags
>>> + * @mgr: The address space manager used by the drm driver. This 
>>> argument can
>>> + * probably be removed at some point when all drivers use the same
>>> + * address space manager.
>>> + *
>>> + * This function attempts to find an unused user-space virtual 
>>> memory area
>>> + * that can accommodate the size we want to map, and that is properly
>>> + * aligned to facilitate huge page table entries matching actual
>>> + * huge pages or huge page aligned memory in buffer objects. Buffer 
>>> objects
>>> + * are assumed to start at huge page boundary pfns (io memory) or be
>>> + * populated by huge pages aligned to the start of the buffer object
>>> + * (system- or coherent memory). Adapted from shmem_get_unmapped_area.
>>> + *
>>> + * Return: aligned user-space address.
>>> + */
>>> +unsigned long drm_get_unmapped_area(struct file *file,
>>> +                    unsigned long uaddr, unsigned long len,
>>> +                    unsigned long pgoff, unsigned long flags,
>>> +                    struct drm_vma_offset_manager *mgr)
>>> +{
>>> +    unsigned long addr;
>>> +    unsigned long inflated_addr;
>>> +    struct drm_vma_offset_node *node;
>>> +
>>> +    if (len > TASK_SIZE)
>>> +        return -ENOMEM;
>>> +
>>> +    /* Adjust mapping offset to be zero at bo start */
>>> +    drm_vma_offset_lock_lookup(mgr);
>>> +    node = drm_vma_offset_lookup_locked(mgr, pgoff, 1);
>>> +    if (node)
>>> +        pgoff -= node->vm_node.start;
>>> +    drm_vma_offset_unlock_lookup(mgr);
>>> +
>>> +    addr = current->mm->get_unmapped_area(file, uaddr, len, pgoff, 
>>> flags);
>>> +    if (IS_ERR_VALUE(addr))
>>> +        return addr;
>>> +    if (addr & ~PAGE_MASK)
>>> +        return addr;
>>> +    if (addr > TASK_SIZE - len)
>>> +        return addr;
>>> +
>>> +    if (len < HPAGE_PMD_SIZE)
>>> +        return addr;
>>> +    if (flags & MAP_FIXED)
>>> +        return addr;
>>> +    /*
>>> +     * Our priority is to support MAP_SHARED mapped hugely;
>>> +     * and support MAP_PRIVATE mapped hugely too, until it is COWed.
>>> +     * But if caller specified an address hint, respect that as 
>>> before.
>>> +     */
>>> +    if (uaddr)
>>> +        return addr;
>>> +
>>> +    inflated_addr = drm_addr_inflate(addr, len, pgoff, flags,
>>> +                     HPAGE_PMD_SIZE);
>>> +
>>> +    if (IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) &&
>>> +        len >= HPAGE_PUD_SIZE)
>>> +        inflated_addr = drm_addr_inflate(inflated_addr, len, pgoff,
>>> +                         flags, HPAGE_PUD_SIZE);
>>> +    return inflated_addr;
>>> +}
>>> +#else /* CONFIG_TRANSPARENT_HUGEPAGE */
>>> +unsigned long drm_get_unmapped_area(struct file *file,
>>> +                    unsigned long uaddr, unsigned long len,
>>> +                    unsigned long pgoff, unsigned long flags,
>>> +                    struct drm_vma_offset_manager *mgr)
>>> +{
>>> +    return current->mm->get_unmapped_area(file, uaddr, len, pgoff, 
>>> flags);
>>> +}
>>> +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>> +EXPORT_SYMBOL_GPL(drm_get_unmapped_area);
>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>> index 67af60bb527a..4719cc80d547 100644
>>> --- a/include/drm/drm_file.h
>>> +++ b/include/drm/drm_file.h
>>> @@ -386,5 +386,10 @@ void drm_event_cancel_free(struct drm_device *dev,
>>>                  struct drm_pending_event *p);
>>>   void drm_send_event_locked(struct drm_device *dev, struct 
>>> drm_pending_event *e);
>>>   void drm_send_event(struct drm_device *dev, struct 
>>> drm_pending_event *e);
>>> +struct drm_vma_offset_manager;
>>> +unsigned long drm_get_unmapped_area(struct file *file,
>>> +                    unsigned long uaddr, unsigned long len,
>>> +                    unsigned long pgoff, unsigned long flags,
>>> +                    struct drm_vma_offset_manager *mgr);
>>>     #endif /* _DRM_FILE_H_ */
>
>




* Re: [PATCH 7/8] drm/ttm: Introduce a huge page aligning TTM range manager.
  2019-12-04 11:45     ` Thomas Hellström (VMware)
@ 2019-12-04 12:16       ` Christian König
  2019-12-04 13:18         ` Thomas Hellström (VMware)
  0 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2019-12-04 12:16 UTC (permalink / raw)
  To: Thomas Hellström (VMware), linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse

Am 04.12.19 um 12:45 schrieb Thomas Hellström (VMware):
> On 12/4/19 12:13 PM, Christian König wrote:
>> Am 03.12.19 um 14:22 schrieb Thomas Hellström (VMware):
>>> From: Thomas Hellstrom <thellstrom@vmware.com>
>>>
>>> Using huge page-table entries require that the start of a buffer object
>>> is huge page size aligned. So introduce a ttm_bo_man_get_node_huge()
>>> function that attempts to accomplish this for allocations that are 
>>> larger
>>> than the huge page size, and provide a new range-manager instance that
>>> uses that function.
>>
>> I still don't think that this is a good idea.
>
> Again, can you elaborate with some specific concerns?

You seem to be seeing PUD as something optional.

>>
>> The driver/userspace should just use a proper alignment if it wants 
>> to use huge pages.
>
> There are drawbacks with this approach. The TTM alignment is a hard 
> constraint. Assume that you want to fit a 1GB buffer object into 
> limited VRAM space, and _if possible_ use PUD size huge pages. Let's 
> say there is 1GB available, but not 1GB aligned. The proper alignment 
> approach would fail and possibly start to evict stuff from VRAM just 
> to be able to accomodate the PUD alignment. That's bad. The approach I 
> suggest would instead fall back to PMD alignment and use 2MB page 
> table entries if possible, and as a last resort use 4K page table 
> entries.

And exactly that sounds like a bad idea to me.

Using 1GB alignment is indeed unrealistic in most cases, but for 2MB 
alignment we should really start to evict BOs.

Otherwise the address space can become fragmented and we won't be able to 
de-fragment it in any way.

In addition to that, 2MB pages are becoming mandatory for some hardware 
features. E.g. scanning out of system memory won't be possible with 4k 
pages in the future.

Regards,
Christian.

> Or do I misunderstand how you envision enforcing the alignment?
>
> Thanks,
>
> Thomas
>
>
>
>
>
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Michal Hocko <mhocko@suse.com>
>>> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>>> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>>> Cc: Ralph Campbell <rcampbell@nvidia.com>
>>> Cc: "Jérôme Glisse" <jglisse@redhat.com>
>>> Cc: "Christian König" <christian.koenig@amd.com>
>>> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
>>> ---
>>>   drivers/gpu/drm/ttm/ttm_bo_manager.c | 92 
>>> ++++++++++++++++++++++++++++
>>>   include/drm/ttm/ttm_bo_driver.h      |  1 +
>>>   2 files changed, 93 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c 
>>> b/drivers/gpu/drm/ttm/ttm_bo_manager.c
>>> index 18d3debcc949..26aa1a2ae7f1 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
>>> @@ -89,6 +89,89 @@ static int ttm_bo_man_get_node(struct 
>>> ttm_mem_type_manager *man,
>>>       return 0;
>>>   }
>>>   +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> +static int ttm_bo_insert_aligned(struct drm_mm *mm, struct 
>>> drm_mm_node *node,
>>> +                 unsigned long align_pages,
>>> +                 const struct ttm_place *place,
>>> +                 struct ttm_mem_reg *mem,
>>> +                 unsigned long lpfn,
>>> +                 enum drm_mm_insert_mode mode)
>>> +{
>>> +    if (align_pages >= mem->page_alignment &&
>>> +        (!mem->page_alignment || align_pages % mem->page_alignment 
>>> == 0)) {
>>> +        return drm_mm_insert_node_in_range(mm, node,
>>> +                           mem->num_pages,
>>> +                           align_pages, 0,
>>> +                           place->fpfn, lpfn, mode);
>>> +    }
>>> +
>>> +    return -ENOSPC;
>>> +}
>>> +
>>> +static int ttm_bo_man_get_node_huge(struct ttm_mem_type_manager *man,
>>> +                    struct ttm_buffer_object *bo,
>>> +                    const struct ttm_place *place,
>>> +                    struct ttm_mem_reg *mem)
>>> +{
>>> +    struct ttm_range_manager *rman = (struct ttm_range_manager *) 
>>> man->priv;
>>> +    struct drm_mm *mm = &rman->mm;
>>> +    struct drm_mm_node *node;
>>> +    unsigned long align_pages;
>>> +    unsigned long lpfn;
>>> +    enum drm_mm_insert_mode mode = DRM_MM_INSERT_BEST;
>>> +    int ret;
>>> +
>>> +    node = kzalloc(sizeof(*node), GFP_KERNEL);
>>> +    if (!node)
>>> +        return -ENOMEM;
>>> +
>>> +    lpfn = place->lpfn;
>>> +    if (!lpfn)
>>> +        lpfn = man->size;
>>> +
>>> +    mode = DRM_MM_INSERT_BEST;
>>> +    if (place->flags & TTM_PL_FLAG_TOPDOWN)
>>> +        mode = DRM_MM_INSERT_HIGH;
>>> +
>>> +    spin_lock(&rman->lock);
>>> +    if (IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)) {
>>> +        align_pages = (HPAGE_PUD_SIZE >> PAGE_SHIFT);
>>> +        if (mem->num_pages >= align_pages) {
>>> +            ret = ttm_bo_insert_aligned(mm, node, align_pages,
>>> +                            place, mem, lpfn, mode);
>>> +            if (!ret)
>>> +                goto found_unlock;
>>> +        }
>>> +    }
>>> +
>>> +    align_pages = (HPAGE_PMD_SIZE >> PAGE_SHIFT);
>>> +    if (mem->num_pages >= align_pages) {
>>> +        ret = ttm_bo_insert_aligned(mm, node, align_pages, place, mem,
>>> +                        lpfn, mode);
>>> +        if (!ret)
>>> +            goto found_unlock;
>>> +    }
>>> +
>>> +    ret = drm_mm_insert_node_in_range(mm, node, mem->num_pages,
>>> +                      mem->page_alignment, 0,
>>> +                      place->fpfn, lpfn, mode);
>>> +found_unlock:
>>> +    spin_unlock(&rman->lock);
>>> +
>>> +    if (unlikely(ret)) {
>>> +        kfree(node);
>>> +    } else {
>>> +        mem->mm_node = node;
>>> +        mem->start = node->start;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +#else
>>> +#define ttm_bo_man_get_node_huge ttm_bo_man_get_node
>>> +#endif
>>> +
>>> +
>>>   static void ttm_bo_man_put_node(struct ttm_mem_type_manager *man,
>>>                   struct ttm_mem_reg *mem)
>>>   {
>>> @@ -154,3 +237,12 @@ const struct ttm_mem_type_manager_func 
>>> ttm_bo_manager_func = {
>>>       .debug = ttm_bo_man_debug
>>>   };
>>>   EXPORT_SYMBOL(ttm_bo_manager_func);
>>> +
>>> +const struct ttm_mem_type_manager_func ttm_bo_manager_huge_func = {
>>> +    .init = ttm_bo_man_init,
>>> +    .takedown = ttm_bo_man_takedown,
>>> +    .get_node = ttm_bo_man_get_node_huge,
>>> +    .put_node = ttm_bo_man_put_node,
>>> +    .debug = ttm_bo_man_debug
>>> +};
>>> +EXPORT_SYMBOL(ttm_bo_manager_huge_func);
>>> diff --git a/include/drm/ttm/ttm_bo_driver.h 
>>> b/include/drm/ttm/ttm_bo_driver.h
>>> index cac7a8a0825a..868bd0d4be6a 100644
>>> --- a/include/drm/ttm/ttm_bo_driver.h
>>> +++ b/include/drm/ttm/ttm_bo_driver.h
>>> @@ -888,5 +888,6 @@ int ttm_bo_pipeline_gutting(struct 
>>> ttm_buffer_object *bo);
>>>   pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t tmp);
>>>     extern const struct ttm_mem_type_manager_func ttm_bo_manager_func;
>>> +extern const struct ttm_mem_type_manager_func 
>>> ttm_bo_manager_huge_func;
>>>     #endif
>
>




* Re: [PATCH 6/8] drm: Add a drm_get_unmapped_area() helper
  2019-12-04 12:08       ` Christian König
@ 2019-12-04 12:32         ` Thomas Hellström (VMware)
  2019-12-04 14:40           ` Christian König
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-04 12:32 UTC (permalink / raw)
  To: Christian König, linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse

On 12/4/19 1:08 PM, Christian König wrote:
> Am 04.12.19 um 12:36 schrieb Thomas Hellström (VMware):
>> On 12/4/19 12:11 PM, Christian König wrote:
>>> Am 03.12.19 um 14:22 schrieb Thomas Hellström (VMware):
>>>> From: Thomas Hellstrom <thellstrom@vmware.com>
>>>>
>>>> This helper is used to align user-space buffer object addresses to
>>>> huge page boundaries, minimizing the chance of alignment mismatch
>>>> between user-space addresses and physical addresses.
>>>
>>> Mhm, I'm wondering if that is really such a good idea.
>>
>> Could you elaborate? What drawbacks do you see?
>
> Main problem for me seems to be that I don't fully understand what the 
> get_unmapped_area callback is doing.

It makes sure that, if there is a chance that we could use huge 
page-table entries, virtual address huge page boundaries are perfectly 
aligned to physical address huge page boundaries, which is, if not a CPU 
hardware requirement, at least a kernel requirement currently.
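
A minimal sketch of that requirement at the PMD level (the helper below is 
illustrative only, not part of this patchset): a PMD-sized entry can only 
be inserted when the faulting virtual address and the backing pfn have the 
same offset within a huge page, i.e. when a huge page boundary in the 
virtual address space lines up with one in the physical backing.

static bool huge_offsets_match(unsigned long vaddr, unsigned long pfn)
{
	const unsigned long mask = (HPAGE_PMD_SIZE >> PAGE_SHIFT) - 1;

	/* Compare the page-granular offsets within a PMD-sized region. */
	return (((vaddr >> PAGE_SHIFT) ^ pfn) & mask) == 0;
}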


>
> For example why do we need to use drm_vma_offset_lookup_locked() to 
> adjust the pgoff?
>
> The mapped offset should be completely irrelevant for finding some 
> piece of userspace address space or am I totally off here?


Because the unmodified pgoff assumes that physical address boundaries 
are perfectly aligned with file offset boundaries, which is typical for 
all other subsystems.

That's not true for TTM, however, where a buffer object's start physical 
address may be huge page aligned, but the file offset is always page 
aligned. We could of course change that to also align file offsets to 
huge page size boundaries, but with the above adjustment, that's not 
needed. I opted for the adjustment.
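
As a made-up numeric example of the adjustment (the offsets below are 
illustrative only):

/*
 * A BO whose drm_vma offset node starts at file page offset 0x12345 is
 * mapped 0x200 pages into the BO:
 *
 *   pgoff as passed to get_unmapped_area():   0x12345 + 0x200
 *   node->vm_node.start:                      0x12345
 *   pgoff -= node->vm_node.start;          => 0x200
 *
 * After the adjustment, pgoff << PAGE_SHIFT is the offset from the start
 * of the BO, which is what is huge-page aligned in the physical backing,
 * so the address returned by get_unmapped_area() can be lined up with it.
 */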

Thanks,

Thomas





* Re: [PATCH 7/8] drm/ttm: Introduce a huge page aligning TTM range manager.
  2019-12-04 12:16       ` Christian König
@ 2019-12-04 13:18         ` Thomas Hellström (VMware)
  2019-12-04 14:02           ` Christian König
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-04 13:18 UTC (permalink / raw)
  To: Christian König, linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse

On 12/4/19 1:16 PM, Christian König wrote:
> Am 04.12.19 um 12:45 schrieb Thomas Hellström (VMware):
>> On 12/4/19 12:13 PM, Christian König wrote:
>>> Am 03.12.19 um 14:22 schrieb Thomas Hellström (VMware):
>>>> From: Thomas Hellstrom <thellstrom@vmware.com>
>>>>
>>>> Using huge page-table entries require that the start of a buffer 
>>>> object
>>>> is huge page size aligned. So introduce a ttm_bo_man_get_node_huge()
>>>> function that attempts to accomplish this for allocations that are 
>>>> larger
>>>> than the huge page size, and provide a new range-manager instance that
>>>> uses that function.
>>>
>>> I still don't think that this is a good idea.
>>
>> Again, can you elaborate with some specific concerns?
>
> You seems to be seeing PUD as something optional.
>
>>>
>>> The driver/userspace should just use a proper alignment if it wants 
>>> to use huge pages.
>>
>> There are drawbacks with this approach. The TTM alignment is a hard 
>> constraint. Assume that you want to fit a 1GB buffer object into 
>> limited VRAM space, and _if possible_ use PUD size huge pages. Let's 
>> say there is 1GB available, but not 1GB aligned. The proper alignment 
>> approach would fail and possibly start to evict stuff from VRAM just 
>> to be able to accomodate the PUD alignment. That's bad. The approach 
>> I suggest would instead fall back to PMD alignment and use 2MB page 
>> table entries if possible, and as a last resort use 4K page table 
>> entries.
>
> And exactly that sounds like a bad idea to me.
>
> Using 1GB alignment is indeed unrealistic in most cases, but for 2MB 
> alignment we should really start to evict BOs.
>
> Otherwise the address space can become fragmented and we won't be able 
> de-fragment it in any way.

Ah, I see. Yeah, that's the THP tradeoff between fragmentation and 
memory usage. From my point of view, it's not self-evident that either 
approach is the best one, but the nice thing with the suggested code is 
that you can view it as an optional helper. For example, to avoid 
fragmentation and have a high huge-page hit ratio for 2MB pages, you'd 
either inflate the buffer object size to be 2MB aligned, which would also 
affect system memory, or you'd set the TTM memory alignment to 2MB. 
If in addition you'd like "soft" (non-evicting) alignment also for 1GB 
pages, you'd also hook up the new range manager. I figure different 
drivers would want to use different strategies.

In any case, vmwgfx would, due to its very limited VRAM size, want to 
use the "soft" alignment provided by this patch, but if you don't see 
any other drivers wanting that, I could definitely move it to vmwgfx.
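
For reference, a minimal sketch of what the "soft" hookup could look like 
in a driver's init_mem_type() callback (the function name and the placement 
flags are illustrative; only ttm_bo_manager_huge_func comes from this 
series):

static int my_driver_init_mem_type(struct ttm_bo_device *bdev, uint32_t type,
				   struct ttm_mem_type_manager *man)
{
	switch (type) {
	case TTM_PL_VRAM:
		/* "Soft" huge-page alignment: try PUD, then PMD alignment,
		 * and fall back to the regular alignment instead of
		 * evicting other BOs. */
		man->func = &ttm_bo_manager_huge_func;
		man->flags = TTM_MEMTYPE_FLAG_FIXED | TTM_MEMTYPE_FLAG_MAPPABLE;
		man->available_caching = TTM_PL_FLAG_UNCACHED | TTM_PL_FLAG_WC;
		man->default_caching = TTM_PL_FLAG_WC;
		break;
	default:
		return -EINVAL;
	}
	return 0;
}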

/Thomas






* Re: [PATCH 7/8] drm/ttm: Introduce a huge page aligning TTM range manager.
  2019-12-04 13:18         ` Thomas Hellström (VMware)
@ 2019-12-04 14:02           ` Christian König
  0 siblings, 0 replies; 23+ messages in thread
From: Christian König @ 2019-12-04 14:02 UTC (permalink / raw)
  To: Thomas Hellström (VMware), linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse

Am 04.12.19 um 14:18 schrieb Thomas Hellström (VMware):
> On 12/4/19 1:16 PM, Christian König wrote:
>> Am 04.12.19 um 12:45 schrieb Thomas Hellström (VMware):
>>> On 12/4/19 12:13 PM, Christian König wrote:
>>>> Am 03.12.19 um 14:22 schrieb Thomas Hellström (VMware):
>>>>> From: Thomas Hellstrom <thellstrom@vmware.com>
>>>>>
>>>>> Using huge page-table entries require that the start of a buffer 
>>>>> object
>>>>> is huge page size aligned. So introduce a ttm_bo_man_get_node_huge()
>>>>> function that attempts to accomplish this for allocations that are 
>>>>> larger
>>>>> than the huge page size, and provide a new range-manager instance 
>>>>> that
>>>>> uses that function.
>>>>
>>>> I still don't think that this is a good idea.
>>>
>>> Again, can you elaborate with some specific concerns?
>>
>> You seems to be seeing PUD as something optional.
>>
>>>>
>>>> The driver/userspace should just use a proper alignment if it wants 
>>>> to use huge pages.
>>>
>>> There are drawbacks with this approach. The TTM alignment is a hard 
>>> constraint. Assume that you want to fit a 1GB buffer object into 
>>> limited VRAM space, and _if possible_ use PUD size huge pages. Let's 
>>> say there is 1GB available, but not 1GB aligned. The proper 
>>> alignment approach would fail and possibly start to evict stuff from 
>>> VRAM just to be able to accomodate the PUD alignment. That's bad. 
>>> The approach I suggest would instead fall back to PMD alignment and 
>>> use 2MB page table entries if possible, and as a last resort use 4K 
>>> page table entries.
>>
>> And exactly that sounds like a bad idea to me.
>>
>> Using 1GB alignment is indeed unrealistic in most cases, but for 2MB 
>> alignment we should really start to evict BOs.
>>
>> Otherwise the address space can become fragmented and we won't be 
>> able de-fragment it in any way.
>
> Ah, I see, Yeah that's the THP tradeoff between fragmentation and 
> memory-usage. From my point of view, it's not self-evident that either 
> approach is the best one, but the nice thing with the suggested code 
> is that you can view it as an optional helper. For example, to avoid 
> fragmentation and have a high huge-page hit ratio for 2MB pages, You'd 
> either inflate the buffer object size to be 2MB aligned, which would 
> affect also system memory, or you'd set the TTM memory alignment to 
> 2MB. If in addition you'd like "soft" (non-evicting) alignment also 
> for 1GB pages, you'd also hook up the new range manager. I figure 
> different drivers would want to use different strategies.
>
> In any case, vmwgfx would, due to its very limited VRAM size, want to 
> use the "soft" alignment provided by this patch, but if you don't see 
> any other drivers wanting that, I could definitely move it to vmwgfx.

Ok, let's do it this way then. Both amdgpu as well as nouveau have 
specialized allocators anyway, and I don't see the need for this in radeon.

Regards,
Christian.

>
> /Thomas
>
>
>




* Re: [PATCH 6/8] drm: Add a drm_get_unmapped_area() helper
  2019-12-04 12:32         ` Thomas Hellström (VMware)
@ 2019-12-04 14:40           ` Christian König
  2019-12-04 15:36             ` Thomas Hellström (VMware)
  0 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2019-12-04 14:40 UTC (permalink / raw)
  To: Thomas Hellström (VMware), linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse

Am 04.12.19 um 13:32 schrieb Thomas Hellström (VMware):
> On 12/4/19 1:08 PM, Christian König wrote:
>> Am 04.12.19 um 12:36 schrieb Thomas Hellström (VMware):
>>> On 12/4/19 12:11 PM, Christian König wrote:
>>>> Am 03.12.19 um 14:22 schrieb Thomas Hellström (VMware):
>>>>> From: Thomas Hellstrom <thellstrom@vmware.com>
>>>>>
>>>>> This helper is used to align user-space buffer object addresses to
>>>>> huge page boundaries, minimizing the chance of alignment mismatch
>>>>> between user-space addresses and physical addresses.
>>>>
>>>> Mhm, I'm wondering if that is really such a good idea.
>>>
>>> Could you elaborate? What drawbacks do you see?
>>
>> Main problem for me seems to be that I don't fully understand what 
>> the get_unmapped_area callback is doing.
>
> It makes sure that, if there is a chance that we could use huge 
> page-table entries, virtual address huge page boundaries are perfectly 
> aligned to physical address huge page boundaries, which is if not a 
> CPU hardware requirement, at least a kernel requirement currently.
>
>
>>
>> For example why do we need to use drm_vma_offset_lookup_locked() to 
>> adjust the pgoff?
>>
>> The mapped offset should be completely irrelevant for finding some 
>> piece of userspace address space or am I totally off here?
>
>
> Because the unmodified pgoff assumes that physical address boundaries 
> are perfectly aligned with file offset boundaries, which is typical 
> for all other subsystems.
>
> That's not true for TTM, however, where a buffer object start physical 
> address may be huge page aligned, but the file offset is always page 
> aligned. We could of course change that to align also file offsets to 
> huge page size boundaries, but with the above adjustment, that's not 
> needed. I opted for the adjustment.

I would opt for aligning the file offsets instead.

Now that you explained that the rest of the kernel enforces this, it 
actually makes sense.

Regards,
Christian.

>
> Thanks,
>
> Thomas
>
>




* Re: [PATCH 6/8] drm: Add a drm_get_unmapped_area() helper
  2019-12-04 14:40           ` Christian König
@ 2019-12-04 15:36             ` Thomas Hellström (VMware)
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Hellström (VMware) @ 2019-12-04 15:36 UTC (permalink / raw)
  To: Christian König, linux-mm, linux-kernel, dri-devel
  Cc: pv-drivers, linux-graphics-maintainer, Thomas Hellstrom,
	Andrew Morton, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse

On 12/4/19 3:40 PM, Christian König wrote:
> Am 04.12.19 um 13:32 schrieb Thomas Hellström (VMware):
>> On 12/4/19 1:08 PM, Christian König wrote:
>>> Am 04.12.19 um 12:36 schrieb Thomas Hellström (VMware):
>>>> On 12/4/19 12:11 PM, Christian König wrote:
>>>>> Am 03.12.19 um 14:22 schrieb Thomas Hellström (VMware):
>>>>>> From: Thomas Hellstrom <thellstrom@vmware.com>
>>>>>>
>>>>>> This helper is used to align user-space buffer object addresses to
>>>>>> huge page boundaries, minimizing the chance of alignment mismatch
>>>>>> between user-space addresses and physical addresses.
>>>>>
>>>>> Mhm, I'm wondering if that is really such a good idea.
>>>>
>>>> Could you elaborate? What drawbacks do you see?
>>>
>>> Main problem for me seems to be that I don't fully understand what 
>>> the get_unmapped_area callback is doing.
>>
>> It makes sure that, if there is a chance that we could use huge 
>> page-table entries, virtual address huge page boundaries are 
>> perfectly aligned to physical address huge page boundaries, which is 
>> if not a CPU hardware requirement, at least a kernel requirement 
>> currently.
>>
>>
>>>
>>> For example why do we need to use drm_vma_offset_lookup_locked() to 
>>> adjust the pgoff?
>>>
>>> The mapped offset should be completely irrelevant for finding some 
>>> piece of userspace address space or am I totally off here?
>>
>>
>> Because the unmodified pgoff assumes that physical address boundaries 
>> are perfectly aligned with file offset boundaries, which is typical 
>> for all other subsystems.
>>
>> That's not true for TTM, however, where a buffer object start 
>> physical address may be huge page aligned, but the file offset is 
>> always page aligned. We could of course change that to align also 
>> file offsets to huge page size boundaries, but with the above 
>> adjustment, that's not needed. I opted for the adjustment.
>
> I would opt for aligning the file offsets instead.

Yes, but that adds additional complexity and considerations which made me 
think that the lookup was by far the simplest choice:

- We would need to modify the vma manager to care about alignments.
- Fragmentation issues: do we want to align > 1G BOs?
- For which drivers do we want to do this, and how do we handle drivers 
that want to opt out in TTM mmap()?
- Non-TTM drivers: could they still reuse the same get_unmapped_area?

>
> Now that you explained it that the rest of the kernel enforces this 
> actually makes sense.

So is that an ack?


Thanks,

Thomas



>
> Regards,
> Christian.
>
>>
>> Thanks,
>>
>> Thomas
>>
>>




* Re: [PATCH 2/8] mm: Split huge pages on write-notify or COW
  2019-12-03 13:22 ` [PATCH 2/8] mm: Split huge pages on write-notify or COW Thomas Hellström (VMware)
@ 2020-03-01  4:04   ` Andrew Morton
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2020-03-01  4:04 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: linux-mm, linux-kernel, dri-devel, pv-drivers,
	linux-graphics-maintainer, Thomas Hellstrom, Michal Hocko,
	Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
	Christian König

On Tue,  3 Dec 2019 14:22:33 +0100 Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote:

> We currently only do COW and write-notify on the PTE level, so if the
> huge_fault() handler returns VM_FAULT_FALLBACK on wp faults,
> split the huge pages and page-table entries. Also do this for huge PUDs
> if there is no huge_fault() handler and the vma is not anonymous, similar
> to how it's done for PMDs.

There is no description here of *why* we are making this change?

> Note that fs/dax.c does the splitting in the huge_fault() handler, but as
> huge_fault() is implemented by modules we need to consider whether to
> export the splitting functions for use in the modules or whether to try
> to keep calls in the core. Opt for the latter. A follow-up patch can
> remove the dax.c split_huge_pmd() if needed.
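
For clarity, a minimal sketch of the PMD-level control flow the quoted log 
describes (a sketch only, not the code in the patch):

static vm_fault_t wp_huge_pmd_sketch(struct vm_fault *vmf)
{
	if (vmf->vma->vm_ops && vmf->vma->vm_ops->huge_fault) {
		vm_fault_t ret = vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PMD);

		if (!(ret & VM_FAULT_FALLBACK))
			return ret;
	}

	/* COW / write-notify is only implemented at the PTE level, so split
	 * the huge entry here in the core and let the fault be retried with
	 * PAGE_SIZE entries. */
	__split_huge_pmd(vmf->vma, vmf->pmd, vmf->address, false, NULL);
	return VM_FAULT_FALLBACK;
}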



* Re: [PATCH 1/8] mm: Introduce vma_is_special_huge
  2019-12-03 13:22 ` [PATCH 1/8] mm: Introduce vma_is_special_huge Thomas Hellström (VMware)
@ 2020-03-01  4:04   ` Andrew Morton
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2020-03-01  4:04 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: linux-mm, linux-kernel, dri-devel, pv-drivers,
	linux-graphics-maintainer, Thomas Hellstrom, Michal Hocko,
	Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
	Christian König

On Tue,  3 Dec 2019 14:22:32 +0100 Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote:

> From: Thomas Hellstrom <thellstrom@vmware.com>
> 
> For VM_PFNMAP and VM_MIXEDMAP vmas that want to support transhuge pages
> and -page table entries, introduce vma_is_special_huge() that takes the
> same codepaths as vma_is_dax().
> 
> The use of "special" follows the definition in memory.c, vm_normal_page():
> "Special" mappings do not wish to be associated with a "struct page"
> (either it doesn't exist, or it exists but they don't want to touch it)
> 
> For PAGE_SIZE pages, "special" is determined per page table entry to be
> able to deal with COW pages. But since we don't have huge COW pages,
> we can classify a vma as either "special huge" or "normal huge".
> 
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2822,6 +2822,12 @@ extern long copy_huge_page_from_user(struct page *dst_page,
>  				const void __user *usr_src,
>  				unsigned int pages_per_huge_page,
>  				bool allow_pagefault);
> +static inline bool vma_is_special_huge(struct vm_area_struct *vma)
> +{
> +	return vma_is_dax(vma) || (vma->vm_file &&
> +				   (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)));
> +}

Some documentation would be nice.  Not only what it does, but why it
does it.  I.e., what is the *meaning* of vma_is_special_huge(vma)==true?
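
A sketch of the kind of kernel-doc that could answer this, with the wording 
drawn from the quoted commit message rather than from the patchset itself:

/**
 * vma_is_special_huge - whether huge entries in this vma are "special"
 * @vma: the vm_area_struct to check
 *
 * "Special" follows the definition in vm_normal_page(): the mapping does
 * not want its memory associated with a struct page. For PAGE_SIZE pages
 * this is decided per page-table entry to cope with COW pages, but since
 * there are no huge COW pages, a whole vma can be classified as either
 * "special huge" or "normal huge".
 *
 * Return: true if huge page-table entries in @vma must be handled without
 * touching a struct page (DAX, or file-backed VM_PFNMAP / VM_MIXEDMAP vmas).
 */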





* Re: [PATCH 0/8] Huge page-table entries for TTM
  2019-12-03 13:22 [PATCH 0/8] Huge page-table entries for TTM Thomas Hellström (VMware)
                   ` (7 preceding siblings ...)
  2019-12-03 13:22 ` [PATCH 8/8] drm/vmwgfx: Hook up the helpers to align buffer objects Thomas Hellström (VMware)
@ 2020-03-01  4:04 ` Andrew Morton
  8 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2020-03-01  4:04 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: linux-mm, linux-kernel, dri-devel, pv-drivers,
	linux-graphics-maintainer, Michal Hocko, Matthew Wilcox (Oracle),
	Kirill A. Shutemov, Ralph Campbell, Jérôme Glisse,
	Christian König

On Tue,  3 Dec 2019 14:22:31 +0100 Thomas Hellström (VMware) <thomas_os@shipmail.org> wrote:

> In order to save TLB space and CPU usage this patchset enables huge- and giant
> page-table entries for TTM and TTM-enabled graphics drivers.

Have these savings been measured?  They should be, please.  And
documented in this changelog!



end of thread

Thread overview: 23+ messages
2019-12-03 13:22 [PATCH 0/8] Huge page-table entries for TTM Thomas Hellström (VMware)
2019-12-03 13:22 ` [PATCH 1/8] mm: Introduce vma_is_special_huge Thomas Hellström (VMware)
2020-03-01  4:04   ` Andrew Morton
2019-12-03 13:22 ` [PATCH 2/8] mm: Split huge pages on write-notify or COW Thomas Hellström (VMware)
2020-03-01  4:04   ` Andrew Morton
2019-12-03 13:22 ` [PATCH 3/8] mm: Add vmf_insert_pfn_xxx_prot() for huge page-table entries Thomas Hellström (VMware)
2019-12-03 13:22 ` [PATCH 4/8] drm/ttm, drm/vmwgfx: Support huge TTM pagefaults Thomas Hellström (VMware)
2019-12-03 13:22 ` [PATCH 5/8] drm/vmwgfx: Support huge page faults Thomas Hellström (VMware)
2019-12-03 13:22 ` [PATCH 6/8] drm: Add a drm_get_unmapped_area() helper Thomas Hellström (VMware)
2019-12-04 11:11   ` Christian König
2019-12-04 11:36     ` Thomas Hellström (VMware)
2019-12-04 12:08       ` Christian König
2019-12-04 12:32         ` Thomas Hellström (VMware)
2019-12-04 14:40           ` Christian König
2019-12-04 15:36             ` Thomas Hellström (VMware)
2019-12-03 13:22 ` [PATCH 7/8] drm/ttm: Introduce a huge page aligning TTM range manager Thomas Hellström (VMware)
2019-12-04 11:13   ` Christian König
2019-12-04 11:45     ` Thomas Hellström (VMware)
2019-12-04 12:16       ` Christian König
2019-12-04 13:18         ` Thomas Hellström (VMware)
2019-12-04 14:02           ` Christian König
2019-12-03 13:22 ` [PATCH 8/8] drm/vmwgfx: Hook up the helpers to align buffer objects Thomas Hellström (VMware)
2020-03-01  4:04 ` [PATCH 0/8] Huge page-table entries for TTM Andrew Morton
