* ZONE_DEVICE and pmem API support for powerpc
From: Oliver O'Halloran @ 2017-04-11 17:42 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bsingharora, arbab, linux-nvdimm

Hi all,

This series adds support for ZONE_DEVICE and the pmem API on powerpc. Namely,
support for altmaps and the various bits and pieces required for DAX PMD faults.
The first two patches touch generic mm/ code, but otherwise this is fairly well
contained in arch/powerpc.

If the nvdimm folks could sanity check this series, I'd appreciate it.

Series is based on next-20170411, but it should apply elsewhere with minor
fixups to arch_{add|remove}_memory due to conflicts with HMM.  For those
interested in testing this, there is a driver and matching firmware that carves
out some system memory for use as an emulated Con Tutto memory card.

Driver: https://github.com/oohal/linux/tree/contutto-next
Firmware: https://github.com/oohal/skiboot/tree/fake-contutto

Edit core/init.c:686 to control the amount of memory borrowed for the emulated
device. I'm keeping the driver out of tree until 4.13, since I plan on
reworking the firmware interface anyway and there's at least one showstopper
bug.


Thanks,
Oliver

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

* [PATCH 1/9] mm/huge_memory: Use zap_deposited_table() more
From: Oliver O'Halloran @ 2017-04-11 17:42 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-nvdimm, bsingharora, arbab, linux-mm, Aneesh Kumar K.V,
	Kirill A. Shutemov

Depending on the flags of the PMD being zapped, there may or may not be a
deposited pgtable to be freed. In two of the three cases this is open
coded, while the third uses the zap_deposited_table() helper. This patch
converts the other two to use the helper to clean things up a bit.

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-mm@kvack.org
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
For reference:

void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
{
        pgtable_t pgtable;

        pgtable = pgtable_trans_huge_withdraw(mm, pmd);
        pte_free(mm, pgtable);
        atomic_long_dec(&mm->nr_ptes);
}
---
 mm/huge_memory.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b787c4cfda0e..aa01dd47cc65 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1615,8 +1615,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		if (is_huge_zero_pmd(orig_pmd))
 			tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
 	} else if (is_huge_zero_pmd(orig_pmd)) {
-		pte_free(tlb->mm, pgtable_trans_huge_withdraw(tlb->mm, pmd));
-		atomic_long_dec(&tlb->mm->nr_ptes);
+		zap_deposited_table(tlb->mm, pmd);
 		spin_unlock(ptl);
 		tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
 	} else {
@@ -1625,10 +1624,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
 		VM_BUG_ON_PAGE(!PageHead(page), page);
 		if (PageAnon(page)) {
-			pgtable_t pgtable;
-			pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);
-			pte_free(tlb->mm, pgtable);
-			atomic_long_dec(&tlb->mm->nr_ptes);
+			zap_deposited_table(tlb->mm, pmd);
 			add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
 		} else {
 			if (arch_needs_pgtable_deposit())
-- 
2.9.3

* [PATCH 2/9] mm/huge_memory: Deposit a pgtable for DAX PMD faults when required
From: Oliver O'Halloran @ 2017-04-11 17:42 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-nvdimm, bsingharora, arbab, linux-mm, Aneesh Kumar K.V

Although all architectures use a deposited page table for THP on anonymous
VMAs, some architectures (s390 and powerpc) require the deposited storage even
for file-backed VMAs due to quirks of their MMUs. This patch adds support for
depositing a table in the DAX PMD fault handling path for the architectures
that require it. Other architectures should see no functional changes.

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 mm/huge_memory.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index aa01dd47cc65..a84909cf20d3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -715,7 +715,8 @@ int do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 }
 
 static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
-		pmd_t *pmd, pfn_t pfn, pgprot_t prot, bool write)
+		pmd_t *pmd, pfn_t pfn, pgprot_t prot, bool write,
+		pgtable_t pgtable)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	pmd_t entry;
@@ -729,6 +730,12 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
 		entry = pmd_mkyoung(pmd_mkdirty(entry));
 		entry = maybe_pmd_mkwrite(entry, vma);
 	}
+
+	if (pgtable) {
+		pgtable_trans_huge_deposit(mm, pmd, pgtable);
+		atomic_long_inc(&mm->nr_ptes);
+	}
+
 	set_pmd_at(mm, addr, pmd, entry);
 	update_mmu_cache_pmd(vma, addr, pmd);
 	spin_unlock(ptl);
@@ -738,6 +745,7 @@ int vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
 			pmd_t *pmd, pfn_t pfn, bool write)
 {
 	pgprot_t pgprot = vma->vm_page_prot;
+	pgtable_t pgtable = NULL;
 	/*
 	 * If we had pmd_special, we could avoid all these restrictions,
 	 * but we need to be consistent with PTEs and architectures that
@@ -752,9 +760,15 @@ int vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
 	if (addr < vma->vm_start || addr >= vma->vm_end)
 		return VM_FAULT_SIGBUS;
 
+	if (arch_needs_pgtable_deposit()) {
+		pgtable = pte_alloc_one(vma->vm_mm, addr);
+		if (!pgtable)
+			return VM_FAULT_OOM;
+	}
+
 	track_pfn_insert(vma, &pgprot, pfn);
 
-	insert_pfn_pmd(vma, addr, pmd, pfn, pgprot, write);
+	insert_pfn_pmd(vma, addr, pmd, pfn, pgprot, write, pgtable);
 	return VM_FAULT_NOPAGE;
 }
 EXPORT_SYMBOL_GPL(vmf_insert_pfn_pmd);
@@ -1611,6 +1625,8 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			tlb->fullmm);
 	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
 	if (vma_is_dax(vma)) {
+		if (arch_needs_pgtable_deposit())
+			zap_deposited_table(tlb->mm, pmd);
 		spin_unlock(ptl);
 		if (is_huge_zero_pmd(orig_pmd))
 			tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
-- 
2.9.3

* [PATCH 3/9] powerpc/mm: Add _PAGE_DEVMAP for ppc64.
From: Oliver O'Halloran @ 2017-04-11 17:42 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bsingharora, Aneesh Kumar K.V, arbab, linux-nvdimm

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Add a _PAGE_DEVMAP bit for PTE and DAX PMD entries. PowerPC doesn't
currently support PUD faults, so we haven't extended it to the PUD
level.

Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 37 +++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index fb72ff6b98e6..b5fc6337649e 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -78,6 +78,9 @@
 
 #define _PAGE_SOFT_DIRTY	_RPAGE_SW3 /* software: software dirty tracking */
 #define _PAGE_SPECIAL		_RPAGE_SW2 /* software: special page */
+#define _PAGE_DEVMAP		_RPAGE_SW1
+#define __HAVE_ARCH_PTE_DEVMAP
+
 /*
  * Drivers request for cache inhibited pte mapping using _PAGE_NO_CACHE
  * Instead of fixing all of them, add an alternate define which
@@ -602,6 +605,16 @@ static inline pte_t pte_mkhuge(pte_t pte)
 	return pte;
 }
 
+static inline pte_t pte_mkdevmap(pte_t pte)
+{
+	return __pte(pte_val(pte) | _PAGE_SPECIAL|_PAGE_DEVMAP);
+}
+
+static inline int pte_devmap(pte_t pte)
+{
+	return !!(pte_raw(pte) & cpu_to_be64(_PAGE_DEVMAP));
+}
+
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	/* FIXME!! check whether this need to be a conditional */
@@ -966,6 +979,9 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #define pmd_mk_savedwrite(pmd)	pte_pmd(pte_mk_savedwrite(pmd_pte(pmd)))
 #define pmd_clear_savedwrite(pmd)	pte_pmd(pte_clear_savedwrite(pmd_pte(pmd)))
 
+#define pud_pfn(...) (0)
+#define pgd_pfn(...) (0)
+
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
 #define pmd_soft_dirty(pmd)    pte_soft_dirty(pmd_pte(pmd))
 #define pmd_mksoft_dirty(pmd)  pte_pmd(pte_mksoft_dirty(pmd_pte(pmd)))
@@ -1140,7 +1156,6 @@ static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
 	return true;
 }
 
-
 #define arch_needs_pgtable_deposit arch_needs_pgtable_deposit
 static inline bool arch_needs_pgtable_deposit(void)
 {
@@ -1149,6 +1164,26 @@ static inline bool arch_needs_pgtable_deposit(void)
 	return true;
 }
 
+static inline pmd_t pmd_mkdevmap(pmd_t pmd)
+{
+	return pte_pmd(pte_mkdevmap(pmd_pte(pmd)));
+}
+
+static inline int pmd_devmap(pmd_t pmd)
+{
+	return pte_devmap(pmd_pte(pmd));
+}
+
+static inline int pud_devmap(pud_t pud)
+{
+	return 0;
+}
+
+static inline int pgd_devmap(pgd_t pgd)
+{
+	return 0;
+}
+
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_POWERPC_BOOK3S_64_PGTABLE_H_ */
-- 
2.9.3

* [PATCH 4/9] powerpc/mm: Reshuffle vmemmap_free()
From: Oliver O'Halloran @ 2017-04-11 17:42 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bsingharora, arbab, linux-nvdimm

Removes an indentation level and shuffles some code around to make the
following patch cleaner. No functional changes.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/mm/init_64.c | 47 +++++++++++++++++++++++++----------------------
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index ec84b31c6c86..f8124edb6ffa 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -234,12 +234,15 @@ static unsigned long vmemmap_list_free(unsigned long start)
 void __ref vmemmap_free(unsigned long start, unsigned long end)
 {
 	unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
+	unsigned long page_order = get_order(page_size);
 
 	start = _ALIGN_DOWN(start, page_size);
 
 	pr_debug("vmemmap_free %lx...%lx\n", start, end);
 
 	for (; start < end; start += page_size) {
+		struct page *page;
+		unsigned int nr_pages;
 		unsigned long addr;
 
 		/*
@@ -251,29 +254,29 @@ void __ref vmemmap_free(unsigned long start, unsigned long end)
 			continue;
 
 		addr = vmemmap_list_free(start);
-		if (addr) {
-			struct page *page = pfn_to_page(addr >> PAGE_SHIFT);
-
-			if (PageReserved(page)) {
-				/* allocated from bootmem */
-				if (page_size < PAGE_SIZE) {
-					/*
-					 * this shouldn't happen, but if it is
-					 * the case, leave the memory there
-					 */
-					WARN_ON_ONCE(1);
-				} else {
-					unsigned int nr_pages =
-						1 << get_order(page_size);
-					while (nr_pages--)
-						free_reserved_page(page++);
-				}
-			} else
-				free_pages((unsigned long)(__va(addr)),
-							get_order(page_size));
-
-			vmemmap_remove_mapping(start, page_size);
+		if (!addr)
+			continue;
+
+		page = pfn_to_page(addr >> PAGE_SHIFT);
+		nr_pages = 1 << page_order;
+
+		if (PageReserved(page)) {
+			/* allocated from bootmem */
+			if (page_size < PAGE_SIZE) {
+				/*
+				 * this shouldn't happen, but if it is
+				 * the case, leave the memory there
+				 */
+				WARN_ON_ONCE(1);
+			} else {
+				while (nr_pages--)
+					free_reserved_page(page++);
+			}
+		} else {
+			free_pages((unsigned long)(__va(addr)), page_order);
 		}
+
+		vmemmap_remove_mapping(start, page_size);
 	}
 }
 #endif
-- 
2.9.3

* [PATCH 5/9] powerpc/vmemmap: Add altmap support
From: Oliver O'Halloran @ 2017-04-11 17:42 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bsingharora, arbab, linux-nvdimm

Adds support to powerpc for the altmap feature of ZONE_DEVICE memory. An
altmap is a driver-provided region that is used to provide the backing
storage for the struct pages of ZONE_DEVICE memory. In situations where a
large amount of ZONE_DEVICE memory is being added to the system, the
altmap reduces pressure on main system memory by allowing the mm/
metadata to be stored on the device itself rather than in main memory.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/mm/init_64.c | 20 +++++++++++++++-----
 arch/powerpc/mm/mem.c     | 16 +++++++++++++---
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index f8124edb6ffa..225fbb8034e6 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -44,6 +44,7 @@
 #include <linux/slab.h>
 #include <linux/of_fdt.h>
 #include <linux/libfdt.h>
+#include <linux/memremap.h>
 
 #include <asm/pgalloc.h>
 #include <asm/page.h>
@@ -171,13 +172,17 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 	pr_debug("vmemmap_populate %lx..%lx, node %d\n", start, end, node);
 
 	for (; start < end; start += page_size) {
+		struct vmem_altmap *altmap;
 		void *p;
 		int rc;
 
 		if (vmemmap_populated(start, page_size))
 			continue;
 
-		p = vmemmap_alloc_block(page_size, node);
+		/* altmap lookups only work at section boundaries */
+		altmap = to_vmem_altmap(SECTION_ALIGN_DOWN(start));
+
+		p =  __vmemmap_alloc_block_buf(page_size, node, altmap);
 		if (!p)
 			return -ENOMEM;
 
@@ -241,9 +246,10 @@ void __ref vmemmap_free(unsigned long start, unsigned long end)
 	pr_debug("vmemmap_free %lx...%lx\n", start, end);
 
 	for (; start < end; start += page_size) {
-		struct page *page = pfn_to_page(addr >> PAGE_SHIFT);
-		unsigned int nr_pages;
-		unsigned long addr;
+		unsigned long nr_pages, addr;
+		struct vmem_altmap *altmap;
+		struct page *section_base;
+		struct page *page;
 
 		/*
 		 * the section has already be marked as invalid, so
@@ -258,9 +264,13 @@ void __ref vmemmap_free(unsigned long start, unsigned long end)
 			continue;
 
 		page = pfn_to_page(addr >> PAGE_SHIFT);
+		section_base = pfn_to_page(vmemmap_section_start(start));
 		nr_pages = 1 << page_order;
 
-		if (PageReserved(page)) {
+		altmap = to_vmem_altmap((unsigned long) section_base);
+		if (altmap) {
+			vmem_altmap_free(altmap, nr_pages);
+		} else if (PageReserved(page)) {
 			/* allocated from bootmem */
 			if (page_size < PAGE_SIZE) {
 				/*
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 3bbba178b464..6f7b64eaa9d8 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -36,6 +36,7 @@
 #include <linux/hugetlb.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+#include <linux/memremap.h>
 
 #include <asm/pgalloc.h>
 #include <asm/prom.h>
@@ -176,7 +177,8 @@ int arch_remove_memory(u64 start, u64 size, enum memory_type type)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
-	struct zone *zone;
+	struct vmem_altmap *altmap;
+	struct page *page;
 	int ret;
 
 	/*
@@ -193,8 +195,16 @@ int arch_remove_memory(u64 start, u64 size, enum memory_type type)
 		return -EINVAL;
 	}
 
-	zone = page_zone(pfn_to_page(start_pfn));
-	ret = __remove_pages(zone, start_pfn, nr_pages);
+	/*
+	 * If we have an altmap then we need to skip over any reserved PFNs
+	 * when querying the zone.
+	 */
+	page = pfn_to_page(start_pfn);
+	altmap = to_vmem_altmap((unsigned long) page);
+	if (altmap)
+		page += vmem_altmap_offset(altmap);
+
+	ret = __remove_pages(page_zone(page), start_pfn, nr_pages);
 	if (ret)
 		return ret;
 
-- 
2.9.3

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 6/9] powerpc, mm: Enable ZONE_DEVICE on powerpc
  2017-04-11 17:42 ` Oliver O'Halloran
@ 2017-04-11 17:42   ` Oliver O'Halloran
  -1 siblings, 0 replies; 65+ messages in thread
From: Oliver O'Halloran @ 2017-04-11 17:42 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bsingharora, arbab, linux-nvdimm

Flip the switch. Running around and screaming "IT'S ALIVE" is optional,
but recommended.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 mm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 43d000e44424..d696af58f97f 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -724,7 +724,7 @@ config ZONE_DEVICE
 	depends on MEMORY_HOTPLUG
 	depends on MEMORY_HOTREMOVE
 	depends on SPARSEMEM_VMEMMAP
-	depends on X86_64 #arch_add_memory() comprehends device memory
+	depends on (X86_64 || PPC_BOOK3S_64)  #arch_add_memory() comprehends device memory
 
 	help
 	  Device memory hotplug support allows for establishing pmem,
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 7/9] powerpc/mm: Wire up ioremap_cache
  2017-04-11 17:42 ` Oliver O'Halloran
@ 2017-04-11 17:42   ` Oliver O'Halloran
  -1 siblings, 0 replies; 65+ messages in thread
From: Oliver O'Halloran @ 2017-04-11 17:42 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bsingharora, arbab, linux-nvdimm

The default implementation of ioremap_cache() is aliased to ioremap().
On powerpc, ioremap() creates cache-inhibited mappings by default, which
is almost certainly not what you wanted.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/include/asm/io.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 5ed292431b5b..839eb031857f 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -757,6 +757,8 @@ extern void __iomem *ioremap_prot(phys_addr_t address, unsigned long size,
 extern void __iomem *ioremap_wc(phys_addr_t address, unsigned long size);
 #define ioremap_nocache(addr, size)	ioremap((addr), (size))
 #define ioremap_uc(addr, size)		ioremap((addr), (size))
+#define ioremap_cache(addr, size) \
+	ioremap_prot((addr), (size), pgprot_val(PAGE_KERNEL))
 
 extern void iounmap(volatile void __iomem *addr);
 
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 8/9] powerpc/mm: Wire up hpte_removebolted for powernv
  2017-04-11 17:42 ` Oliver O'Halloran
@ 2017-04-11 17:42   ` Oliver O'Halloran
  -1 siblings, 0 replies; 65+ messages in thread
From: Oliver O'Halloran @ 2017-04-11 17:42 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: linux-nvdimm, bsingharora, arbab, Anton Blanchard, Rashmica Gupta

From: Rashmica Gupta <rashmica.g@gmail.com>

Adds support for removing bolted (i.e. kernel linear mapping) mappings on
powernv. This is needed to support the memory hot-unplug operations
required for the teardown of DAX/PMEM devices.

Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
Could the original author of this add their S-o-b? I pulled it out of
Rashmica's memtrace patch, but I remember someone saying Anton wrote
it originally.
---
 arch/powerpc/mm/hash_native_64.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 65bb8f33b399..9ba91d4905a4 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -407,6 +407,36 @@ static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
 	tlbie(vpn, psize, psize, ssize, 0);
 }
 
+/*
+ * Remove a bolted kernel entry. Memory hotplug uses this.
+ *
+ * No need to lock here because we should be the only user.
+ */
+static int native_hpte_removebolted(unsigned long ea, int psize, int ssize)
+{
+	unsigned long vpn;
+	unsigned long vsid;
+	long slot;
+	struct hash_pte *hptep;
+
+	vsid = get_kernel_vsid(ea, ssize);
+	vpn = hpt_vpn(ea, vsid, ssize);
+
+	slot = native_hpte_find(vpn, psize, ssize);
+	if (slot == -1)
+		return -ENOENT;
+
+	hptep = htab_address + slot;
+
+	/* Invalidate the hpte */
+	hptep->v = 0;
+
+	/* Invalidate the TLB */
+	tlbie(vpn, psize, psize, ssize, 0);
+	return 0;
+}
+
+
 static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
 				   int bpsize, int apsize, int ssize, int local)
 {
@@ -725,6 +755,7 @@ void __init hpte_init_native(void)
 	mmu_hash_ops.hpte_invalidate	= native_hpte_invalidate;
 	mmu_hash_ops.hpte_updatepp	= native_hpte_updatepp;
 	mmu_hash_ops.hpte_updateboltedpp = native_hpte_updateboltedpp;
+	mmu_hash_ops.hpte_removebolted = native_hpte_removebolted;
 	mmu_hash_ops.hpte_insert	= native_hpte_insert;
 	mmu_hash_ops.hpte_remove	= native_hpte_remove;
 	mmu_hash_ops.hpte_clear_all	= native_hpte_clear;
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 9/9] powerpc: Add pmem API support
  2017-04-11 17:42 ` Oliver O'Halloran
@ 2017-04-11 17:42   ` Oliver O'Halloran
  -1 siblings, 0 replies; 65+ messages in thread
From: Oliver O'Halloran @ 2017-04-11 17:42 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: bsingharora, arbab, linux-nvdimm

Initial powerpc support for the arch-specific bit of the persistent
memory API. Nothing fancy here.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/Kconfig            |   1 +
 arch/powerpc/include/asm/pmem.h | 109 ++++++++++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/misc_64.S   |   2 +-
 3 files changed, 111 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/pmem.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d7413ed700b8..cf84d0db49ab 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -87,6 +87,7 @@ config PPC
 	select ARCH_HAS_DMA_SET_COHERENT_MASK
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_GCOV_PROFILE_ALL
+	select ARCH_HAS_PMEM_API
 	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/powerpc/include/asm/pmem.h b/arch/powerpc/include/asm/pmem.h
new file mode 100644
index 000000000000..27da9594040f
--- /dev/null
+++ b/arch/powerpc/include/asm/pmem.h
@@ -0,0 +1,109 @@
+/*
+ * Copyright(c) 2017 IBM Corporation. All rights reserved.
+ *
+ * Based on the x86 version.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+#ifndef __ASM_POWERPC_PMEM_H__
+#define __ASM_POWERPC_PMEM_H__
+
+#include <linux/uio.h>
+#include <linux/uaccess.h>
+#include <asm/cacheflush.h>
+
+/*
+ * See include/linux/pmem.h for API documentation
+ *
+ * PPC specific notes:
+ *
+ * 1. PPC has no non-temporal (cache bypassing) stores so we're stuck with
+ *    doing cache writebacks.
+ *
+ * 2. DCBST is a suggestion. DCBF *will* force a writeback.
+ *
+ */
+
+static inline void arch_wb_cache_pmem(void *addr, size_t size)
+{
+	unsigned long iaddr = (unsigned long) addr;
+
+	/* NB: contains a barrier */
+	flush_inval_dcache_range(iaddr, iaddr + size);
+}
+
+/* invalidate and writeback are functionally identical */
+#define arch_invalidate_pmem arch_wb_cache_pmem
+
+static inline void arch_memcpy_to_pmem(void *dst, const void *src, size_t n)
+{
+	int unwritten;
+
+	/*
+	 * We are copying between two kernel buffers, if
+	 * __copy_from_user_inatomic_nocache() returns an error (page
+	 * fault) we would have already reported a general protection fault
+	 * before the WARN+BUG.
+	 *
+	 * XXX: replace this with a hand-rolled memcpy+dcbf
+	 */
+	unwritten = __copy_from_user_inatomic(dst, (void __user *) src, n);
+	if (WARN(unwritten, "%s: fault copying %p <- %p unwritten: %d\n",
+				__func__, dst, src, unwritten))
+		BUG();
+
+	arch_wb_cache_pmem(dst, n);
+}
+
+static inline int arch_memcpy_from_pmem(void *dst, const void *src, size_t n)
+{
+	/*
+	 * TODO: We should have most of the infrastructure for MCE handling
+	 *       but it needs to be made slightly smarter.
+	 */
+	memcpy(dst, src, n);
+	return 0;
+}
+
+static inline size_t arch_copy_from_iter_pmem(void *addr, size_t bytes,
+		struct iov_iter *i)
+{
+	size_t len;
+
+	/* XXX: under what conditions would this return len < size? */
+	len = copy_from_iter(addr, bytes, i);
+	arch_wb_cache_pmem(addr, bytes - len);
+
+	return len;
+}
+
+static inline void arch_clear_pmem(void *addr, size_t size)
+{
+	void *start = addr;
+
+	/*
+	 * XXX: A hand rolled dcbz+dcbf loop would probably be better.
+	 */
+
+	if (((uintptr_t) addr & ~PAGE_MASK) == 0) {
+		while (size >= PAGE_SIZE) {
+			clear_page(addr);
+			addr += PAGE_SIZE;
+			size -= PAGE_SIZE;
+		}
+	}
+
+	if (size)
+		memset(addr, 0, size);
+
+	arch_wb_cache_pmem(start, size);
+}
+
+#endif /* __ASM_POWERPC_PMEM_H__ */
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index c119044cad0d..1378a8d61faf 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -182,7 +182,7 @@ _GLOBAL(flush_dcache_phys_range)
 	isync
 	blr
 
-_GLOBAL(flush_inval_dcache_range)
+_GLOBAL_TOC(flush_inval_dcache_range)
  	ld	r10,PPC64_CACHES@toc(r2)
 	lwz	r7,DCACHEL1BLOCKSIZE(r10)	/* Get dcache block size */
 	addi	r5,r7,-1
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: ZONE_DEVICE and pmem API support for powerpc
  2017-04-11 17:42 ` Oliver O'Halloran
@ 2017-04-11 18:22   ` Dan Williams
  -1 siblings, 0 replies; 65+ messages in thread
From: Dan Williams @ 2017-04-11 18:22 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: bsingharora, linuxppc-dev, arbab, linux-nvdimm

On Tue, Apr 11, 2017 at 10:42 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
> Hi all,
>
> This series adds support for ZONE_DEVICE and the pmem api on powerpc. Namely,
> support for altmaps and the various bits and pieces required for DAX PMD faults.
> The first two patches touch generic mm/ code, but otherwise this is fairly well
> contained in arch/powerpc.
>
> If the nvdimm folks could sanity check this series I'd appreciate it.

Quick feedback: I'm in the process of cleaning up and resubmitting my
patch set to push the pmem api down into the driver directly.

    https://lwn.net/Articles/713064/

I'm also reworking memory hotplug to allow sub-section allocations
which has collided with Michal Hocko's hotplug reworks. It will be
good to have some more eyes on that work to understand the cross-arch
implications.

    https://lkml.org/lkml/2017/3/19/146

> Series is based on next-20170411, but it should apply elsewhere with minor
> fixups to arch_{add|remove}_memory due to conflicts with HMM.  For those
> interested in testing this, there is a driver and matching firmware that carves
> out some system memory for use as an emulated Con Tutto memory card.
>
> Driver: https://github.com/oohal/linux/tree/contutto-next
> Firmware: https://github.com/oohal/skiboot/tree/fake-contutto
>
> Edit core/init.c:686 to control the amount of memory borrowed for the emulated
> device.  I'm keeping the driver out of tree until 4.13 since I plan on
> reworking the firmware interface anyway and there's at least one showstopper
> bug.

Is this memory card I/O-cache coherent? I.e., can the existing DMA mapping
API hand out mappings to it? Just trying to figure out if this is the
existing pmem definition of ZONE_DEVICE or a new one.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: ZONE_DEVICE and pmem API support for powerpc
@ 2017-04-11 18:22   ` Dan Williams
  0 siblings, 0 replies; 65+ messages in thread
From: Dan Williams @ 2017-04-11 18:22 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linuxppc-dev, bsingharora, arbab, linux-nvdimm

On Tue, Apr 11, 2017 at 10:42 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
> Hi all,
>
> This series adds support for ZONE_DEVICE and the pmem api on powerpc. Namely,
> support for altmaps and the various bits and pieces required for DAX PMD faults.
> The first two patches touch generic mm/ code, but otherwise this is fairly well
> contained in arch/powerpc.
>
> If the nvdimm folks could sanity check this series I'd appreciate it.

Quick feedback: I'm in the process of cleaning up and resubmitting my
patch set to push the pmem api down into the driver directly.

    https://lwn.net/Articles/713064/

I'm also reworking memory hotplug to allow sub-section allocations
which has collided with Michal Hocko's hotplug reworks. It will be
good to have some more eyes on that work to understand the cross-arch
implications.

    https://lkml.org/lkml/2017/3/19/146

> Series is based on next-20170411, but it should apply elsewhere with minor
> fixups to arch_{add|remove}_memory due to conflicts with HMM.  For those
> interested in testing this, there is a driver and matching firmware that carves
> out some system memory for use as an emulated Con Tutto memory card.
>
> Driver: https://github.com/oohal/linux/tree/contutto-next
> Firmware: https://github.com/oohal/skiboot/tree/fake-contutto
>
> Edit core/init.c:686 to control the amount of memory borrowed for the emulated
> device.  I'm keeping the driver out of tree for a until 4.13 since I plan on
> reworking the firmware interface anyway and There's at least one showstopper
> bug.

Is this memory card I/O-cache coherent? I.e. existing dma mapping api
can hand out mappings to it? Just trying to figure out if this is the
existing pmem-definition of ZONE_DEVICE or a new one.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 8/9] powerpc/mm: Wire up hpte_removebolted for powernv
@ 2017-04-11 22:50       ` Anton Blanchard
  0 siblings, 0 replies; 65+ messages in thread
From: Anton Blanchard @ 2017-04-11 22:50 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linuxppc-dev, linux-nvdimm, arbab, Rashmica Gupta

Hi Oliver,

> From: Rashmica Gupta <rashmica.g@gmail.com>
> 
> Adds support for removing bolted (i.e. kernel linear mapping) mappings
> on powernv. This is needed to support memory hot unplug operations
> which are required for the teardown of DAX/PMEM devices.
> 
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Cc: Anton Blanchard <anton@samba.org>
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> ---
> Could the original author of this add their S-o-b? I pulled it out of
> Rashmica's memtrace patch, but I remember someone saying Anton wrote
> it originally.

I did.

Signed-off-by: Anton Blanchard <anton@samba.org>

Anton

> ---
>  arch/powerpc/mm/hash_native_64.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/arch/powerpc/mm/hash_native_64.c
> b/arch/powerpc/mm/hash_native_64.c index 65bb8f33b399..9ba91d4905a4
> 100644 --- a/arch/powerpc/mm/hash_native_64.c
> +++ b/arch/powerpc/mm/hash_native_64.c
> @@ -407,6 +407,36 @@ static void native_hpte_updateboltedpp(unsigned
> long newpp, unsigned long ea, tlbie(vpn, psize, psize, ssize, 0);
>  }
>  
> +/*
> + * Remove a bolted kernel entry. Memory hotplug uses this.
> + *
> + * No need to lock here because we should be the only user.
> + */
> +static int native_hpte_removebolted(unsigned long ea, int psize, int
> ssize) +{
> +	unsigned long vpn;
> +	unsigned long vsid;
> +	long slot;
> +	struct hash_pte *hptep;
> +
> +	vsid = get_kernel_vsid(ea, ssize);
> +	vpn = hpt_vpn(ea, vsid, ssize);
> +
> +	slot = native_hpte_find(vpn, psize, ssize);
> +	if (slot == -1)
> +		return -ENOENT;
> +
> +	hptep = htab_address + slot;
> +
> +	/* Invalidate the hpte */
> +	hptep->v = 0;
> +
> +	/* Invalidate the TLB */
> +	tlbie(vpn, psize, psize, ssize, 0);
> +	return 0;
> +}
> +
> +
>  static void native_hpte_invalidate(unsigned long slot, unsigned long
> vpn, int bpsize, int apsize, int ssize, int local)
>  {
> @@ -725,6 +755,7 @@ void __init hpte_init_native(void)
>  	mmu_hash_ops.hpte_invalidate	= native_hpte_invalidate;
>  	mmu_hash_ops.hpte_updatepp	= native_hpte_updatepp;
>  	mmu_hash_ops.hpte_updateboltedpp =
> native_hpte_updateboltedpp;
> +	mmu_hash_ops.hpte_removebolted = native_hpte_removebolted;
>  	mmu_hash_ops.hpte_insert	= native_hpte_insert;
>  	mmu_hash_ops.hpte_remove	= native_hpte_remove;
>  	mmu_hash_ops.hpte_clear_all	= native_hpte_clear;

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 8/9] powerpc/mm: Wire up hpte_removebolted for powernv
  2017-04-11 22:50       ` Anton Blanchard
@ 2017-04-12  0:18         ` Stephen Rothwell
  -1 siblings, 0 replies; 65+ messages in thread
From: Stephen Rothwell @ 2017-04-12  0:18 UTC (permalink / raw)
  To: Oliver O'Halloran
  Cc: Rashmica Gupta, linuxppc-dev, Anton Blanchard, arbab, linux-nvdimm

Hi Oliver,

On Wed, 12 Apr 2017 08:50:56 +1000 Anton Blanchard <anton@samba.org> wrote:
>
> > From: Rashmica Gupta <rashmica.g@gmail.com>
> > 
> > Adds support for removing bolted (i.e. kernel linear mapping) mappings
> > on powernv. This is needed to support memory hot unplug operations
> > which are required for the teardown of DAX/PMEM devices.
> > 
> > Cc: Rashmica Gupta <rashmica.g@gmail.com>
> > Cc: Anton Blanchard <anton@samba.org>
> > Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> > ---
> > Could the original author of this add their S-o-b? I pulled it out of
> > Rashmica's memtrace patch, but I remember someone saying Anton wrote
> > it originally.  
> 
> I did.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>

If you are going to claim that Rashmica authored this patch (and you do
with the From: line above), then you need her Signed-off-by as well.

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 3/9] powerpc/mm: Add _PAGE_DEVMAP for ppc64.
  2017-04-11 17:42   ` Oliver O'Halloran
@ 2017-04-12  0:19     ` Stephen Rothwell
  -1 siblings, 0 replies; 65+ messages in thread
From: Stephen Rothwell @ 2017-04-12  0:19 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linuxppc-dev, Aneesh Kumar K.V, arbab, linux-nvdimm

Hi Oliver,

On Wed, 12 Apr 2017 03:42:27 +1000 Oliver O'Halloran <oohall@gmail.com> wrote:
>
> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> 
> Add a _PAGE_DEVMAP bit for PTE and DAX PMD entries. PowerPC doesn't
> currently support PUD faults so we haven't extended it to the PUD
> level.
> 
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>

This needs Aneesh's Signed-off-by.

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 5/9] powerpc/vmemmap: Add altmap support
  2017-04-11 17:42   ` Oliver O'Halloran
@ 2017-04-12  0:24     ` Balbir Singh
  -1 siblings, 0 replies; 65+ messages in thread
From: Balbir Singh @ 2017-04-12  0:24 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev; +Cc: arbab, linux-nvdimm

On Wed, 2017-04-12 at 03:42 +1000, Oliver O'Halloran wrote:
> Adds support to powerpc for the altmap feature of ZONE_DEVICE memory. An
> altmap is a driver provided region that is used to provide the backing
> storage for the struct pages of ZONE_DEVICE memory. In situations where
> large amounts of ZONE_DEVICE memory are being added to the system, the
> altmap reduces pressure on main system memory by allowing the mm/
> metadata to be stored on the device itself rather than in main memory.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> ---

Reviewed-by: Balbir Singh <bsingharora@gmail.com>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 6/9] powerpc, mm: Enable ZONE_DEVICE on powerpc
  2017-04-11 17:42   ` Oliver O'Halloran
@ 2017-04-12  0:25     ` Balbir Singh
  -1 siblings, 0 replies; 65+ messages in thread
From: Balbir Singh @ 2017-04-12  0:25 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev; +Cc: arbab, linux-nvdimm

On Wed, 2017-04-12 at 03:42 +1000, Oliver O'Halloran wrote:
> Flip the switch. Running around and screaming "IT'S ALIVE" is optional,
> but recommended.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> ---
>  mm/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 43d000e44424..d696af58f97f 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -724,7 +724,7 @@ config ZONE_DEVICE
>  	depends on MEMORY_HOTPLUG
>  	depends on MEMORY_HOTREMOVE
>  	depends on SPARSEMEM_VMEMMAP
> -	depends on X86_64 #arch_add_memory() comprehends device memory
> +	depends on (X86_64 || PPC_BOOK3S_64)  #arch_add_memory() comprehends device memory

Reviewed-by: Balbir Singh <bsingharora@gmail.com>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 4/9] powerpc/mm: Reshuffle vmemmap_free()
  2017-04-11 17:42   ` Oliver O'Halloran
@ 2017-04-12  0:33     ` Stephen Rothwell
  -1 siblings, 0 replies; 65+ messages in thread
From: Stephen Rothwell @ 2017-04-12  0:33 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linuxppc-dev, arbab, linux-nvdimm

Hi Oliver,

On Wed, 12 Apr 2017 03:42:28 +1000 Oliver O'Halloran <oohall@gmail.com> wrote:
>
> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
> index ec84b31c6c86..f8124edb6ffa 100644
> --- a/arch/powerpc/mm/init_64.c
> +++ b/arch/powerpc/mm/init_64.c
> @@ -234,12 +234,15 @@ static unsigned long vmemmap_list_free(unsigned long start)
>  void __ref vmemmap_free(unsigned long start, unsigned long end)
>  {
>  	unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
> +	unsigned long page_order = get_order(page_size);
>  
>  	start = _ALIGN_DOWN(start, page_size);
>  
>  	pr_debug("vmemmap_free %lx...%lx\n", start, end);
>  
>  	for (; start < end; start += page_size) {
> +		struct page *page = pfn_to_page(addr >> PAGE_SHIFT);

The declaration of addr is below here and, even so, it would be
uninitialised ...

> +		unsigned int nr_pages;
>  		unsigned long addr;

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 6/9] powerpc, mm: Enable ZONE_DEVICE on powerpc
  2017-04-11 17:42   ` Oliver O'Halloran
@ 2017-04-12  0:43     ` Stephen Rothwell
  -1 siblings, 0 replies; 65+ messages in thread
From: Stephen Rothwell @ 2017-04-12  0:43 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linuxppc-dev, arbab, linux-nvdimm

Hi Oliver,

On Wed, 12 Apr 2017 03:42:30 +1000 Oliver O'Halloran <oohall@gmail.com> wrote:
>
> Flip the switch. Running around and screaming "IT'S ALIVE" is optional,
> but recommended.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> ---
>  mm/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 43d000e44424..d696af58f97f 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -724,7 +724,7 @@ config ZONE_DEVICE
>  	depends on MEMORY_HOTPLUG
>  	depends on MEMORY_HOTREMOVE
>  	depends on SPARSEMEM_VMEMMAP
> -	depends on X86_64 #arch_add_memory() comprehends device memory
> +	depends on (X86_64 || PPC_BOOK3S_64)  #arch_add_memory() comprehends device memory
>  

That's fine, but at what point do we create
CONFIG_ARCH_HAVE_ZONE_DEVICE, replace the "depends on
<archs/platforms>" above with "depends on ARCH_HAVE_ZONE_DEVICE" and
select that from the appropriate places?

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: ZONE_DEVICE and pmem API support for powerpc
  2017-04-11 17:42 ` Oliver O'Halloran
@ 2017-04-12  1:10   ` Stephen Rothwell
  -1 siblings, 0 replies; 65+ messages in thread
From: Stephen Rothwell @ 2017-04-12  1:10 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: linuxppc-dev, arbab, linux-nvdimm

Hi Oliver,

On Wed, 12 Apr 2017 03:42:24 +1000 Oliver O'Halloran <oohall@gmail.com> wrote:
>
> Series is based on next-20170411, but it should apply elsewhere with minor
> fixups to arch_{add|remove}_memory due to conflicts with HMM.  For those

Just to make life fun for you, Andrew has dropped the HMM patches from
his quilt series today (and so they will not be in next-20170412).

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 8/9] powerpc/mm: Wire up hpte_removebolted for powernv
  2017-04-11 17:42   ` Oliver O'Halloran
@ 2017-04-12  1:53     ` Balbir Singh
  -1 siblings, 0 replies; 65+ messages in thread
From: Balbir Singh @ 2017-04-12  1:53 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev
  Cc: Rashmica Gupta, Anton Blanchard, arbab, linux-nvdimm

On Wed, 2017-04-12 at 03:42 +1000, Oliver O'Halloran wrote:
> From: Rashmica Gupta <rashmica.g@gmail.com>
> 
> Adds support for removing bolted (i.e. kernel linear mapping) mappings on
> powernv. This is needed to support memory hot unplug operations which
> are required for the teardown of DAX/PMEM devices.
> 
> Cc: Rashmica Gupta <rashmica.g@gmail.com>
> Cc: Anton Blanchard <anton@samba.org>
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> ---
> Could the original author of this add their S-o-b? I pulled it out of
> Rashmica's memtrace patch, but I remember someone saying Anton wrote
> it originally.
> ---
>  arch/powerpc/mm/hash_native_64.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
> index 65bb8f33b399..9ba91d4905a4 100644
> --- a/arch/powerpc/mm/hash_native_64.c
> +++ b/arch/powerpc/mm/hash_native_64.c
> @@ -407,6 +407,36 @@ static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
>  	tlbie(vpn, psize, psize, ssize, 0);
>  }
>  
> +/*
> + * Remove a bolted kernel entry. Memory hotplug uses this.
> + *
> + * No need to lock here because we should be the only user.

As long as this is after the necessary isolation and is called from
arch_remove_memory(), I think we should be fine.

> + */
> +static int native_hpte_removebolted(unsigned long ea, int psize, int ssize)
> +{
> +	unsigned long vpn;
> +	unsigned long vsid;
> +	long slot;
> +	struct hash_pte *hptep;
> +
> +	vsid = get_kernel_vsid(ea, ssize);
> +	vpn = hpt_vpn(ea, vsid, ssize);
> +
> +	slot = native_hpte_find(vpn, psize, ssize);
> +	if (slot == -1)
> +		return -ENOENT;

If slot == -1, it means someone else removed the HPTE entry? Are we racing?
I suspect we should never hit this situation during hotunplug, specifically
since this is bolted.

> +
> +	hptep = htab_address + slot;
> +
> +	/* Invalidate the hpte */
> +	hptep->v = 0;

Under DEBUG or otherwise, I would add more checks like

1. was hpte_v & HPTE_V_VALID and BOLTED set? If not, we've already invalidated
that hpte and we can skip the tlbie. Since this was bolted, you might be right
that it is always valid and bolted.



> +
> +	/* Invalidate the TLB */
> +	tlbie(vpn, psize, psize, ssize, 0);

The API also does not clear linear_map_hash_slots[] under DEBUG_PAGEALLOC

> +	return 0;
> +}
> +
> +

Balbir Singh.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 6/9] powerpc, mm: Enable ZONE_DEVICE on powerpc
  2017-04-12  0:43     ` Stephen Rothwell
@ 2017-04-12  2:03       ` Michael Ellerman
  -1 siblings, 0 replies; 65+ messages in thread
From: Michael Ellerman @ 2017-04-12  2:03 UTC (permalink / raw)
  To: Stephen Rothwell, Oliver O'Halloran; +Cc: linuxppc-dev, arbab, linux-nvdimm

Stephen Rothwell <sfr@canb.auug.org.au> writes:

> Hi Oliver,
>
> On Wed, 12 Apr 2017 03:42:30 +1000 Oliver O'Halloran <oohall@gmail.com> wrote:
>>
>> Flip the switch. Running around and screaming "IT'S ALIVE" is optional,
>> but recommended.
>> 
>> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
>> ---
>>  mm/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index 43d000e44424..d696af58f97f 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -724,7 +724,7 @@ config ZONE_DEVICE
>>  	depends on MEMORY_HOTPLUG
>>  	depends on MEMORY_HOTREMOVE
>>  	depends on SPARSEMEM_VMEMMAP
>> -	depends on X86_64 #arch_add_memory() comprehends device memory
>> +	depends on (X86_64 || PPC_BOOK3S_64)  #arch_add_memory() comprehends device memory
>>  
>
> That's fine, but at what point do we create
> CONFIG_ARCH_HAVE_ZONE_DEVICE, replace the "depends on
> <archs/platforms>" above with "depends on ARCH_HAVE_ZONE_DEVICE" and
> select that from the appropriate places?

You mean CONFIG_HAVE_ZONE_DEVICE :)

A patch to do that, and update x86, would be a good precursor to this
series. It could probably go in right now, and be in place for when this
series lands.

cheers
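A hypothetical sketch of what that refactor could look like (the symbol name
and placement are illustrative, not an actual committed change):

```kconfig
# In mm/Kconfig: a hidden symbol that capable architectures select.
config ARCH_HAS_ZONE_DEVICE
	bool

config ZONE_DEVICE
	bool "Device memory (pmem, etc...) hotplug support"
	depends on MEMORY_HOTPLUG
	depends on MEMORY_HOTREMOVE
	depends on SPARSEMEM_VMEMMAP
	depends on ARCH_HAS_ZONE_DEVICE

# Then arch/x86/Kconfig would "select ARCH_HAS_ZONE_DEVICE if X86_64",
# and the powerpc Kconfig would select it for PPC_BOOK3S_64, replacing
# the growing per-arch list in the ZONE_DEVICE dependency.
```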

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 3/9] powerpc/mm: Add _PAGE_DEVMAP for ppc64.
  2017-04-12  0:19     ` Stephen Rothwell
@ 2017-04-12  3:07       ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 65+ messages in thread
From: Aneesh Kumar K.V @ 2017-04-12  3:07 UTC (permalink / raw)
  To: Stephen Rothwell, Oliver O'Halloran; +Cc: linuxppc-dev, arbab, linux-nvdimm



On Wednesday 12 April 2017 05:49 AM, Stephen Rothwell wrote:
> Hi Oliver,
>
> On Wed, 12 Apr 2017 03:42:27 +1000 Oliver O'Halloran <oohall@gmail.com> wrote:
>>
>> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>>
>> Add a _PAGE_DEVMAP bit for PTE and DAX PMD entries. PowerPC doesn't
>> currently support PUD faults so we haven't extended it to the PUD
>> level.
>>
>> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
>
> This needs Aneesh's Signed-off-by.
>

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

-aneesh

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 8/9] powerpc/mm: Wire up hpte_removebolted for powernv
  2017-04-12  0:18         ` Stephen Rothwell
@ 2017-04-12  3:30           ` Rashmica Gupta
  -1 siblings, 0 replies; 65+ messages in thread
From: Rashmica Gupta @ 2017-04-12  3:30 UTC (permalink / raw)
  To: Stephen Rothwell, Oliver O'Halloran
  Cc: Rashmica Gupta, linuxppc-dev, Anton Blanchard, arbab, linux-nvdimm



On 12/04/17 10:18, Stephen Rothwell wrote:
> Hi Oliver,
>
> On Wed, 12 Apr 2017 08:50:56 +1000 Anton Blanchard <anton@samba.org> wrote:
>>> From: Rashmica Gupta <rashmica.g@gmail.com>
>>>
>>> Adds support for removing bolted (i.e. kernel linear mapping) mappings
>>> on powernv. This is needed to support memory hot unplug operations
>>> which are required for the teardown of DAX/PMEM devices.
>>>
>>> Cc: Rashmica Gupta <rashmica.g@gmail.com>
>>> Cc: Anton Blanchard <anton@samba.org>
>>> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
>>> ---
>>> Could the original author of this add their S-o-b? I pulled it out of
>>> Rashmica's memtrace patch, but I remember someone saying Anton wrote
>>> it originally.
>> I did.
>>
>> Signed-off-by: Anton Blanchard <anton@samba.org>
> If you are going to claim that Rashmica authored this patch (and you do
> with the From: line above), then you need her Signed-off-by as well.
>
Oliver, can you change the 'From' to a 'Reviewed By'?

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 1/9] mm/huge_memory: Use zap_deposited_table() more
  2017-04-11 17:42   ` Oliver O'Halloran
@ 2017-04-12  5:44     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 65+ messages in thread
From: Aneesh Kumar K.V @ 2017-04-12  5:44 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev
  Cc: arbab, bsingharora, linux-nvdimm, Kirill A. Shutemov, linux-mm

Oliver O'Halloran <oohall@gmail.com> writes:

> Depending on the flags of the PMD being zapped, there may or may not be a
> deposited pgtable to be freed. In two of the three cases this is open
> coded while the third uses the zap_deposited_table() helper. This patch
> converts the others to use the helper to clean things up a bit.
>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: linux-mm@kvack.org
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

> ---
> For reference:
>
> void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
> {
>         pgtable_t pgtable;
>
>         pgtable = pgtable_trans_huge_withdraw(mm, pmd);
>         pte_free(mm, pgtable);
>         atomic_long_dec(&mm->nr_ptes);
> }
> ---
>  mm/huge_memory.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index b787c4cfda0e..aa01dd47cc65 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1615,8 +1615,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  		if (is_huge_zero_pmd(orig_pmd))
>  			tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
>  	} else if (is_huge_zero_pmd(orig_pmd)) {
> -		pte_free(tlb->mm, pgtable_trans_huge_withdraw(tlb->mm, pmd));
> -		atomic_long_dec(&tlb->mm->nr_ptes);
> +		zap_deposited_table(tlb->mm, pmd);
>  		spin_unlock(ptl);
>  		tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
>  	} else {
> @@ -1625,10 +1624,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  		VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
>  		VM_BUG_ON_PAGE(!PageHead(page), page);
>  		if (PageAnon(page)) {
> -			pgtable_t pgtable;
> -			pgtable = pgtable_trans_huge_withdraw(tlb->mm, pmd);
> -			pte_free(tlb->mm, pgtable);
> -			atomic_long_dec(&tlb->mm->nr_ptes);
> +			zap_deposited_table(tlb->mm, pmd);
>  			add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
>  		} else {
>  			if (arch_needs_pgtable_deposit())
> -- 
> 2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 2/9] mm/huge_memory: Deposit a pgtable for DAX PMD faults when required
  2017-04-11 17:42   ` Oliver O'Halloran
@ 2017-04-12  5:51     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 65+ messages in thread
From: Aneesh Kumar K.V @ 2017-04-12  5:51 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev
  Cc: linux-mm, bsingharora, arbab, linux-nvdimm

Oliver O'Halloran <oohall@gmail.com> writes:

> Although all architectures use a deposited page table for THP on anonymous VMAs,
> some architectures (s390 and powerpc) require the deposited storage even for
> file backed VMAs due to quirks of their MMUs. This patch adds support for
> depositing a table in DAX PMD fault handling path for archs that require it.
> Other architectures should see no functional changes.
>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: linux-mm@kvack.org
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>


> ---
>  mm/huge_memory.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index aa01dd47cc65..a84909cf20d3 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -715,7 +715,8 @@ int do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>  }
>
>  static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
> -		pmd_t *pmd, pfn_t pfn, pgprot_t prot, bool write)
> +		pmd_t *pmd, pfn_t pfn, pgprot_t prot, bool write,
> +		pgtable_t pgtable)
>  {
>  	struct mm_struct *mm = vma->vm_mm;
>  	pmd_t entry;
> @@ -729,6 +730,12 @@ static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
>  		entry = pmd_mkyoung(pmd_mkdirty(entry));
>  		entry = maybe_pmd_mkwrite(entry, vma);
>  	}
> +
> +	if (pgtable) {
> +		pgtable_trans_huge_deposit(mm, pmd, pgtable);
> +		atomic_long_inc(&mm->nr_ptes);
> +	}
> +
>  	set_pmd_at(mm, addr, pmd, entry);
>  	update_mmu_cache_pmd(vma, addr, pmd);
>  	spin_unlock(ptl);
> @@ -738,6 +745,7 @@ int vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
>  			pmd_t *pmd, pfn_t pfn, bool write)
>  {
>  	pgprot_t pgprot = vma->vm_page_prot;
> +	pgtable_t pgtable = NULL;
>  	/*
>  	 * If we had pmd_special, we could avoid all these restrictions,
>  	 * but we need to be consistent with PTEs and architectures that
> @@ -752,9 +760,15 @@ int vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
>  	if (addr < vma->vm_start || addr >= vma->vm_end)
>  		return VM_FAULT_SIGBUS;
>
> +	if (arch_needs_pgtable_deposit()) {
> +		pgtable = pte_alloc_one(vma->vm_mm, addr);
> +		if (!pgtable)
> +			return VM_FAULT_OOM;
> +	}
> +
>  	track_pfn_insert(vma, &pgprot, pfn);
>
> -	insert_pfn_pmd(vma, addr, pmd, pfn, pgprot, write);
> +	insert_pfn_pmd(vma, addr, pmd, pfn, pgprot, write, pgtable);
>  	return VM_FAULT_NOPAGE;
>  }
>  EXPORT_SYMBOL_GPL(vmf_insert_pfn_pmd);
> @@ -1611,6 +1625,8 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  			tlb->fullmm);
>  	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
>  	if (vma_is_dax(vma)) {
> +		if (arch_needs_pgtable_deposit())
> +			zap_deposited_table(tlb->mm, pmd);
>  		spin_unlock(ptl);
>  		if (is_huge_zero_pmd(orig_pmd))
>  			tlb_remove_page_size(tlb, pmd_page(orig_pmd), HPAGE_PMD_SIZE);
> -- 
> 2.9.3

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: ZONE_DEVICE and pmem API support for powerpc
  2017-04-11 18:22   ` Dan Williams
@ 2017-04-12  9:14     ` Oliver O'Halloran
  -1 siblings, 0 replies; 65+ messages in thread
From: Oliver O'Halloran @ 2017-04-12  9:14 UTC (permalink / raw)
  To: Dan Williams; +Cc: Balbir Singh, linuxppc-dev, Reza Arbab, linux-nvdimm

On Wed, Apr 12, 2017 at 4:22 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Apr 11, 2017 at 10:42 AM, Oliver O'Halloran <oohall@gmail.com> wrote:
>> Hi all,
>>
>> This series adds support for ZONE_DEVICE and the pmem api on powerpc. Namely,
>> support for altmaps and the various bits and pieces required for DAX PMD faults.
>> The first two patches touch generic mm/ code, but otherwise this is fairly well
>> contained in arch/powerpc.
>>
>> If the nvdimm folks could sanity check this series I'd appreciate it.
>
> Quick feedback: I'm in the process of cleaning up and resubmitting my
> patch set to push the pmem api down into the driver directly.
>
>     https://lwn.net/Articles/713064/

That's been on my radar for a while and I was hoping it would be in
4.12. Moving operations into the driver makes a lot of sense from a
design perspective and it should make supporting some of the
Con Tutto's eccentricities a bit easier.

> I'm also reworking memory hotplug to allow sub-section allocations
> which has collided with Michal Hocko's hotplug reworks. It will be
> good to have some more eyes on that work to understand the cross-arch
> implications.
>
>     https://lkml.org/lkml/2017/3/19/146

I'd been putting off looking at this since I figured it would clash
with the hotplug rework and HMM, but I'll see if I can get it working
on ppc.

>> Series is based on next-20170411, but it should apply elsewhere with minor
>> fixups to arch_{add|remove}_memory due to conflicts with HMM.  For those
>> interested in testing this, there is a driver and matching firmware that carves
>> out some system memory for use as an emulated Con Tutto memory card.
>>
>> Driver: https://github.com/oohal/linux/tree/contutto-next
>> Firmware: https://github.com/oohal/skiboot/tree/fake-contutto
>>
>> Edit core/init.c:686 to control the amount of memory borrowed for the emulated
>> device.  I'm keeping the driver out of tree until 4.13 since I plan on
>> reworking the firmware interface anyway, and there's at least one showstopper
>> bug.
>
> Is this memory card I/O-cache coherent? I.e. existing dma mapping api
> can hand out mappings to it? Just trying to figure out if this the
> existing pmem-definition of ZONE_DEVICE or a new one.

As far as the rest of the system is concerned, Con Tutto memory is
identical to normal system memory. All accesses to the card's memory
are mediated by a memory controller which participates in the memory
coherency protocol. That said, the link between the card and the
system is non-coherent so logic on the FPGA can access memory
incoherently. I'm primarily interested in using the card as a memory
platform so I haven't spent a lot of time thinking about the latter
use case, but a different concept of device memory might be required
there.

Oliver

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 8/9] powerpc/mm: Wire up hpte_removebolted for powernv
  2017-04-12  1:53     ` Balbir Singh
@ 2017-04-13  4:21       ` Oliver O'Halloran
  -1 siblings, 0 replies; 65+ messages in thread
From: Oliver O'Halloran @ 2017-04-13  4:21 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Rashmica Gupta, linuxppc-dev, Anton Blanchard, Reza Arbab, linux-nvdimm

On Wed, Apr 12, 2017 at 11:53 AM, Balbir Singh <bsingharora@gmail.com> wrote:
> On Wed, 2017-04-12 at 03:42 +1000, Oliver O'Halloran wrote:
>> From: Rashmica Gupta <rashmica.g@gmail.com>
>>
>> Adds support for removing bolted (i.e. kernel linear mapping) mappings on
>> powernv. This is needed to support memory hot unplug operations which
>> are required for the teardown of DAX/PMEM devices.
>>
>> Cc: Rashmica Gupta <rashmica.g@gmail.com>
>> Cc: Anton Blanchard <anton@samba.org>
>> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
>> ---
>> Could the original author of this add their S-o-b? I pulled it out of
>> Rashmica's memtrace patch, but I remember someone saying Anton wrote
>> it originally.
>> ---
>>  arch/powerpc/mm/hash_native_64.c | 31 +++++++++++++++++++++++++++++++
>>  1 file changed, 31 insertions(+)
>>
>> diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
>> index 65bb8f33b399..9ba91d4905a4 100644
>> --- a/arch/powerpc/mm/hash_native_64.c
>> +++ b/arch/powerpc/mm/hash_native_64.c
>> @@ -407,6 +407,36 @@ static void native_hpte_updateboltedpp(unsigned long newpp, unsigned long ea,
>>       tlbie(vpn, psize, psize, ssize, 0);
>>  }
>>
>> +/*
>> + * Remove a bolted kernel entry. Memory hotplug uses this.
>> + *
>> + * No need to lock here because we should be the only user.
>
> As long as this is after the necessary isolation and is called from
> arch_remove_memory(), I think we should be fine
>
>> + */
>> +static int native_hpte_removebolted(unsigned long ea, int psize, int ssize)
>> +{
>> +     unsigned long vpn;
>> +     unsigned long vsid;
>> +     long slot;
>> +     struct hash_pte *hptep;
>> +
>> +     vsid = get_kernel_vsid(ea, ssize);
>> +     vpn = hpt_vpn(ea, vsid, ssize);
>> +
>> +     slot = native_hpte_find(vpn, psize, ssize);
>> +     if (slot == -1)
>> +             return -ENOENT;
>
> If slot == -1, it means someone else removed the HPTE entry? Are we racing?
> I suspect we should never hit this situation during hotunplug, specifically
> since this is bolted.

Or the slot was never populated in the first place. I'd rather keep
the current behaviour since it aligns with the behaviour of
pSeries_lpar_hpte_removebolted and we might hit these situations in
the future if the sub-section hotplug patches are merged (big if...).

>
>> +
>> +     hptep = htab_address + slot;
>> +
>> +     /* Invalidate the hpte */
>> +     hptep->v = 0;
>
> Under DEBUG or otherwise, I would add more checks like
>
> 1. was hpte_v & HPTE_V_VALID and BOLTED set? If not, we've already invalidated
> that hpte and we can skip the tlbie. Since this was bolted you might be right
> that it is always valid and bolted

A VM_WARN_ON() if the bolted bit is clear might be appropriate. We
don't need to check the valid bit since native_hpte_find() will fail
if it's cleared.

>
>> +
>> +     /* Invalidate the TLB */
>> +     tlbie(vpn, psize, psize, ssize, 0);
>
> The API also does not clear linear_map_hash_slots[] under DEBUG_PAGEALLOC

I'm not sure what API you're referring to here. The tracking for
linear_map_hash_slots[] is agnostic of mmu_hash_ops so we shouldn't be
touching it here. It also looks like DEBUG_PAGEALLOC is a bit broken
with hotplugged memory anyway so I think that's a fix for a different
patch.

>
>> +     return 0;
>> +}
>> +
>> +
>
> Balbir Singh.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 3/9] powerpc/mm: Add _PAGE_DEVMAP for ppc64.
  2017-04-11 17:42   ` Oliver O'Halloran
@ 2017-04-13  5:20     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 65+ messages in thread
From: Aneesh Kumar K.V @ 2017-04-13  5:20 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev; +Cc: bsingharora, arbab, linux-nvdimm

Oliver O'Halloran <oohall@gmail.com> writes:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> Add a _PAGE_DEVMAP bit for PTE and DAX PMD entries. PowerPC doesn't
> currently support PUD faults so we haven't extended it to the PUD
> level.
>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>


A few changes we would need. We will now need to make sure a devmap
pmd entry is not confused with THP, i.e.,

we should compare against _PAGE_PTE and _PAGE_DEVMAP in
pmd_trans_huge(). Hash already has one bit we use to differentiate
between hugetlb and THP. Maybe we can generalize this and come up with a
way to differentiate THP, hugetlb, and pmd devmap entries?


Also, I don't see you handling get_user_pages_fast()?


-aneesh
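As an illustration of the disambiguation being asked for, here is a small user-space sketch. The bit positions are invented, and whether devmap pmds should count as trans-huge is precisely the design question raised above, so treat this as one possible reading rather than the real ppc64 implementation.

```c
#include <assert.h>

/* Illustrative bit positions only -- not the real ppc64 PTE layout. */
#define _PAGE_PTE     (1UL << 0)
#define _PAGE_DEVMAP  (1UL << 1)

typedef unsigned long pmd_t;

/* In this scheme a leaf (huge) pmd always carries _PAGE_PTE. */
static int pmd_large(pmd_t pmd)
{
    return (pmd & _PAGE_PTE) != 0;
}

static int pmd_devmap(pmd_t pmd)
{
    return pmd_large(pmd) && (pmd & _PAGE_DEVMAP) != 0;
}

/*
 * One possible reading: THP means "huge leaf entry that is not a
 * devmap entry", so checking _PAGE_DEVMAP here keeps DAX pmds from
 * being treated as transparent huge pages.
 */
static int pmd_trans_huge(pmd_t pmd)
{
    return pmd_large(pmd) && !(pmd & _PAGE_DEVMAP);
}
```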


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 8/9] powerpc/mm: Wire up hpte_removebolted for powernv
  2017-04-13  4:21       ` Oliver O'Halloran
@ 2017-04-13 10:10         ` Michael Ellerman
  -1 siblings, 0 replies; 65+ messages in thread
From: Michael Ellerman @ 2017-04-13 10:10 UTC (permalink / raw)
  To: Oliver O'Halloran, Balbir Singh
  Cc: Rashmica Gupta, linuxppc-dev, Anton Blanchard, Reza Arbab, linux-nvdimm

Oliver O'Halloran <oohall@gmail.com> writes:
> On Wed, Apr 12, 2017 at 11:53 AM, Balbir Singh <bsingharora@gmail.com> wrote:
>>
>> The API also does not clear linear_map_hash_slots[] under DEBUG_PAGEALLOC
>
> I'm not sure what API you're referring to here. The tracking for
> linear_map_hash_slots[] is agnostic of mmu_hash_ops so we shouldn't be
> touching it here. It also looks like DEBUG_PAGEALLOC is a bit broken
> with hotplugged memory anyway so I think that's a fix for a different
> patch.

It's old and has probably bit-rotted, so I'm happy for it to be fixed up
later if necessary.

cheers

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 1/9] mm/huge_memory: Use zap_deposited_table() more
  2017-04-11 17:42   ` Oliver O'Halloran
@ 2017-04-18 21:35     ` David Rientjes
  -1 siblings, 0 replies; 65+ messages in thread
From: David Rientjes @ 2017-04-18 21:35 UTC (permalink / raw)
  To: Oliver O'Halloran
  Cc: linuxppc-dev, arbab, bsingharora, linux-nvdimm, Aneesh Kumar K.V,
	Kirill A. Shutemov, linux-mm

On Wed, 12 Apr 2017, Oliver O'Halloran wrote:

> Depending on the flags of the PMD being zapped there may or may not be a
> deposited pgtable to be freed. In two of the three cases this is open
> coded while the third uses the zap_deposited_table() helper. This patch
> converts the others to use the helper to clean things up a bit.
> 
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: linux-mm@kvack.org
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>

Acked-by: David Rientjes <rientjes@google.com>
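The refactor can be pictured with a minimal user-space model: the helper centralizes the withdraw-free-account sequence that was previously open coded in two of the zap paths. The structure and names below are simplified stand-ins for the kernel's, not its actual data structures.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Simplified model: at most one deposited pgtable per mm, where the
 * kernel keeps a per-pmd deposit list.
 */
struct fake_mm {
    void *deposited;   /* the deposited pgtable, if any */
    long nr_ptes;      /* page-table accounting counter */
};

/* stand-in for pgtable_trans_huge_withdraw() */
static void *withdraw(struct fake_mm *mm)
{
    void *p = mm->deposited;
    mm->deposited = NULL;
    return p;
}

/*
 * The helper the patch converts the open-coded sites to use: withdraw
 * the deposited table, free it, and fix up accounting in one place.
 */
static void zap_deposited_table(struct fake_mm *mm)
{
    void *pgtable = withdraw(mm);
    free(pgtable);     /* pte_free() in the kernel */
    mm->nr_ptes--;
}
```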

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [7/9] powerpc/mm: Wire up ioremap_cache
  2017-04-11 17:42   ` Oliver O'Halloran
@ 2017-04-23 11:53     ` Michael Ellerman
  -1 siblings, 0 replies; 65+ messages in thread
From: Michael Ellerman @ 2017-04-23 11:53 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev; +Cc: arbab, linux-nvdimm

On Tue, 2017-04-11 at 17:42:31 UTC, Oliver O'Halloran wrote:
> The default implementation of ioremap_cache() is aliased to ioremap().
> On powerpc ioremap() creates cache-inhibited mappings by default which
> is almost certainly not what you wanted.
> 
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f855b2f544d664cfa3055edb7ffd20

cheers
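The underlying issue is a matter of default page-protection flags. A toy model, with an invented flag bit rather than the real ppc64 protections, of why aliasing ioremap_cache() to ioremap() gives cache-inhibited mappings on powerpc:

```c
#include <assert.h>

/* Invented flag bit standing in for the ppc64 page protections. */
#define _PAGE_NO_CACHE  (1UL << 0)

/* Model of powerpc ioremap(): cache-inhibited by default. */
static unsigned long model_ioremap_flags(void)
{
    return _PAGE_NO_CACHE;
}

/*
 * Before the patch, ioremap_cache() fell back to ioremap() and so
 * inherited the no-cache protection; wiring it up means requesting
 * cacheable protections explicitly, as modelled here.
 */
static unsigned long model_ioremap_cache_flags(void)
{
    return 0;  /* cacheable: no-cache bit clear */
}
```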

^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2017-04-23 11:53 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-11 17:42 ZONE_DEVICE and pmem API support for powerpc Oliver O'Halloran
2017-04-11 17:42 ` Oliver O'Halloran
2017-04-11 17:42 ` [PATCH 1/9] mm/huge_memory: Use zap_deposited_table() more Oliver O'Halloran
2017-04-11 17:42   ` Oliver O'Halloran
2017-04-11 17:42   ` Oliver O'Halloran
2017-04-12  5:44   ` Aneesh Kumar K.V
2017-04-12  5:44     ` Aneesh Kumar K.V
2017-04-18 21:35   ` David Rientjes
2017-04-18 21:35     ` David Rientjes
2017-04-11 17:42 ` [PATCH 2/9] mm/huge_memory: Deposit a pgtable for DAX PMD faults when required Oliver O'Halloran
2017-04-11 17:42   ` Oliver O'Halloran
2017-04-11 17:42   ` Oliver O'Halloran
2017-04-12  5:51   ` Aneesh Kumar K.V
2017-04-12  5:51     ` Aneesh Kumar K.V
2017-04-12  5:51     ` Aneesh Kumar K.V
2017-04-11 17:42 ` [PATCH 3/9] powerpc/mm: Add _PAGE_DEVMAP for ppc64 Oliver O'Halloran
2017-04-11 17:42   ` Oliver O'Halloran
2017-04-12  0:19   ` Stephen Rothwell
2017-04-12  0:19     ` Stephen Rothwell
2017-04-12  3:07     ` Aneesh Kumar K.V
2017-04-12  3:07       ` Aneesh Kumar K.V
2017-04-13  5:20   ` Aneesh Kumar K.V
2017-04-13  5:20     ` Aneesh Kumar K.V
2017-04-11 17:42 ` [PATCH 4/9] powerpc/mm: Reshuffle vmemmap_free() Oliver O'Halloran
2017-04-11 17:42   ` Oliver O'Halloran
2017-04-12  0:33   ` Stephen Rothwell
2017-04-12  0:33     ` Stephen Rothwell
2017-04-11 17:42 ` [PATCH 5/9] powerpc/vmemmap: Add altmap support Oliver O'Halloran
2017-04-11 17:42   ` Oliver O'Halloran
2017-04-12  0:24   ` Balbir Singh
2017-04-12  0:24     ` Balbir Singh
2017-04-11 17:42 ` [PATCH 6/9] powerpc, mm: Enable ZONE_DEVICE on powerpc Oliver O'Halloran
2017-04-11 17:42   ` Oliver O'Halloran
2017-04-12  0:25   ` Balbir Singh
2017-04-12  0:25     ` Balbir Singh
2017-04-12  0:43   ` Stephen Rothwell
2017-04-12  0:43     ` Stephen Rothwell
2017-04-12  2:03     ` Michael Ellerman
2017-04-12  2:03       ` Michael Ellerman
2017-04-11 17:42 ` [PATCH 7/9] powerpc/mm: Wire up ioremap_cache Oliver O'Halloran
2017-04-11 17:42   ` Oliver O'Halloran
2017-04-23 11:53   ` [7/9] " Michael Ellerman
2017-04-23 11:53     ` Michael Ellerman
2017-04-11 17:42 ` [PATCH 8/9] powerpc/mm: Wire up hpte_removebolted for powernv Oliver O'Halloran
2017-04-11 17:42   ` Oliver O'Halloran
     [not found]   ` <20170411174233.21902-9-oohall-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-04-11 22:50     ` Anton Blanchard
2017-04-11 22:50       ` Anton Blanchard
2017-04-12  0:18       ` Stephen Rothwell
2017-04-12  0:18         ` Stephen Rothwell
2017-04-12  3:30         ` Rashmica Gupta
2017-04-12  3:30           ` Rashmica Gupta
2017-04-12  1:53   ` Balbir Singh
2017-04-12  1:53     ` Balbir Singh
2017-04-13  4:21     ` Oliver O'Halloran
2017-04-13  4:21       ` Oliver O'Halloran
2017-04-13 10:10       ` Michael Ellerman
2017-04-13 10:10         ` Michael Ellerman
2017-04-11 17:42 ` [PATCH 9/9] powerpc: Add pmem API support Oliver O'Halloran
2017-04-11 17:42   ` Oliver O'Halloran
2017-04-11 18:22 ` ZONE_DEVICE and pmem API support for powerpc Dan Williams
2017-04-11 18:22   ` Dan Williams
2017-04-12  9:14   ` Oliver O'Halloran
2017-04-12  9:14     ` Oliver O'Halloran
2017-04-12  1:10 ` Stephen Rothwell
2017-04-12  1:10   ` Stephen Rothwell
