From: Punit Agrawal
Subject: [PATCH v6 0/9] arm64: Enable contiguous pte hugepage support
Date: Thu, 10 Aug 2017 18:08:57 +0100
Message-Id: <20170810170906.30772-1-punit.agrawal@arm.com>
To: will.deacon@arm.com, catalin.marinas@arm.com
Cc: Punit Agrawal, linux-mm@kvack.org, steve.capper@arm.com, linux-arm-kernel@lists.infradead.org, mark.rutland@arm.com

Hi,

This series re-enables contiguous hugepage support for arm64. In v6,
I've addressed all the concerns raised on the previous version. Notable
changes in this version -

* Patch 4 - "Add break-before-make logic for contiguous entries"
  - Added clear_flush() and use it in set_huge_pte_at()
  - Updated huge_ptep_clear_flush() to use clear_flush()
  - Track the dirty bit based on the value returned by get_clear_flush()
    in huge_ptep_set_access_flags()
  - Dropped Mark's reviewed-by (please re-apply if you're still happy
    with the patch)

All the dependent series ([2], [3]) for enabling contiguous hugepage
support were merged in the previous cycle. Additionally, a patch to
clarify the semantics of huge_pte_offset() in generic code [6] is
currently in the -mm tree.

Previous postings can be found at [0], [1], [4], [5], [7].

The patches are based on v4.13-rc4. If there are no further comments,
please consider merging for the next release cycle.

Thanks,
Punit

[0] https://www.spinics.net/lists/arm-kernel/msg570422.html
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/497027.html
[2] https://www.spinics.net/lists/arm-kernel/msg581657.html
[3] https://www.spinics.net/lists/arm-kernel/msg583342.html
[4] https://www.spinics.net/lists/arm-kernel/msg583367.html
[5] https://www.spinics.net/lists/arm-kernel/msg582758.html
[6] https://lkml.org/lkml/2017/7/25/536
[7] https://www.spinics.net/lists/arm-kernel/msg597777.html

Punit Agrawal (4):
  arm64: hugetlb: Handle swap entries in huge_pte_offset() for
    contiguous hugepages
  arm64: hugetlb: Override huge_pte_clear() to support contiguous
    hugepages
  arm64: hugetlb: Override set_huge_swap_pte_at() to support contiguous
    hugepages
  arm64: Re-enable support for contiguous hugepages

Steve Capper (5):
  arm64: hugetlb: set_huge_pte_at Add WARN_ON on !pte_present
  arm64: hugetlb: Introduce pte_pgprot helper
  arm64: hugetlb: Spring clean huge pte accessors
  arm64: hugetlb: Add break-before-make logic for contiguous entries
  arm64: hugetlb: Cleanup setup_hugepagesz

 arch/arm64/include/asm/hugetlb.h |   9 +-
 arch/arm64/mm/hugetlbpage.c      | 310 ++++++++++++++++++++++++++++-----------
 2 files changed, 236 insertions(+), 83 deletions(-)

--
2.13.2
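[For readers who want to exercise the series from userspace once it is
merged, a minimal sketch follows. It assumes a 4K-granule kernel booted
with "hugepagesz=64K hugepages=16" (64K being the contiguous-PTE size
with 4K pages); MAP_HUGE_SHIFT size encoding is the standard Linux mmap
API, nothing from this series.]

	/*
	 * Minimal sketch: map one contiguous-PTE hugepage from userspace.
	 * Assumes a 4K-granule arm64 kernel booted with
	 * "hugepagesz=64K hugepages=16"; adjust the size for other granules.
	 */
	#define _GNU_SOURCE
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>

	#ifndef MAP_HUGE_SHIFT
	#define MAP_HUGE_SHIFT 26
	#endif

	#define HPAGE_SZ      (64UL * 1024)          /* contiguous-PTE size, 4K granule */
	#define MAP_HUGE_64KB (16 << MAP_HUGE_SHIFT) /* log2(64K) == 16 */

	int main(void)
	{
		void *p = mmap(NULL, HPAGE_SZ, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_64KB,
			       -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");	/* e.g. no 64K hstate reserved */
			return 1;
		}
		memset(p, 0x5a, HPAGE_SZ);	/* fault the hugepage in */
		printf("64K hugepage mapped at %p\n", p);
		munmap(p, HPAGE_SZ);
		return 0;
	}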
From: Punit Agrawal
Subject: [PATCH v6 1/9] arm64: hugetlb: set_huge_pte_at Add WARN_ON on !pte_present
Date: Thu, 10 Aug 2017 18:08:58 +0100
Message-Id: <20170810170906.30772-2-punit.agrawal@arm.com>
In-Reply-To: <20170810170906.30772-1-punit.agrawal@arm.com>
References: <20170810170906.30772-1-punit.agrawal@arm.com>
To: will.deacon@arm.com, catalin.marinas@arm.com
Cc: Steve Capper, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, mark.rutland@arm.com, David Woods, Punit Agrawal

From: Steve Capper

This patch adds a WARN_ON to set_huge_pte_at as the accessor assumes
that the entries being written are all present. (There are separate
accessors to clear huge ptes.)

We will need to handle the !pte_present case when memory offlining is
used on hugetlb pages: swap and migration entries will then be supplied
to set_huge_pte_at.

Cc: David Woods
Signed-off-by: Steve Capper
Signed-off-by: Punit Agrawal
---
 arch/arm64/mm/hugetlbpage.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 656e0ece2289..7b61e4833432 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -67,6 +67,12 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	unsigned long pfn;
 	pgprot_t hugeprot;
 
+	/*
+	 * Code needs to be expanded to handle huge swap and migration
+	 * entries. Needed for HUGETLB and MEMORY_FAILURE.
+	 */
+	WARN_ON(!pte_present(pte));
+
 	if (!pte_cont(pte)) {
 		set_pte_at(mm, addr, ptep, pte);
 		return;
--
2.13.2
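[To make the guarded precondition concrete, here is a toy model — the
bit layout and helpers are invented for illustration and are not
arm64's: a present entry carries a present bit, while swap/migration
entries reuse the pte bits as a software encoding, so pushing them
through the present-only path would be wrong.]

	/* Toy model of the present/non-present pte distinction. */
	#include <assert.h>
	#include <stdint.h>
	#include <stdio.h>

	typedef uint64_t pte_t;

	#define PTE_PRESENT (1ULL << 0)	/* invented layout */

	static int pte_present(pte_t pte) { return pte & PTE_PRESENT; }

	/* Mirrors the patch: writers of huge ptes must pass present entries. */
	static void set_huge_pte_at(pte_t *ptep, pte_t pte)
	{
		assert(pte_present(pte));	/* the kernel warns instead */
		*ptep = pte;
	}

	int main(void)
	{
		pte_t slot = 0;

		set_huge_pte_at(&slot, (0x1234ULL << 12) | PTE_PRESENT);
		printf("wrote %#llx\n", (unsigned long long)slot);
		/* A swap entry (present bit clear) would trip the assertion. */
		return 0;
	}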
From: Punit Agrawal
Subject: [PATCH v6 2/9] arm64: hugetlb: Introduce pte_pgprot helper
Date: Thu, 10 Aug 2017 18:08:59 +0100
Message-Id: <20170810170906.30772-3-punit.agrawal@arm.com>
In-Reply-To: <20170810170906.30772-1-punit.agrawal@arm.com>
References: <20170810170906.30772-1-punit.agrawal@arm.com>
To: will.deacon@arm.com, catalin.marinas@arm.com
Cc: Steve Capper, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, mark.rutland@arm.com, David Woods, Punit Agrawal

From: Steve Capper

Rather than xor'ing pte bits in various places, use this helper
function.

Cc: David Woods
Signed-off-by: Steve Capper
Signed-off-by: Punit Agrawal
Reviewed-by: Mark Rutland
---
 arch/arm64/mm/hugetlbpage.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 7b61e4833432..cb84ca33bc6b 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -41,6 +41,16 @@ int pud_huge(pud_t pud)
 #endif
 }
 
+/*
+ * Select all bits except the pfn
+ */
+static inline pgprot_t pte_pgprot(pte_t pte)
+{
+	unsigned long pfn = pte_pfn(pte);
+
+	return __pgprot(pte_val(pfn_pte(pfn, __pgprot(0))) ^ pte_val(pte));
+}
+
 static int find_num_contig(struct mm_struct *mm, unsigned long addr,
 			   pte_t *ptep, size_t *pgsize)
 {
@@ -80,7 +90,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 
 	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
 	pfn = pte_pfn(pte);
-	hugeprot = __pgprot(pte_val(pfn_pte(pfn, __pgprot(0))) ^ pte_val(pte));
+	hugeprot = pte_pgprot(pte);
 	for (i = 0; i < ncontig; i++) {
 		pr_debug("%s: set pte %p to 0x%llx\n", __func__, ptep,
 			 pte_val(pfn_pte(pfn, hugeprot)));
@@ -223,9 +233,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 		size_t pgsize = 0;
 		unsigned long pfn = pte_pfn(pte);
 		/* Select all bits except the pfn */
-		pgprot_t hugeprot =
-			__pgprot(pte_val(pfn_pte(pfn, __pgprot(0))) ^
-				 pte_val(pte));
+		pgprot_t hugeprot = pte_pgprot(pte);
 
 		pfn = pte_pfn(pte);
 		ncontig = find_num_contig(vma->vm_mm, addr, ptep,
--
2.13.2
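[The xor in pte_pgprot() is worth unpacking: pfn_pte(pfn, __pgprot(0))
builds a pte whose only set bits are the pfn field, so xor'ing it with
the original pte cancels the pfn and leaves exactly the attribute bits.
A standalone sketch, with an invented "pfn in bits 12..47" layout:]

	/*
	 * Standalone illustration of the pte_pgprot() xor trick. The bit
	 * layout below (pfn in bits 12..47) is invented for the example.
	 */
	#include <assert.h>
	#include <stdint.h>

	typedef uint64_t pte_t;

	#define PFN_MASK 0x0000fffffffff000ULL

	static uint64_t pte_pfn(pte_t pte) { return (pte & PFN_MASK) >> 12; }
	static pte_t pfn_pte(uint64_t pfn, uint64_t prot)
	{
		return (pfn << 12) | prot;
	}

	/* Select all bits except the pfn, exactly as the helper does. */
	static uint64_t pte_pgprot(pte_t pte)
	{
		return pfn_pte(pte_pfn(pte), 0) ^ pte;
	}

	int main(void)
	{
		/* attribute bits chosen outside the pfn field */
		uint64_t prot = 0xC000000000000703ULL;
		pte_t pte = pfn_pte(0xabcde, prot);

		assert(pte_pgprot(pte) == prot); /* pfn cancels, prot survives */
		return 0;
	}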
From: Punit Agrawal
Subject: [PATCH v6 3/9] arm64: hugetlb: Spring clean huge pte accessors
Date: Thu, 10 Aug 2017 18:09:00 +0100
Message-Id: <20170810170906.30772-4-punit.agrawal@arm.com>
In-Reply-To: <20170810170906.30772-1-punit.agrawal@arm.com>
References: <20170810170906.30772-1-punit.agrawal@arm.com>
To: will.deacon@arm.com, catalin.marinas@arm.com
Cc: Steve Capper, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, mark.rutland@arm.com, David Woods, Punit Agrawal

From: Steve Capper

This patch restructures the huge pte accessors without affecting their
functionality. Control flow is changed to reduce indentation, and the
loop variables are now advanced in the for loops' post-iteration
expressions.

This makes it much easier to add break-before-make semantics in a
subsequent patch.

Cc: David Woods
Signed-off-by: Steve Capper
Signed-off-by: Punit Agrawal
Reviewed-by: Mark Rutland
---
 arch/arm64/mm/hugetlbpage.c | 119 ++++++++++++++++++++------------------------
 1 file changed, 54 insertions(+), 65 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index cb84ca33bc6b..08deed7c71f0 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -74,7 +74,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	size_t pgsize;
 	int i;
 	int ncontig;
-	unsigned long pfn;
+	unsigned long pfn, dpfn;
 	pgprot_t hugeprot;
 
 	/*
@@ -90,14 +90,13 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 
 	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
 	pfn = pte_pfn(pte);
+	dpfn = pgsize >> PAGE_SHIFT;
 	hugeprot = pte_pgprot(pte);
-	for (i = 0; i < ncontig; i++) {
+
+	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
 		pr_debug("%s: set pte %p to 0x%llx\n", __func__, ptep,
 			 pte_val(pfn_pte(pfn, hugeprot)));
 		set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
-		ptep++;
-		pfn += pgsize >> PAGE_SHIFT;
-		addr += pgsize;
 	}
 }
 
@@ -195,91 +194,81 @@ pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 			      unsigned long addr, pte_t *ptep)
 {
-	pte_t pte;
-
-	if (pte_cont(*ptep)) {
-		int ncontig, i;
-		size_t pgsize;
-		bool is_dirty = false;
-
-		ncontig = find_num_contig(mm, addr, ptep, &pgsize);
-		/* save the 1st pte to return */
-		pte = ptep_get_and_clear(mm, addr, ptep);
-		for (i = 1, addr += pgsize; i < ncontig; ++i, addr += pgsize) {
-			/*
-			 * If HW_AFDBM is enabled, then the HW could
-			 * turn on the dirty bit for any of the page
-			 * in the set, so check them all.
-			 */
-			++ptep;
-			if (pte_dirty(ptep_get_and_clear(mm, addr, ptep)))
-				is_dirty = true;
-		}
-		if (is_dirty)
-			return pte_mkdirty(pte);
-		else
-			return pte;
-	} else {
+	int ncontig, i;
+	size_t pgsize;
+	pte_t orig_pte = huge_ptep_get(ptep);
+
+	if (!pte_cont(orig_pte))
 		return ptep_get_and_clear(mm, addr, ptep);
+
+	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
+	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
+		/*
+		 * If HW_AFDBM is enabled, then the HW could
+		 * turn on the dirty bit for any of the page
+		 * in the set, so check them all.
+		 */
+		if (pte_dirty(ptep_get_and_clear(mm, addr, ptep)))
+			orig_pte = pte_mkdirty(orig_pte);
 	}
+
+	return orig_pte;
 }
 
 int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 			       unsigned long addr, pte_t *ptep,
 			       pte_t pte, int dirty)
 {
-	if (pte_cont(pte)) {
-		int ncontig, i, changed = 0;
-		size_t pgsize = 0;
-		unsigned long pfn = pte_pfn(pte);
-		/* Select all bits except the pfn */
-		pgprot_t hugeprot = pte_pgprot(pte);
-
-		pfn = pte_pfn(pte);
-		ncontig = find_num_contig(vma->vm_mm, addr, ptep,
-					  &pgsize);
-		for (i = 0; i < ncontig; ++i, ++ptep, addr += pgsize) {
-			changed |= ptep_set_access_flags(vma, addr, ptep,
-							 pfn_pte(pfn,
-								 hugeprot),
-							 dirty);
-			pfn += pgsize >> PAGE_SHIFT;
-		}
-		return changed;
-	} else {
+	int ncontig, i, changed = 0;
+	size_t pgsize = 0;
+	unsigned long pfn = pte_pfn(pte), dpfn;
+	pgprot_t hugeprot;
+
+	if (!pte_cont(pte))
 		return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
+
+	ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize);
+	dpfn = pgsize >> PAGE_SHIFT;
+	hugeprot = pte_pgprot(pte);
+
+	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
+		changed |= ptep_set_access_flags(vma, addr, ptep,
+						 pfn_pte(pfn, hugeprot), dirty);
 	}
+
+	return changed;
 }
 
 void huge_ptep_set_wrprotect(struct mm_struct *mm,
 			     unsigned long addr, pte_t *ptep)
 {
-	if (pte_cont(*ptep)) {
-		int ncontig, i;
-		size_t pgsize = 0;
+	int ncontig, i;
+	size_t pgsize;
 
-		ncontig = find_num_contig(mm, addr, ptep, &pgsize);
-		for (i = 0; i < ncontig; ++i, ++ptep, addr += pgsize)
-			ptep_set_wrprotect(mm, addr, ptep);
-	} else {
+	if (!pte_cont(*ptep)) {
 		ptep_set_wrprotect(mm, addr, ptep);
+		return;
 	}
+
+	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
+	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
+		ptep_set_wrprotect(mm, addr, ptep);
 }
 
 void huge_ptep_clear_flush(struct vm_area_struct *vma,
 			   unsigned long addr, pte_t *ptep)
 {
-	if (pte_cont(*ptep)) {
-		int ncontig, i;
-		size_t pgsize = 0;
-
-		ncontig = find_num_contig(vma->vm_mm, addr, ptep,
-					  &pgsize);
-		for (i = 0; i < ncontig; ++i, ++ptep, addr += pgsize)
-			ptep_clear_flush(vma, addr, ptep);
-	} else {
+	int ncontig, i;
+	size_t pgsize;
+
+	if (!pte_cont(*ptep)) {
 		ptep_clear_flush(vma, addr, ptep);
+		return;
 	}
+
+	ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize);
+	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
+		ptep_clear_flush(vma, addr, ptep);
 }
 
 static __init int setup_hugepagesz(char *opt)
--
2.13.2
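[The shape every accessor now shares — early return for the
non-contiguous case, then one flat loop that advances ptep, addr (and
pfn) in the for statement's post-iteration expression — condensed into
a standalone sketch; the types and visit() helper are stand-ins, not
kernel definitions:]

	#include <stddef.h>
	#include <stdio.h>

	typedef unsigned long pte_t;

	static void visit(unsigned long addr, pte_t *ptep)
	{
		printf("entry %p covers addr 0x%lx\n", (void *)ptep, addr);
	}

	static void walk_contig(pte_t *ptep, unsigned long addr,
				size_t pgsize, int ncontig)
	{
		int i;

		/* All iteration state advances in the post expression... */
		for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
			visit(addr, ptep);
	}

	int main(void)
	{
		pte_t ptes[16] = { 0 };

		/* ...so the body stays flat: 16 x 4K entries == one 64K range. */
		walk_contig(ptes, 0x100000, 4096, 16);
		return 0;
	}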
From: Punit Agrawal
Subject: [PATCH v6 4/9] arm64: hugetlb: Add break-before-make logic for contiguous entries
Date: Thu, 10 Aug 2017 18:09:01 +0100
Message-Id: <20170810170906.30772-5-punit.agrawal@arm.com>
In-Reply-To: <20170810170906.30772-1-punit.agrawal@arm.com>
References: <20170810170906.30772-1-punit.agrawal@arm.com>
To: will.deacon@arm.com, catalin.marinas@arm.com, mark.rutland@arm.com
Cc: Steve Capper, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, David Woods, Punit Agrawal

From: Steve Capper

It has become apparent that one has to take special care when modifying
attributes of memory mappings that employ the contiguous bit. Both the
requirement and the architecturally correct "Break-Before-Make"
technique for updating contiguous entries are described in ARM DDI
0487A.k_iss10775, "Misprogramming of the Contiguous bit", page D4-1762.

The huge pte accessors currently replace the attributes of contiguous
pte entries in place, which can, on certain platforms, lead to TLB
conflict aborts or even erroneous results returned from TLB lookups.

This patch adds two helper functions:

* get_clear_flush() - clears a contiguous entry and returns the head
  pte (whilst taking care to retain dirty-bit information that could
  have been modified by DBM).

* clear_flush() - clears a contiguous entry.

Both helpers then perform a TLB invalidation to ensure there is no
possibility of multiple TLB entries being present for the same region.

Cc: David Woods
Signed-off-by: Steve Capper
(Added helper clear_flush(), updated commit log, and cleaned up comments)
Signed-off-by: Punit Agrawal
---

Hi Mark,

I've dropped your reviewed-by tag due to the patch update. I'd
appreciate it if you could take a look at the new version. Thanks!

---
 arch/arm64/mm/hugetlbpage.c | 107 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 86 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 08deed7c71f0..d3a6713048a2 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -68,6 +68,62 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
 	return CONT_PTES;
 }
 
+/*
+ * Changing some bits of contiguous entries requires us to follow a
+ * Break-Before-Make approach, breaking the whole contiguous set
+ * before we can change any entries. See ARM DDI 0487A.k_iss10775,
+ * "Misprogramming of the Contiguous bit", page D4-1762.
+ *
+ * This helper performs the break step.
+ */
+static pte_t get_clear_flush(struct mm_struct *mm,
+			     unsigned long addr,
+			     pte_t *ptep,
+			     unsigned long pgsize,
+			     unsigned long ncontig)
+{
+	unsigned long i, saddr = addr;
+	struct vm_area_struct vma = { .vm_mm = mm };
+	pte_t orig_pte = huge_ptep_get(ptep);
+
+	/*
+	 * If we already have a faulting entry then we don't need
+	 * to break before make (there won't be a tlb entry cached).
+	 */
+	if (!pte_present(orig_pte))
+		return orig_pte;
+
+	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
+		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
+
+		/*
+		 * If HW_AFDBM is enabled, then the HW could turn on
+		 * the dirty bit for any page in the set, so check
+		 * them all. All hugetlb entries are already young.
+		 */
+		if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_dirty(pte))
+			orig_pte = pte_mkdirty(orig_pte);
+	}
+
+	flush_tlb_range(&vma, saddr, addr);
+	return orig_pte;
+}
+
+static void clear_flush(struct mm_struct *mm,
+			unsigned long addr,
+			pte_t *ptep,
+			unsigned long pgsize,
+			unsigned long ncontig)
+{
+	unsigned long i, saddr = addr;
+	struct vm_area_struct vma = { .vm_mm = mm };
+
+	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
+		pte_clear(mm, addr, ptep);
+
+	flush_tlb_range(&vma, saddr, addr);
+}
+
 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 		     pte_t *ptep, pte_t pte)
 {
@@ -93,6 +149,8 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	dpfn = pgsize >> PAGE_SHIFT;
 	hugeprot = pte_pgprot(pte);
 
+	clear_flush(mm, addr, ptep, pgsize, ncontig);
+
 	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
 		pr_debug("%s: set pte %p to 0x%llx\n", __func__, ptep,
 			 pte_val(pfn_pte(pfn, hugeprot)));
@@ -194,7 +252,7 @@ pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 			      unsigned long addr, pte_t *ptep)
 {
-	int ncontig, i;
+	int ncontig;
 	size_t pgsize;
 	pte_t orig_pte = huge_ptep_get(ptep);
 
@@ -202,17 +260,8 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 		return ptep_get_and_clear(mm, addr, ptep);
 
 	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
-	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
-		/*
-		 * If HW_AFDBM is enabled, then the HW could
-		 * turn on the dirty bit for any of the page
-		 * in the set, so check them all.
-		 */
-		if (pte_dirty(ptep_get_and_clear(mm, addr, ptep)))
-			orig_pte = pte_mkdirty(orig_pte);
-	}
 
-	return orig_pte;
+	return get_clear_flush(mm, addr, ptep, pgsize, ncontig);
 }
 
 int huge_ptep_set_access_flags(struct vm_area_struct *vma,
@@ -222,6 +271,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	int ncontig, i, changed = 0;
 	size_t pgsize = 0;
 	unsigned long pfn = pte_pfn(pte), dpfn;
+	pte_t orig_pte;
 	pgprot_t hugeprot;
 
 	if (!pte_cont(pte))
@@ -229,12 +279,18 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 
 	ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize);
 	dpfn = pgsize >> PAGE_SHIFT;
-	hugeprot = pte_pgprot(pte);
 
-	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
-		changed |= ptep_set_access_flags(vma, addr, ptep,
-						 pfn_pte(pfn, hugeprot), dirty);
-	}
+	orig_pte = get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
+	if (!pte_same(orig_pte, pte))
+		changed = 1;
+
+	/* Make sure we don't lose the dirty state */
+	if (pte_dirty(orig_pte))
+		pte = pte_mkdirty(pte);
+
+	hugeprot = pte_pgprot(pte);
+	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
+		set_pte_at(vma->vm_mm, addr, ptep, pfn_pte(pfn, hugeprot));
 
 	return changed;
 }
@@ -244,6 +300,9 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 {
 	int ncontig, i;
 	size_t pgsize;
+	pte_t pte = pte_wrprotect(huge_ptep_get(ptep)), orig_pte;
+	unsigned long pfn = pte_pfn(pte), dpfn;
+	pgprot_t hugeprot;
 
 	if (!pte_cont(*ptep)) {
 		ptep_set_wrprotect(mm, addr, ptep);
@@ -251,14 +310,21 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 	}
 
 	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
-	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
-		ptep_set_wrprotect(mm, addr, ptep);
+	dpfn = pgsize >> PAGE_SHIFT;
+
+	orig_pte = get_clear_flush(mm, addr, ptep, pgsize, ncontig);
+	if (pte_dirty(orig_pte))
+		pte = pte_mkdirty(pte);
+
+	hugeprot = pte_pgprot(pte);
+	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
+		set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
 }
 
 void huge_ptep_clear_flush(struct vm_area_struct *vma,
 			   unsigned long addr, pte_t *ptep)
 {
-	int ncontig, i;
+	int ncontig;
 	size_t pgsize;
 
 	if (!pte_cont(*ptep)) {
@@ -267,8 +333,7 @@ void huge_ptep_clear_flush(struct vm_area_struct *vma,
 	}
 
 	ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize);
-	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
-		ptep_clear_flush(vma, addr, ptep);
+	clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
}

static __init int setup_hugepagesz(char *opt)
--
2.13.2
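[To see why the break step matters, a toy model in plain C — the
structures are invented, nothing here is arm64 code: a mock "TLB"
caches page-table entries, and updating entries in place would let old
and new attributes be cached at once, whereas break-before-make never
exposes such a window.]

	#include <stdio.h>
	#include <string.h>

	#define NCONTIG 4

	static unsigned long pt[NCONTIG];	/* the page-table entries */
	static unsigned long tlb[NCONTIG];	/* what the "MMU" has cached */

	static void tlb_fill(int i) { tlb[i] = pt[i]; }	/* a table walk */
	static void tlb_flush(void) { memset(tlb, 0, sizeof(tlb)); }

	static void bbm_update(const unsigned long *new_entries)
	{
		int i;

		/* Break: clear the whole contiguous set... */
		for (i = 0; i < NCONTIG; i++)
			pt[i] = 0;
		/* ...and invalidate, so no stale translation survives. */
		tlb_flush();

		/* Make: only now write the new entries. */
		for (i = 0; i < NCONTIG; i++)
			pt[i] = new_entries[i];
	}

	int main(void)
	{
		unsigned long newpt[NCONTIG] = { 0x11, 0x12, 0x13, 0x14 };
		int i;

		for (i = 0; i < NCONTIG; i++) {
			pt[i] = 0x1 + i;
			tlb_fill(i);	/* old attributes now cached */
		}

		bbm_update(newpt);
		for (i = 0; i < NCONTIG; i++)
			printf("pt[%d]=%#lx tlb[%d]=%#lx (no stale entry)\n",
			       i, pt[i], i, tlb[i]);
		return 0;
	}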
From: Punit Agrawal
Subject: [PATCH v6 5/9] arm64: hugetlb: Handle swap entries in huge_pte_offset() for contiguous hugepages
Date: Thu, 10 Aug 2017 18:09:02 +0100
Message-Id: <20170810170906.30772-6-punit.agrawal@arm.com>
In-Reply-To: <20170810170906.30772-1-punit.agrawal@arm.com>
References: <20170810170906.30772-1-punit.agrawal@arm.com>
To: will.deacon@arm.com, catalin.marinas@arm.com
Cc: Punit Agrawal, linux-mm@kvack.org, steve.capper@arm.com, linux-arm-kernel@lists.infradead.org, mark.rutland@arm.com, David Woods

huge_pte_offset() was updated to correctly handle swap entries for
hugepages. With the addition of the size parameter, it is now possible
to disambiguate whether the request is for a regular hugepage or a
contiguous hugepage.

Fix huge_pte_offset() for contiguous hugepages by using the size to
find the correct page table entry.

Signed-off-by: Punit Agrawal
Cc: David Woods
---
 arch/arm64/mm/hugetlbpage.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index d3a6713048a2..09e79785c019 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -210,6 +210,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
+	pte_t *pte;
 
 	pgd = pgd_offset(mm, addr);
 	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
@@ -217,19 +218,29 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 		return NULL;
 
 	pud = pud_offset(pgd, addr);
-	if (pud_none(*pud))
+	if (pud_none(*pud) && sz != PUD_SIZE)
 		return NULL;
 	/* swap or huge page */
 	if (!pud_present(*pud) || pud_huge(*pud))
 		return (pte_t *)pud;
 	/* table; check the next level */
 
+	if (sz == CONT_PMD_SIZE)
+		addr &= CONT_PMD_MASK;
+
 	pmd = pmd_offset(pud, addr);
-	if (pmd_none(*pmd))
+	if (pmd_none(*pmd) &&
+	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
 		return NULL;
 	if (!pmd_present(*pmd) || pmd_huge(*pmd))
 		return (pte_t *)pmd;
 
+	if (sz == CONT_PTE_SIZE) {
+		pte = pte_offset_kernel(
+			pmd, (addr & CONT_PTE_MASK));
+		return pte;
+	}
+
 	return NULL;
 }
--
2.13.2
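[The masking above rounds the lookup address down to the base of the
contiguous range before indexing the next level. A sketch of the
arithmetic, with the 4K-granule constants assumed (CONT_PTES = 16,
CONT_PMDS = 16; other granules differ):]

	#include <stdio.h>

	#define PAGE_SHIFT	12
	#define PMD_SHIFT	21
	#define CONT_PTES	16UL
	#define CONT_PMDS	16UL

	#define CONT_PTE_SIZE	(CONT_PTES << PAGE_SHIFT)	/* 64K */
	#define CONT_PTE_MASK	(~(CONT_PTE_SIZE - 1))
	#define CONT_PMD_SIZE	(CONT_PMDS << PMD_SHIFT)	/* 32M */
	#define CONT_PMD_MASK	(~(CONT_PMD_SIZE - 1))

	int main(void)
	{
		unsigned long addr = 0x123456789000UL;

		/* Any address inside the range maps to the same head entry. */
		printf("addr          %#lx\n", addr);
		printf("cont-pte base %#lx\n", addr & CONT_PTE_MASK);
		printf("cont-pmd base %#lx\n", addr & CONT_PMD_MASK);
		return 0;
	}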
From: Punit Agrawal
Subject: [PATCH v6 6/9] arm64: hugetlb: Override huge_pte_clear() to support contiguous hugepages
Date: Thu, 10 Aug 2017 18:09:03 +0100
Message-Id: <20170810170906.30772-7-punit.agrawal@arm.com>
In-Reply-To: <20170810170906.30772-1-punit.agrawal@arm.com>
References: <20170810170906.30772-1-punit.agrawal@arm.com>
To: will.deacon@arm.com, catalin.marinas@arm.com
Cc: Punit Agrawal, linux-mm@kvack.org, steve.capper@arm.com, linux-arm-kernel@lists.infradead.org, mark.rutland@arm.com, David Woods

The default huge_pte_clear() implementation does not clear the full set
of page table entries backing the contiguous hugepages supported on
arm64. Fix this by overriding the default implementation to clear all
the entries associated with a contiguous hugepage.

Signed-off-by: Punit Agrawal
Cc: David Woods
---
 arch/arm64/include/asm/hugetlb.h |  6 +++++-
 arch/arm64/mm/hugetlbpage.c      | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 793bd73b0d07..df8c0aea0917 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -18,7 +18,6 @@
 #ifndef __ASM_HUGETLB_H
 #define __ASM_HUGETLB_H
 
-#include <asm-generic/hugetlb.h>
 #include <asm/page.h>
 
 static inline pte_t huge_ptep_get(pte_t *ptep)
@@ -82,6 +81,11 @@ extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
 				    unsigned long addr, pte_t *ptep);
 extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
 				  unsigned long addr, pte_t *ptep);
+extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
+			   pte_t *ptep, unsigned long sz);
+#define huge_pte_clear huge_pte_clear
+
+#include <asm-generic/hugetlb.h>
 
 #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
 static inline bool gigantic_page_supported(void) { return true; }
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 09e79785c019..b69430a04e87 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -68,6 +68,30 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
 	return CONT_PTES;
 }
 
+static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
+{
+	int contig_ptes = 0;
+
+	*pgsize = size;
+
+	switch (size) {
+	case PUD_SIZE:
+	case PMD_SIZE:
+		contig_ptes = 1;
+		break;
+	case CONT_PMD_SIZE:
+		*pgsize = PMD_SIZE;
+		contig_ptes = CONT_PMDS;
+		break;
+	case CONT_PTE_SIZE:
+		*pgsize = PAGE_SIZE;
+		contig_ptes = CONT_PTES;
+		break;
+	}
+
+	return contig_ptes;
+}
+
 /*
  * Changing some bits of contiguous entries requires us to follow a
  * Break-Before-Make approach, breaking the whole contiguous set
@@ -260,6 +284,18 @@ pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
 	return entry;
 }
 
+void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
+		    pte_t *ptep, unsigned long sz)
+{
+	int i, ncontig;
+	size_t pgsize;
+
+	ncontig = num_contig_ptes(sz, &pgsize);
+
+	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
+		pte_clear(mm, addr, ptep);
+}
+
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 			      unsigned long addr, pte_t *ptep)
 {
--
2.13.2

From: Punit Agrawal
Subject: [PATCH v6 7/9] arm64: hugetlb: Override set_huge_swap_pte_at() to support contiguous hugepages
Date: Thu, 10 Aug 2017 18:09:04 +0100
Message-Id: <20170810170906.30772-8-punit.agrawal@arm.com>
In-Reply-To: <20170810170906.30772-1-punit.agrawal@arm.com>
References: <20170810170906.30772-1-punit.agrawal@arm.com>
To: will.deacon@arm.com, catalin.marinas@arm.com
Cc: Punit Agrawal, linux-mm@kvack.org, steve.capper@arm.com, linux-arm-kernel@lists.infradead.org, mark.rutland@arm.com, David Woods

The default implementation of set_huge_swap_pte_at() does not support
hugepages consisting of contiguous ptes. Override it to add support for
contiguous hugepages.

Signed-off-by: Punit Agrawal
Cc: David Woods
---
 arch/arm64/include/asm/hugetlb.h |  3 +++
 arch/arm64/mm/hugetlbpage.c      | 12 ++++++++++++
 2 files changed, 15 insertions(+)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index df8c0aea0917..1dca41bea16a 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -84,6 +84,9 @@ extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
 extern void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
 			   pte_t *ptep, unsigned long sz);
 #define huge_pte_clear huge_pte_clear
+extern void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
+				 pte_t *ptep, pte_t pte, unsigned long sz);
+#define set_huge_swap_pte_at set_huge_swap_pte_at
 
 #include <asm-generic/hugetlb.h>
 
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index b69430a04e87..f6b2ef23285d 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -182,6 +182,18 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	}
 }
 
+void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
+			  pte_t *ptep, pte_t pte, unsigned long sz)
+{
+	int i, ncontig;
+	size_t pgsize;
+
+	ncontig = num_contig_ptes(sz, &pgsize);
+
+	for (i = 0; i < ncontig; i++, ptep++)
+		set_pte(ptep, pte);
+}
+
 pte_t *huge_pte_alloc(struct mm_struct *mm,
 		      unsigned long addr, unsigned long sz)
 {
--
2.13.2
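[num_contig_ptes(), added in patch 6 and reused here, maps a hugepage
size onto a (per-entry page size, number of entries) pair. A userspace
sketch of the same mapping, with 4K-granule values assumed for all the
constants:]

	#include <stddef.h>
	#include <stdio.h>

	#define PAGE_SIZE	(4UL << 10)
	#define PMD_SIZE	(2UL << 20)
	#define PUD_SIZE	(1UL << 30)
	#define CONT_PTES	16
	#define CONT_PMDS	16
	#define CONT_PTE_SIZE	(CONT_PTES * PAGE_SIZE)	/* 64K */
	#define CONT_PMD_SIZE	(CONT_PMDS * PMD_SIZE)	/* 32M */

	static int num_contig_ptes(unsigned long size, size_t *pgsize)
	{
		int contig_ptes = 0;

		*pgsize = size;

		switch (size) {
		case PUD_SIZE:
		case PMD_SIZE:
			contig_ptes = 1;		/* one block entry */
			break;
		case CONT_PMD_SIZE:
			*pgsize = PMD_SIZE;
			contig_ptes = CONT_PMDS;	/* 16 pmds, CONT bit */
			break;
		case CONT_PTE_SIZE:
			*pgsize = PAGE_SIZE;
			contig_ptes = CONT_PTES;	/* 16 ptes, CONT bit */
			break;
		}

		return contig_ptes;
	}

	int main(void)
	{
		unsigned long sizes[] = { CONT_PTE_SIZE, PMD_SIZE,
					  CONT_PMD_SIZE, PUD_SIZE };
		size_t pgsize;

		for (int i = 0; i < 4; i++)
			printf("size %9luK -> %d entries of %luK\n",
			       sizes[i] >> 10, num_contig_ptes(sizes[i], &pgsize),
			       (unsigned long)pgsize >> 10);
		return 0;
	}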
From: Punit Agrawal
Subject: [PATCH v6 8/9] arm64: Re-enable support for contiguous hugepages
Date: Thu, 10 Aug 2017 18:09:05 +0100
Message-Id: <20170810170906.30772-9-punit.agrawal@arm.com>
In-Reply-To: <20170810170906.30772-1-punit.agrawal@arm.com>
References: <20170810170906.30772-1-punit.agrawal@arm.com>
To: will.deacon@arm.com, catalin.marinas@arm.com
Cc: Punit Agrawal, linux-mm@kvack.org, steve.capper@arm.com, linux-arm-kernel@lists.infradead.org, mark.rutland@arm.com

Also known as:

  Revert "Revert "Revert "arm64: hugetlb: partial revert of 66b3923a1a0f"""

Now that our hugetlb implementation is compliant with the
break-before-make requirements of the architecture, and we have
addressed some of the issues in core code required for properly dealing
with hardware poisoning of contiguous hugepages, let's re-enable
support for contiguous hugepages.

This reverts commit 6ae979ab39a368c18ceb0424bf824d172d6ab56f.

Signed-off-by: Punit Agrawal
---
 arch/arm64/mm/hugetlbpage.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index f6b2ef23285d..273f126072e4 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -403,6 +403,10 @@ static __init int setup_hugepagesz(char *opt)
 		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
 	} else if (ps == PUD_SIZE) {
 		hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
+	} else if (ps == (PAGE_SIZE * CONT_PTES)) {
+		hugetlb_add_hstate(CONT_PTE_SHIFT);
+	} else if (ps == (PMD_SIZE * CONT_PMDS)) {
+		hugetlb_add_hstate((PMD_SHIFT + CONT_PMD_SHIFT) - PAGE_SHIFT);
 	} else {
 		hugetlb_bad_size();
 		pr_err("hugepagesz: Unsupported page size %lu K\n", ps >> 10);
@@ -411,3 +415,13 @@ static __init int setup_hugepagesz(char *opt)
 	return 1;
 }
 __setup("hugepagesz=", setup_hugepagesz);
+
+#ifdef CONFIG_ARM64_64K_PAGES
+static __init int add_default_hugepagesz(void)
+{
+	if (size_to_hstate(CONT_PTES * PAGE_SIZE) == NULL)
+		hugetlb_add_hstate(CONT_PTE_SHIFT);
+	return 0;
+}
+arch_initcall(add_default_hugepagesz);
+#endif
--
2.13.2
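[Once the extra hstates are registered, every exported hugepage size
appears under /sys/kernel/mm/hugepages. A small sketch that lists them —
the sysfs layout is the standard hugetlb one, nothing arm64-specific
beyond which sizes the running kernel registered:]

	/* List hugepage sizes the running kernel has registered as hstates. */
	#include <dirent.h>
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		DIR *d = opendir("/sys/kernel/mm/hugepages");
		struct dirent *e;

		if (!d) {
			perror("opendir");
			return 1;
		}
		while ((e = readdir(d)) != NULL) {
			/* entries look like "hugepages-65536kB" */
			if (!strncmp(e->d_name, "hugepages-", 10))
				printf("%s\n", e->d_name);
		}
		closedir(d);
		return 0;
	}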
From: Punit Agrawal
Subject: [PATCH v6 9/9] arm64: hugetlb: Cleanup setup_hugepagesz
Date: Thu, 10 Aug 2017 18:09:06 +0100
Message-Id: <20170810170906.30772-10-punit.agrawal@arm.com>
In-Reply-To: <20170810170906.30772-1-punit.agrawal@arm.com>
References: <20170810170906.30772-1-punit.agrawal@arm.com>
To: will.deacon@arm.com, catalin.marinas@arm.com
Cc: Steve Capper, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, mark.rutland@arm.com, David Woods, stable@vger.kernel.org, Punit Agrawal

From: Steve Capper

Replace a lot of if statements with switch and case labels to make it
much clearer which huge page sizes are supported.

Also, we prevent PUD_SIZE from being used on systems not running with
4KB PAGE_SIZE. Previously, if one supplied PUD_SIZE in these
circumstances, unusable huge page sizes would be put into use.

Fixes: 084bd29810a5 ("ARM64: mm: HugeTLB support.")
Cc: stable@vger.kernel.org
Cc: David Woods
Signed-off-by: Steve Capper
Signed-off-by: Punit Agrawal
Reviewed-by: Mark Rutland
---
 arch/arm64/mm/hugetlbpage.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 273f126072e4..2c9913121bfd 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -399,20 +399,20 @@ static __init int setup_hugepagesz(char *opt)
 {
 	unsigned long ps = memparse(opt, &opt);
 
-	if (ps == PMD_SIZE) {
-		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
-	} else if (ps == PUD_SIZE) {
-		hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
-	} else if (ps == (PAGE_SIZE * CONT_PTES)) {
-		hugetlb_add_hstate(CONT_PTE_SHIFT);
-	} else if (ps == (PMD_SIZE * CONT_PMDS)) {
-		hugetlb_add_hstate((PMD_SHIFT + CONT_PMD_SHIFT) - PAGE_SHIFT);
-	} else {
-		hugetlb_bad_size();
-		pr_err("hugepagesz: Unsupported page size %lu K\n", ps >> 10);
-		return 0;
+	switch (ps) {
+#ifdef CONFIG_ARM64_4K_PAGES
+	case PUD_SIZE:
+#endif
+	case PMD_SIZE * CONT_PMDS:
+	case PMD_SIZE:
+	case PAGE_SIZE * CONT_PTES:
+		hugetlb_add_hstate(ilog2(ps) - PAGE_SHIFT);
+		return 1;
 	}
-	return 1;
+
+	hugetlb_bad_size();
+	pr_err("hugepagesz: Unsupported page size %lu K\n", ps >> 10);
+	return 0;
 }
 __setup("hugepagesz=", setup_hugepagesz);
--
2.13.2
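[The switch collapses the per-size shift calculations into one
expression: for a power-of-two size ps, ilog2(ps) - PAGE_SHIFT equals
the hstate order previously passed per case. A quick check, with
assumed 4K-granule constants (CONT_PTE_SHIFT = CONT_PMD_SHIFT = 4):]

	#include <assert.h>

	#define PAGE_SHIFT	12
	#define PMD_SHIFT	21
	#define PUD_SHIFT	30
	#define CONT_PTE_SHIFT	4	/* 16 contiguous ptes */
	#define CONT_PMD_SHIFT	4	/* 16 contiguous pmds */

	static int ilog2(unsigned long v)
	{
		int n = -1;

		while (v) {
			v >>= 1;
			n++;
		}
		return n;
	}

	int main(void)
	{
		/* 2M and 1G block sizes reduce to the old shift arithmetic */
		assert(ilog2(2UL << 20) == PMD_SHIFT);
		assert(ilog2(1UL << 30) == PUD_SHIFT);
		/* 64K contiguous-pte hugepage: order == CONT_PTE_SHIFT */
		assert(ilog2(64UL << 10) - PAGE_SHIFT == CONT_PTE_SHIFT);
		/* 32M contiguous-pmd hugepage */
		assert(ilog2(32UL << 20) - PAGE_SHIFT ==
		       (PMD_SHIFT + CONT_PMD_SHIFT) - PAGE_SHIFT);
		return 0;
	}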
From: Catalin Marinas
Subject: Re: [PATCH v6 4/9] arm64: hugetlb: Add break-before-make logic for contiguous entries
Date: Thu, 17 Aug 2017 19:03:11 +0100
Message-ID: <20170817180311.uwrz64g3bkwfdkrn@armageddon.cambridge.arm.com>
In-Reply-To: <20170810170906.30772-5-punit.agrawal@arm.com>
References: <20170810170906.30772-1-punit.agrawal@arm.com> <20170810170906.30772-5-punit.agrawal@arm.com>
To: Punit Agrawal
Cc: will.deacon@arm.com, mark.rutland@arm.com, linux-mm@kvack.org, David Woods, linux-arm-kernel@lists.infradead.org, Steve Capper

On Thu, Aug 10, 2017 at 06:09:01PM +0100, Punit Agrawal wrote:
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -68,6 +68,62 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
>  	return CONT_PTES;
>  }
>  
> +/*
> + * Changing some bits of contiguous entries requires us to follow a
> + * Break-Before-Make approach, breaking the whole contiguous set
> + * before we can change any entries. See ARM DDI 0487A.k_iss10775,
> + * "Misprogramming of the Contiguous bit", page D4-1762.
> + *
> + * This helper performs the break step.
> + */
> +static pte_t get_clear_flush(struct mm_struct *mm,
> +			     unsigned long addr,
> +			     pte_t *ptep,
> +			     unsigned long pgsize,
> +			     unsigned long ncontig)
> +{
> +	unsigned long i, saddr = addr;
> +	struct vm_area_struct vma = { .vm_mm = mm };
> +	pte_t orig_pte = huge_ptep_get(ptep);
> +
> +	/*
> +	 * If we already have a faulting entry then we don't need
> +	 * to break before make (there won't be a tlb entry cached).
> +	 */
> +	if (!pte_present(orig_pte))
> +		return orig_pte;

I first thought we could relax this check to pte_valid() as we don't
care about the PROT_NONE case for hardware page table updates. However,
I realised that we call this where we expect the pte to be entirely
cleared but we simply skip it if !present (e.g. swap entry). Is this
correct?

> +
> +	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
> +		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
> +
> +		/*
> +		 * If HW_AFDBM is enabled, then the HW could turn on
> +		 * the dirty bit for any page in the set, so check
> +		 * them all. All hugetlb entries are already young.
> +		 */
> +		if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_dirty(pte))
> +			orig_pte = pte_mkdirty(orig_pte);
> +	}
> +
> +	flush_tlb_range(&vma, saddr, addr);
> +	return orig_pte;
> +}

It would be better if you do something like

	bool valid = pte_valid(orig_pte);
	...
	if (valid)
		flush_tlb_range(...);

> +
> +static void clear_flush(struct mm_struct *mm,
> +			unsigned long addr,
> +			pte_t *ptep,
> +			unsigned long pgsize,
> +			unsigned long ncontig)
> +{
> +	unsigned long i, saddr = addr;
> +	struct vm_area_struct vma = { .vm_mm = mm };
> +
> +	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
> +		pte_clear(mm, addr, ptep);
> +
> +	flush_tlb_range(&vma, saddr, addr);
> +}
> +
>  void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>  		     pte_t *ptep, pte_t pte)
>  {
> @@ -93,6 +149,8 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>  	dpfn = pgsize >> PAGE_SHIFT;
>  	hugeprot = pte_pgprot(pte);
>  
> +	clear_flush(mm, addr, ptep, pgsize, ncontig);
> +
>  	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
>  		pr_debug("%s: set pte %p to 0x%llx\n", __func__, ptep,
>  			 pte_val(pfn_pte(pfn, hugeprot)));
> @@ -194,7 +252,7 @@ pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
>  pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>  			      unsigned long addr, pte_t *ptep)
>  {
> -	int ncontig, i;
> +	int ncontig;
>  	size_t pgsize;
>  	pte_t orig_pte = huge_ptep_get(ptep);
>  
> @@ -202,17 +260,8 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>  		return ptep_get_and_clear(mm, addr, ptep);
>  
>  	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
> -	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
> -		/*
> -		 * If HW_AFDBM is enabled, then the HW could
> -		 * turn on the dirty bit for any of the page
> -		 * in the set, so check them all.
> -		 */
> -		if (pte_dirty(ptep_get_and_clear(mm, addr, ptep)))
> -			orig_pte = pte_mkdirty(orig_pte);
> -	}
>  
> -	return orig_pte;
> +	return get_clear_flush(mm, addr, ptep, pgsize, ncontig);
>  }

E.g. here you don't always clear the pte if a swap entry.

> 
>  int huge_ptep_set_access_flags(struct vm_area_struct *vma,
> @@ -222,6 +271,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>  	int ncontig, i, changed = 0;
>  	size_t pgsize = 0;
>  	unsigned long pfn = pte_pfn(pte), dpfn;
> +	pte_t orig_pte;
>  	pgprot_t hugeprot;
>  
>  	if (!pte_cont(pte))
> @@ -229,12 +279,18 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>  
>  	ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize);
>  	dpfn = pgsize >> PAGE_SHIFT;
> -	hugeprot = pte_pgprot(pte);
>  
> -	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
> -		changed |= ptep_set_access_flags(vma, addr, ptep,
> -						 pfn_pte(pfn, hugeprot), dirty);
> -	}
> +	orig_pte = get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
> +	if (!pte_same(orig_pte, pte))
> +		changed = 1;
> +
> +	/* Make sure we don't lose the dirty state */
> +	if (pte_dirty(orig_pte))
> +		pte = pte_mkdirty(pte);
> +
> +	hugeprot = pte_pgprot(pte);
> +	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
> +		set_pte_at(vma->vm_mm, addr, ptep, pfn_pte(pfn, hugeprot));
> 
>  	return changed;
>  }
> @@ -244,6 +300,9 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
>  {
>  	int ncontig, i;
>  	size_t pgsize;
> +	pte_t pte = pte_wrprotect(huge_ptep_get(ptep)), orig_pte;

I'm not particularly fond of too many function calls in the variable
initialisation part. I would rather keep pte_wrprotect further down
where you also make it "dirty".
> +	unsigned long pfn = pte_pfn(pte), dpfn;
> +	pgprot_t hugeprot;
>  
>  	if (!pte_cont(*ptep)) {
>  		ptep_set_wrprotect(mm, addr, ptep);
> @@ -251,14 +310,21 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
>  	}
>  
>  	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
> -	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
> -		ptep_set_wrprotect(mm, addr, ptep);
> +	dpfn = pgsize >> PAGE_SHIFT;
> +
> +	orig_pte = get_clear_flush(mm, addr, ptep, pgsize, ncontig);

Can you not just set pte here instead of deriving it from *ptep early
on?

	pte = get_clear_flush(mm, addr, ptep, pgsize, ncontig);
	pte = pte_wrprotect(pte);

> +	if (pte_dirty(orig_pte))
> +		pte = pte_mkdirty(pte);
> +
> +	hugeprot = pte_pgprot(pte);
> +	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
> +		set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
>  }

-- 
Catalin

From: Punit Agrawal
Subject: Re: [PATCH v6 4/9] arm64: hugetlb: Add break-before-make logic for contiguous entries
Date: Fri, 18 Aug 2017 11:30:22 +0100
Message-ID: <87shgpnp6p.fsf@e105922-lin.cambridge.arm.com>
In-Reply-To: <20170817180311.uwrz64g3bkwfdkrn@armageddon.cambridge.arm.com> (Catalin Marinas's message of "Thu, 17 Aug 2017 19:03:11 +0100")
References: <20170810170906.30772-1-punit.agrawal@arm.com> <20170810170906.30772-5-punit.agrawal@arm.com> <20170817180311.uwrz64g3bkwfdkrn@armageddon.cambridge.arm.com>
To: Catalin Marinas
Cc: will.deacon@arm.com, mark.rutland@arm.com, linux-mm@kvack.org, David Woods, linux-arm-kernel@lists.infradead.org, Steve Capper

Catalin Marinas writes:

> On Thu, Aug 10, 2017 at 06:09:01PM +0100, Punit Agrawal wrote:
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -68,6 +68,62 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
>>  	return CONT_PTES;
>>  }
>>  
>> +/*
>> + * Changing some bits of contiguous entries requires us to follow a
>> + * Break-Before-Make approach, breaking the whole contiguous set
>> + * before we can change any entries. See ARM DDI 0487A.k_iss10775,
>> + * "Misprogramming of the Contiguous bit", page D4-1762.
>> + *
>> + * This helper performs the break step.
>> + */
>> +static pte_t get_clear_flush(struct mm_struct *mm,
>> +			     unsigned long addr,
>> +			     pte_t *ptep,
>> +			     unsigned long pgsize,
>> +			     unsigned long ncontig)
>> +{
>> +	unsigned long i, saddr = addr;
>> +	struct vm_area_struct vma = { .vm_mm = mm };
>> +	pte_t orig_pte = huge_ptep_get(ptep);
>> +
>> +	/*
>> +	 * If we already have a faulting entry then we don't need
>> +	 * to break before make (there won't be a tlb entry cached).
>> +	 */
>> +	if (!pte_present(orig_pte))
>> +		return orig_pte;
>
> I first thought we could relax this check to pte_valid() as we don't
> care about the PROT_NONE case for hardware page table updates. However,
> I realised that we call this where we expect the pte to be entirely
> cleared but we simply skip it if !present (e.g. swap entry). Is this
> correct?

I've checked back and come to the conclusion that get_clear_flush()
will not get called with swap entries.

In the case of huge_ptep_get_and_clear() below, the callers
(__unmap_hugepage_range() and hugetlb_change_protection()) check for
swap entries before calling.

I'll relax the check to pte_valid().

>
>> +
>> +	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
>> +		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
>> +
>> +		/*
>> +		 * If HW_AFDBM is enabled, then the HW could turn on
>> +		 * the dirty bit for any page in the set, so check
>> +		 * them all. All hugetlb entries are already young.
>> +		 */
>> +		if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && pte_dirty(pte))
>> +			orig_pte = pte_mkdirty(orig_pte);
>> +	}
>> +
>> +	flush_tlb_range(&vma, saddr, addr);
>> +	return orig_pte;
>> +}
>
> It would be better if you do something like
>
> 	bool valid = pte_valid(orig_pte);
> 	...
> 	if (valid)
> 		flush_tlb_range(...);

With the above change of pte_present() to pte_valid() this isn't
needed anymore.

>
>> +
>> +static void clear_flush(struct mm_struct *mm,
>> +			unsigned long addr,
>> +			pte_t *ptep,
>> +			unsigned long pgsize,
>> +			unsigned long ncontig)
>> +{
>> +	unsigned long i, saddr = addr;
>> +	struct vm_area_struct vma = { .vm_mm = mm };
>> +
>> +	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
>> +		pte_clear(mm, addr, ptep);
>> +
>> +	flush_tlb_range(&vma, saddr, addr);
>> +}
>> +
>>  void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>>  		     pte_t *ptep, pte_t pte)
>>  {
>> @@ -93,6 +149,8 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>>  	dpfn = pgsize >> PAGE_SHIFT;
>>  	hugeprot = pte_pgprot(pte);
>>  
>> +	clear_flush(mm, addr, ptep, pgsize, ncontig);
>> +
>>  	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
>>  		pr_debug("%s: set pte %p to 0x%llx\n", __func__, ptep,
>>  			 pte_val(pfn_pte(pfn, hugeprot)));
>> @@ -194,7 +252,7 @@ pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
>>  pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>>  			      unsigned long addr, pte_t *ptep)
>>  {
>> -	int ncontig, i;
>> +	int ncontig;
>>  	size_t pgsize;
>>  	pte_t orig_pte = huge_ptep_get(ptep);
>>  
>> @@ -202,17 +260,8 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>>  		return ptep_get_and_clear(mm, addr, ptep);
>>  
>>  	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
>> -	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
>> -		/*
>> -		 * If HW_AFDBM is enabled, then the HW could
>> -		 * turn on the dirty bit for any of the page
>> -		 * in the set, so check them all.
>> -		 */
>> -		if (pte_dirty(ptep_get_and_clear(mm, addr, ptep)))
>> -			orig_pte = pte_mkdirty(orig_pte);
>> -	}
>>  
>> -	return orig_pte;
>> +	return get_clear_flush(mm, addr, ptep, pgsize, ncontig);
>>  }
>
> E.g. here you don't always clear the pte if a swap entry.
>
>>
>>  int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>> @@ -222,6 +271,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>>  	int ncontig, i, changed = 0;
>>  	size_t pgsize = 0;
>>  	unsigned long pfn = pte_pfn(pte), dpfn;
>> +	pte_t orig_pte;
>>  	pgprot_t hugeprot;
>>  
>>  	if (!pte_cont(pte))
>> @@ -229,12 +279,18 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>>  
>>  	ncontig = find_num_contig(vma->vm_mm, addr, ptep, &pgsize);
>>  	dpfn = pgsize >> PAGE_SHIFT;
>> -	hugeprot = pte_pgprot(pte);
>>  
>> -	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn) {
>> -		changed |= ptep_set_access_flags(vma, addr, ptep,
>> -						 pfn_pte(pfn, hugeprot), dirty);
>> -	}
>> +	orig_pte = get_clear_flush(vma->vm_mm, addr, ptep, pgsize, ncontig);
>> +	if (!pte_same(orig_pte, pte))
>> +		changed = 1;
>> +
>> +	/* Make sure we don't lose the dirty state */
>> +	if (pte_dirty(orig_pte))
>> +		pte = pte_mkdirty(pte);
>> +
>> +	hugeprot = pte_pgprot(pte);
>> +	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
>> +		set_pte_at(vma->vm_mm, addr, ptep, pfn_pte(pfn, hugeprot));
>>
>>  	return changed;
>>  }
>> @@ -244,6 +300,9 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
>>  {
>>  	int ncontig, i;
>>  	size_t pgsize;
>> +	pte_t pte = pte_wrprotect(huge_ptep_get(ptep)), orig_pte;
>
> I'm not particularly fond of too many function calls in the variable
> initialisation part. I would rather keep pte_wrprotect further down
> where you also make it "dirty".
>
>> +	unsigned long pfn = pte_pfn(pte), dpfn;
>> +	pgprot_t hugeprot;
>>  
>>  	if (!pte_cont(*ptep)) {
>>  		ptep_set_wrprotect(mm, addr, ptep);
>> @@ -251,14 +310,21 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
>>  	}
>>  
>>  	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
>> -	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
>> -		ptep_set_wrprotect(mm, addr, ptep);
>> +	dpfn = pgsize >> PAGE_SHIFT;
>> +
>> +	orig_pte = get_clear_flush(mm, addr, ptep, pgsize, ncontig);
>
> Can you not just set pte here instead of deriving it from *ptep early
> on?
>
> 	pte = get_clear_flush(mm, addr, ptep, pgsize, ncontig);
> 	pte = pte_wrprotect(pte);

I've simplified this locally now. I'll run through a few tests and post
a new version.

Thanks for review.

Punit

>> +	if (pte_dirty(orig_pte))
>> +		pte = pte_mkdirty(pte);
>> +
>> +	hugeprot = pte_pgprot(pte);
>> +	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
>> +		set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
>>  }
From: Catalin Marinas
Subject: Re: [PATCH v6 4/9] arm64: hugetlb: Add break-before-make logic for contiguous entries
Date: Fri, 18 Aug 2017 11:43:28 +0100
Message-ID: <20170818104327.a5yep2p3ntjbffug@armageddon.cambridge.arm.com>
In-Reply-To: <87shgpnp6p.fsf@e105922-lin.cambridge.arm.com>
References: <20170810170906.30772-1-punit.agrawal@arm.com> <20170810170906.30772-5-punit.agrawal@arm.com> <20170817180311.uwrz64g3bkwfdkrn@armageddon.cambridge.arm.com> <87shgpnp6p.fsf@e105922-lin.cambridge.arm.com>
To: Punit Agrawal
Cc: mark.rutland@arm.com, David Woods, Steve Capper, will.deacon@arm.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org

On Fri, Aug 18, 2017 at 11:30:22AM +0100, Punit Agrawal wrote:
> Catalin Marinas writes:
> > On Thu, Aug 10, 2017 at 06:09:01PM +0100, Punit Agrawal wrote:
> >> --- a/arch/arm64/mm/hugetlbpage.c
> >> +++ b/arch/arm64/mm/hugetlbpage.c
> >> @@ -68,6 +68,62 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
> >>  	return CONT_PTES;
> >>  }
> >>  
> >> +/*
> >> + * Changing some bits of contiguous entries requires us to follow a
> >> + * Break-Before-Make approach, breaking the whole contiguous set
> >> + * before we can change any entries. See ARM DDI 0487A.k_iss10775,
> >> + * "Misprogramming of the Contiguous bit", page D4-1762.
> >> + *
> >> + * This helper performs the break step.
> >> + */
> >> +static pte_t get_clear_flush(struct mm_struct *mm,
> >> +			     unsigned long addr,
> >> +			     pte_t *ptep,
> >> +			     unsigned long pgsize,
> >> +			     unsigned long ncontig)
> >> +{
> >> +	unsigned long i, saddr = addr;
> >> +	struct vm_area_struct vma = { .vm_mm = mm };
> >> +	pte_t orig_pte = huge_ptep_get(ptep);
> >> +
> >> +	/*
> >> +	 * If we already have a faulting entry then we don't need
> >> +	 * to break before make (there won't be a tlb entry cached).
> >> +	 */
> >> +	if (!pte_present(orig_pte))
> >> +		return orig_pte;
> >
> > I first thought we could relax this check to pte_valid() as we don't
> > care about the PROT_NONE case for hardware page table updates. However,
> > I realised that we call this where we expect the pte to be entirely
> > cleared but we simply skip it if !present (e.g. swap entry). Is this
> > correct?
> 
> I've checked back and come to the conclusion that get_clear_flush()
> will not get called with swap entries.
> 
> In the case of huge_ptep_get_and_clear() below, the callers
> (__unmap_hugepage_range() and hugetlb_change_protection()) check for
> swap entries before calling.
> 
> I'll relax the check to pte_valid().

Thanks for checking but I would still keep the semantics of the generic
huge_ptep_get_and_clear() where the entry is always zeroed. It
shouldn't have any performance impact since this function won't be
called for swap entries, but just in case anyone changes the core code
later on.

-- 
Catalin
From: Catalin Marinas
Subject: Re: [PATCH v6 5/9] arm64: hugetlb: Handle swap entries in huge_pte_offset() for contiguous hugepages
Date: Fri, 18 Aug 2017 12:20:16 +0100
Message-ID: <20170818112015.2cvkb7y3gkozz5ip@armageddon.cambridge.arm.com>
In-Reply-To: <20170810170906.30772-6-punit.agrawal@arm.com>
To: Punit Agrawal
Cc: will.deacon@arm.com, mark.rutland@arm.com, David Woods, steve.capper@arm.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org

On Thu, Aug 10, 2017 at 06:09:02PM +0100, Punit Agrawal wrote:
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index d3a6713048a2..09e79785c019 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -210,6 +210,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>  	pgd_t *pgd;
>  	pud_t *pud;
>  	pmd_t *pmd;
> +	pte_t *pte;
>
>  	pgd = pgd_offset(mm, addr);
>  	pr_debug("%s: addr:0x%lx pgd:%p\n", __func__, addr, pgd);
> @@ -217,19 +218,29 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>  		return NULL;
>
>  	pud = pud_offset(pgd, addr);
> -	if (pud_none(*pud))
> +	if (pud_none(*pud) && sz != PUD_SIZE)
>  		return NULL;
>  	/* swap or huge page */
>  	if (!pud_present(*pud) || pud_huge(*pud))
>  		return (pte_t *)pud;
>  	/* table; check the next level */

So if sz == PUD_SIZE and pud_none(*pud) is true, this returns the pud.
Isn't that different from what you proposed for the generic
huge_pte_offset()? [1]

>
> +	if (sz == CONT_PMD_SIZE)
> +		addr &= CONT_PMD_MASK;
> +
>  	pmd = pmd_offset(pud, addr);
> -	if (pmd_none(*pmd))
> +	if (pmd_none(*pmd) &&
> +	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
>  		return NULL;

Again, if sz == PMD_SIZE, you no longer return NULL. The generic
proposal in [1] looks like:

	if (pmd_none(*pmd))
		return NULL;

and that applies even when sz == PMD_SIZE.

Anyway, I think we need to push again for [1] to be accepted before we
go ahead with these changes.

[1] http://lkml.kernel.org/r/20170725154114.24131-2-punit.agrawal@arm.com

-- 
Catalin
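The semantic difference under discussion is easiest to see side by side
(sketch only; the exact generic code is in [1]):

	/* Generic proposal [1]: a none entry means "no mapping here". */
	if (pud_none(*pud))
		return NULL;

	/*
	 * This patch, for sz == PUD_SIZE: hand the none pud back so the
	 * fault path can populate it without a second table walk.
	 */
	if (pud_none(*pud) && sz != PUD_SIZE)
		return NULL;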
From: Punit Agrawal
Subject: Re: [PATCH v6 4/9] arm64: hugetlb: Add break-before-make logic for contiguous entries
Date: Fri, 18 Aug 2017 13:48:56 +0100
Message-ID: <87o9rdnirr.fsf@e105922-lin.cambridge.arm.com>
In-Reply-To: <20170818104327.a5yep2p3ntjbffug@armageddon.cambridge.arm.com>
To: Catalin Marinas
Cc: mark.rutland@arm.com, David Woods, Steve Capper, will.deacon@arm.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org

Catalin Marinas writes:

> On Fri, Aug 18, 2017 at 11:30:22AM +0100, Punit Agrawal wrote:
>> Catalin Marinas writes:
>> > On Thu, Aug 10, 2017 at 06:09:01PM +0100, Punit Agrawal wrote:

[...]

>> > I first thought we could relax this check to pte_valid() as we don't
>> > care about the PROT_NONE case for hardware page table updates. However,
>> > I realised that we call this where we expect the pte to be entirely
>> > cleared but we simply skip it if !present (e.g. a swap entry). Is this
>> > correct?
>>
>> I've checked back and come to the conclusion that get_clear_flush() will
>> not get called with swap entries.
>>
>> In the case of huge_ptep_get_and_clear() below, the callers
>> (__unmap_hugepage_range() and hugetlb_change_protection()) check for
>> swap entries before calling. Similarly for the other callers.
>>
>> I'll relax the check to pte_valid().
>
> Thanks for checking, but I would still keep the semantics of the generic
> huge_ptep_get_and_clear(), where the entry is always zeroed. It shouldn't
> have any performance impact, since this function won't be called for swap
> entries, but it protects us if anyone changes the core code later on.

Makes sense. I'll drop the check and unconditionally clear the entries.
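With the check dropped, huge_ptep_get_and_clear() can delegate straight to
the helper. Roughly (a sketch, not the posted patch):

	pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
				      unsigned long addr, pte_t *ptep)
	{
		int ncontig;
		size_t pgsize;
		pte_t orig_pte = huge_ptep_get(ptep);

		/* A single entry keeps the generic get-and-clear behaviour. */
		if (!pte_cont(orig_pte))
			return ptep_get_and_clear(mm, addr, ptep);

		ncontig = find_num_contig(mm, addr, ptep, &pgsize);

		/*
		 * Break the whole contiguous set: every entry is zeroed,
		 * matching the generic huge_ptep_get_and_clear(), and any
		 * accumulated dirty state comes back in the returned pte.
		 */
		return get_clear_flush(mm, addr, ptep, pgsize, ncontig);
	}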
From: Punit Agrawal
Subject: Re: [PATCH v6 5/9] arm64: hugetlb: Handle swap entries in huge_pte_offset() for contiguous hugepages
Date: Fri, 18 Aug 2017 14:49:41 +0100
Message-ID: <87inhlnfyi.fsf@e105922-lin.cambridge.arm.com>
In-Reply-To: <20170818112015.2cvkb7y3gkozz5ip@armageddon.cambridge.arm.com>
To: Catalin Marinas
Cc: will.deacon@arm.com, mark.rutland@arm.com, David Woods, steve.capper@arm.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org

Catalin Marinas writes:

> On Thu, Aug 10, 2017 at 06:09:02PM +0100, Punit Agrawal wrote:

[...]

>>  	pud = pud_offset(pgd, addr);
>> -	if (pud_none(*pud))
>> +	if (pud_none(*pud) && sz != PUD_SIZE)
>>  		return NULL;
>>  	/* swap or huge page */
>>  	if (!pud_present(*pud) || pud_huge(*pud))
>>  		return (pte_t *)pud;
>>  	/* table; check the next level */
>
> So if sz == PUD_SIZE and pud_none(*pud) is true, this returns the pud.
> Isn't that different from what you proposed for the generic
> huge_pte_offset()? [1]

I think I missed this case in the generic version. As hugetlb_fault()
deals with p*d_none() entries by calling hugetlb_no_page(), the thinking
was that returning the p*d saves us an extra round trip by avoiding the
call to huge_pte_alloc().

>> +	if (sz == CONT_PMD_SIZE)
>> +		addr &= CONT_PMD_MASK;
>> +
>>  	pmd = pmd_offset(pud, addr);
>> -	if (pmd_none(*pmd))
>> +	if (pmd_none(*pmd) &&
>> +	    !(sz == PMD_SIZE || sz == CONT_PMD_SIZE))
>>  		return NULL;
>
> Again, if sz == PMD_SIZE, you no longer return NULL. The generic
> proposal in [1] looks like:
>
> 	if (pmd_none(*pmd))
> 		return NULL;
>
> and that applies even when sz == PMD_SIZE.
>
> Anyway, I think we need to push again for [1] to be accepted before we
> go ahead with these changes.

[1] is already queued in Andrew's tree. I'll send an update - hopefully
it can be picked up for the next merge window.

> [1] http://lkml.kernel.org/r/20170725154114.24131-2-punit.agrawal@arm.com
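To make the round-trip argument concrete, the fault path in mm/hugetlb.c
behaves roughly as below. This is a heavily condensed, hypothetical
rendering: hugetlb_no_page_stub() stands in for the real hugetlb_no_page(),
whose argument list is much longer, and the rest of the fault handling is
elided.

	static int hugetlb_fault_sketch(struct mm_struct *mm,
					struct vm_area_struct *vma,
					unsigned long address,
					unsigned int flags)
	{
		struct hstate *h = hstate_vma(vma);
		pte_t *ptep;

		ptep = huge_pte_offset(mm, address, huge_page_size(h));
		if (!ptep) {
			/*
			 * NULL forces a second walk to allocate the missing
			 * table levels - the extra round trip referred to
			 * above.
			 */
			ptep = huge_pte_alloc(mm, address, huge_page_size(h));
			if (!ptep)
				return VM_FAULT_OOM;
		}

		/*
		 * A none entry ends up in hugetlb_no_page() either way, so
		 * huge_pte_offset() returning the none p*d instead of NULL
		 * skips huge_pte_alloc() without changing the outcome.
		 */
		if (huge_pte_none(huge_ptep_get(ptep)))
			return hugetlb_no_page_stub(mm, vma, address, ptep, flags);

		return 0;
	}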