linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: Zhenyu Ye <yezhenyu2@huawei.com>
Cc: will@kernel.org, suzuki.poulose@arm.com, maz@kernel.org,
	steven.price@arm.com, guohanjun@huawei.com, olof@lixom.net,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-mm@kvack.org, arm@kernel.org, xiexiangyou@huawei.com,
	prime.zeng@hisilicon.com, zhangshaokun@hisilicon.com,
	kuhn.chenqun@huawei.com
Subject: Re: [RFC PATCH v3 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64
Date: Thu, 14 May 2020 16:28:40 +0100	[thread overview]
Message-ID: <20200514152840.GC1907@gaia> (raw)
In-Reply-To: <20200414112835.1121-3-yezhenyu2@huawei.com>

Hi Zhenyu,

On Tue, Apr 14, 2020 at 07:28:35PM +0800, Zhenyu Ye wrote:
> diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
> index b76df828e6b7..3a1816770bd1 100644
> --- a/arch/arm64/include/asm/tlb.h
> +++ b/arch/arm64/include/asm/tlb.h
> @@ -38,7 +38,12 @@ static inline void tlb_flush(struct mmu_gather *tlb)
>  		return;
>  	}
>  
> -	__flush_tlb_range(&vma, tlb->start, tlb->end, stride, last_level);
> +	if (cpus_have_const_cap(ARM64_HAS_TLBI_RANGE))
> +		__flush_tlb_range_directly(&vma, tlb->start, tlb->end,
> +					   stride, last_level);
> +	else
> +		__flush_tlb_range(&vma, tlb->start, tlb->end,
> +				  stride, last_level);

I think you could move such check in __flush_tlb_range() and avoid
cpus_have_const_cap() in two places. More on this below.

> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index bc3949064725..a482188ea563 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -59,6 +59,44 @@
>  		__ta;						\
>  	})
>  
> +/*
> + * This macro creates a properly formatted VA operand for the TLBI RANGE.
> + * The value bit assignments are:
> + *
> + * +----------+------+-------+-------+-------+----------------------+
> + * |   ASID   |  TG  | SCALE |  NUM  |  TTL  |        BADDR         |
> + * +-----------------+-------+-------+-------+----------------------+
> + * |63      48|47  46|45   44|43   39|38   37|36                   0|
> + *
> + * The address range is determined by below formula:
> + * [BADDR, BADDR + (NUM + 1) * 2^(5*SCALE + 1) * PAGESIZE)
> + *
> + */
> +#define __TLBI_VADDR_RANGE(addr, asid, tg, scale, num, ttl)	\
> +	({							\
> +		unsigned long __ta = (addr) >> PAGE_SHIFT;	\
> +		__ta &= GENMASK_ULL(36, 0);			\
> +		__ta |= (unsigned long)(ttl) << 37;		\
> +		__ta |= (unsigned long)(num) << 39;		\
> +		__ta |= (unsigned long)(scale) << 44;		\
> +		__ta |= (unsigned long)(tg) << 46;		\
> +		__ta |= (unsigned long)(asid) << 48;		\
> +		__ta;						\
> +	})
> +
> +#define TLB_RANGE_MASK_SHIFT 5
> +#define TLB_RANGE_MASK GENMASK_ULL(TLB_RANGE_MASK_SHIFT - 1, 0)
> +
> +/*
> + * __TG defines translation granule of the system, which is defined by
> + * PAGE_SHIFT.  Used by TTL.
> + *  - 4KB	: 1
> + *  - 16KB	: 2
> + *  - 64KB	: 3
> + */
> +#define __TG	((PAGE_SHIFT - 12) / 2 + 1)

I don't think we need __TLBI_VADDR_RANGE to take a tg argument since
it's always the same.

> +
> +
>  /*
>   *	TLB Invalidation
>   *	================
> @@ -171,12 +209,83 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
>  	dsb(ish);
>  }
>  
> +/* The maximum range size of one TLBI-RANGE instruction */
> +#define MAX_TLBI_RANGE_SIZE	(1UL << 21)

Nitpick: call this MAX_TLBI_RANGE_PAGES as that's not an address range.

It may be useful to have a macro for the range here, something like:

#define __TLBI_PAGES(num, scale)	((num + 1) << (5 * scale + 1))

and define MAX_TLBI_RANGE_PAGES in terms of this macro as
__TLBI_PAGES(31, 3).

> +
> +/*
> + * This interface uses the *rvale1is* instruction to flush TLBs
> + * in [start, end) directly.
> + * This instruction is supported from ARM v8.4.
> + */
> +static inline void __flush_tlb_range_directly(struct vm_area_struct *vma,
> +				unsigned long start, unsigned long end,
> +				unsigned long stride, bool last_level)
> +{
> +	int num = 0;
> +	int scale = 0;
> +	unsigned long asid = ASID(vma->vm_mm);
> +	unsigned long addr = 0;
> +	unsigned long range_size;
> +
> +	start = round_down(start, stride);
> +	end = round_up(end, stride);
> +	range_size = (end - start) >> PAGE_SHIFT;
> +
> +	if (range_size > MAX_TLBI_RANGE_SIZE) {
> +		flush_tlb_mm(vma->vm_mm);
> +		return;
> +	}
> +
> +	dsb(ishst);
> +
> +	/*
> +	 * The minimum size of TLB RANGE is 2 PAGE;
> +	 * Use normal TLB instruction to handle odd PAGEs

Nitpick: no need to capitalise PAGE.

> +	 */
> +	if (range_size % 2 == 1) {
> +		addr = __TLBI_VADDR(start, asid);
> +		if (last_level) {
> +			__tlbi(vale1is, addr);
> +			__tlbi_user(vale1is, addr);
> +		} else {
> +			__tlbi(vae1is, addr);
> +			__tlbi_user(vae1is, addr);
> +		}
> +		start += 1 << PAGE_SHIFT;
> +		range_size -= 1;
> +	}
> +
> +	range_size >>= 1;
> +	while (range_size > 0) {
> +		num = (range_size & TLB_RANGE_MASK) - 1;
> +		if (num >= 0) {
> +			addr = __TLBI_VADDR_RANGE(start, asid, __TG,
> +						  scale, num, 0);
> +			if (last_level) {
> +				__tlbi(rvale1is, addr);
> +				__tlbi_user(rvale1is, addr);
> +			} else {
> +				__tlbi(rvae1is, addr);
> +				__tlbi_user(rvae1is, addr);
> +			}
> +			start += (num + 1) << (5 * scale + 1) << PAGE_SHIFT;

You could use the __TLBI_PAGES macro I proposed above.

> +		}
> +		scale++;
> +		range_size >>= TLB_RANGE_MASK_SHIFT;
> +	}

So, you start from scale 0 and increment it until you reach the maximum.
I think (haven't done the maths on paper) you could also start from the
top with something like scale = ilog2(range_size) / 5. Not sure it's
significantly better though, maybe avoiding the loop 3 times if your
range is 2MB (which happens with huge pages).

Anyway, I think it would be more efficient if we combine the
__flush_tlb_range() and the _directly one into the same function with a
single loop for both. For example, if the stride is 2MB already, we can
handle this with a single classic TLBI without all the calculations for
the range operation. The hardware may also handle this better since the
software already told it there can be only one entry in that 2MB range.
So each loop iteration could figure which operation to use based on
cpucaps, TLBI range ops, stride and reduce range_size accordingly.

-- 
Catalin


  reply	other threads:[~2020-05-14 15:28 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-14 11:28 [RFC PATCH v3 0/2] arm64: tlb: add support for TLBI RANGE instructions Zhenyu Ye
2020-04-14 11:28 ` [RFC PATCH v3 1/2] arm64: tlb: Detect the ARMv8.4 TLBI RANGE feature Zhenyu Ye
2020-05-05 10:14   ` Mark Rutland
2020-05-11 12:25     ` Zhenyu Ye
2020-05-18  4:22       ` Anshuman Khandual
2020-05-18 12:29         ` Zhenyu Ye
2020-04-14 11:28 ` [RFC PATCH v3 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64 Zhenyu Ye
2020-05-14 15:28   ` Catalin Marinas [this message]
2020-05-18 12:21     ` Zhenyu Ye
2020-05-20 17:08       ` Catalin Marinas
2020-06-01 14:57         ` Zhenyu Ye
2020-06-09 13:26       ` Zhenyu Ye

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200514152840.GC1907@gaia \
    --to=catalin.marinas@arm.com \
    --cc=arm@kernel.org \
    --cc=guohanjun@huawei.com \
    --cc=kuhn.chenqun@huawei.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=maz@kernel.org \
    --cc=olof@lixom.net \
    --cc=prime.zeng@hisilicon.com \
    --cc=steven.price@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    --cc=xiexiangyou@huawei.com \
    --cc=yezhenyu2@huawei.com \
    --cc=zhangshaokun@hisilicon.com \
    --subject='Re: [RFC PATCH v3 2/2] arm64: tlb: Use the TLBI RANGE feature in arm64' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).