linux-mm.kvack.org archive mirror
From: "Yin, Fengwei" <fengwei.yin@intel.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	Matthew Wilcox <willy@infradead.org>
Cc: <linux-arch@vger.kernel.org>, <will@kernel.org>,
	<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range()
Date: Fri, 17 Mar 2023 16:19:44 +0800
Message-ID: <d2e90338-6200-f005-110d-4626fda067a2@intel.com>
In-Reply-To: <25bf8e75-cc2e-7d08-dbba-41c53ab751b0@arm.com>



On 3/17/2023 4:00 PM, Ryan Roberts wrote:
> On 17/03/2023 06:33, Yin, Fengwei wrote:
>>
>>
>> On 3/17/2023 11:44 AM, Matthew Wilcox wrote:
>>> On Fri, Mar 17, 2023 at 09:58:17AM +0800, Yin, Fengwei wrote:
>>>>
>>>>
>>>> On 3/17/2023 1:52 AM, Matthew Wilcox wrote:
>>>>> On Thu, Mar 16, 2023 at 04:38:58PM +0000, Ryan Roberts wrote:
>>>>>> On 16/03/2023 16:23, Yin, Fengwei wrote:
>>>>>>>> I think you are changing behavior here - is this intentional? Previously this
>>>>>>>> would be evaluated per page, now it's evaluated once for the whole range. The
>>>>>>>> intention below is that directly faulted pages are mapped young and prefaulted
>>>>>>>> pages are mapped old. But now a whole range will be mapped the same.
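A minimal userspace model of the behaviour change described above. It assumes a hypothetical PTE layout where bit 1 is the accessed ("young") bit and a 4 KiB page size; the function names are illustrative and are not kernel APIs. The old per-page path makes only the faulting page young, while a single per-range decision marks every PTE the same way.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096UL
#define MODEL_PTE_YOUNG ((uint64_t)1 << 1)   /* hypothetical accessed bit */

/* Old behaviour (decided per page): the page at the fault address is young;
 * prefaulted neighbours are old when the architecture asks for that. */
static void fill_per_page(uint64_t *ptes, size_t nr, unsigned long addr,
                          unsigned long fault_addr, bool arch_wants_old)
{
	for (size_t i = 0; i < nr; i++) {
		unsigned long page_addr = addr + i * MODEL_PAGE_SIZE;
		bool prefault = page_addr != fault_addr;

		if (prefault && arch_wants_old)
			ptes[i] &= ~MODEL_PTE_YOUNG;
		else
			ptes[i] |= MODEL_PTE_YOUNG;
	}
}

/* New behaviour (decided once per range): one choice applied to every PTE. */
static void fill_whole_range(uint64_t *ptes, size_t nr, bool make_young)
{
	for (size_t i = 0; i < nr; i++) {
		if (make_young)
			ptes[i] |= MODEL_PTE_YOUNG;
		else
			ptes[i] &= ~MODEL_PTE_YOUNG;
	}
}

/* Count how many PTEs in the range carry the accessed bit. */
static size_t count_young(const uint64_t *ptes, size_t nr)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++)
		n += (ptes[i] & MODEL_PTE_YOUNG) != 0;
	return n;
}
```

With four pages and the fault on the second one, the per-page path leaves exactly one young PTE, whereas the per-range path leaves either all or none of them young.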
>>>>>>>
>>>>>>> Yes. You are right here.
>>>>>>>
>>>>>>> Looking at prefault and cpu_has_hw_af() for ARM64, it looks like we
>>>>>>> can avoid handling vmf->address == addr specially. It's OK to
>>>>>>> drop prefault and change the logic here slightly to:
>>>>>>>   if (arch_wants_old_prefaulted_pte())
>>>>>>>       entry = pte_mkold(entry);
>>>>>>>   else
>>>>>>>       entry = pte_sw_mkyoung(entry);
>>>>>>>
>>>>>>> It's not necessary to use pte_sw_mkyoung() for vmf->address == addr
>>>>>>> because the HW will set the access bit in the page table entry.
>>>>>>>
>>>>>>> Add Will Deacon in case I missed something here. Thanks.
>>>>>>
>>>>>> I'll defer to Will's response, but not all Arm HW supports the HW access flag
>>>>>> management. In that case it's done by SW, so I would imagine that by setting
>>>>>> this to old initially, we will get a second fault to set the access bit, which
>>>>>> will slow things down. I wonder if you will need to split this into (up to) 3
>>>>>> calls to set_ptes()?
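A sketch of what "(up to) 3 calls to set_ptes()" could mean: split the range into old pages before the faulting page, the young faulting page itself, and old pages after it. The `struct pte_run` type, `split_runs()` and the 4 KiB page size are hypothetical; each returned run would correspond to one set_ptes() call.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL

/* A run of PTE indices [first, first + nr) that share one young/old choice. */
struct pte_run {
	size_t first;
	size_t nr;
	int young;
};

/* Split nr pages starting at addr into at most three runs. Returns the
 * number of runs written to out[] (0..3). If fault_addr lies outside the
 * range, the whole range is a single old run. */
static size_t split_runs(unsigned long addr, size_t nr,
                         unsigned long fault_addr, struct pte_run out[3])
{
	size_t n = 0;
	size_t idx;

	if (nr == 0)
		return 0;
	if (fault_addr < addr || fault_addr >= addr + nr * MODEL_PAGE_SIZE) {
		out[n++] = (struct pte_run){ 0, nr, 0 };
		return n;
	}
	idx = (fault_addr - addr) / MODEL_PAGE_SIZE;
	if (idx > 0)                       /* prefaulted pages before the fault */
		out[n++] = (struct pte_run){ 0, idx, 0 };
	out[n++] = (struct pte_run){ idx, 1, 1 };   /* the faulting page */
	if (idx + 1 < nr)                  /* prefaulted pages after the fault */
		out[n++] = (struct pte_run){ idx + 1, nr - idx - 1, 0 };
	return n;
}
```

For a 16-page range with the fault on page 3 this yields runs of 3 old, 1 young and 12 old PTEs; a fault on the first page yields only two runs.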
>>>>>
>>>>> I don't think we should do that.  The limited information I have from
>>>>> various microarchitectures is that the PTEs must differ only in their
>>>>> PFN bits in order to use larger TLB entries.  That includes the Accessed
>>>>> bit (or equivalent).  So we should mkyoung all the PTEs in the same
>>>>> folio, at least initially.
>>>>>
>>>>> That said, we should still do this conditionally.  We'll prefault some
>>>>> other folios too.  So I think this should be:
>>>>>
>>>>>         bool prefault = (addr > vmf->address) || ((addr + nr * PAGE_SIZE) <= vmf->address);
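A minimal sketch of the folio-wide check, assuming `nr` is a page count, so the end of the range is `addr + nr` pages (exclusive) and must be scaled by the page size before it is compared with an address. `MODEL_PAGE_SIZE` stands in for the kernel's PAGE_SIZE and `range_is_prefault()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL

/* True when the faulting address is NOT inside [addr, addr + nr pages),
 * i.e. every PTE being installed for this folio is a prefault. */
static bool range_is_prefault(unsigned long addr, size_t nr,
                              unsigned long fault_addr)
{
	unsigned long end = addr + nr * MODEL_PAGE_SIZE;   /* exclusive */

	return fault_addr < addr || fault_addr >= end;
}
```

When this returns false the faulting page belongs to the folio, so all of the folio's PTEs are made young together; otherwise the architecture's preference for old prefaulted PTEs applies to the whole folio.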
>>>>>
>>>> According to commit 46bdb4277f98e70d0c91f4289897ade533fe9e80, if the hardware access
>>>> flag is supported on ARM64, there is a benefit to setting prefaulted PTEs "old".
>>>> If we change prefault as above, the PTEs are set "young", which loses that benefit
>>>> on ARM64 with the hardware access flag.
>>>>
>>>> OTOH, if going from "old" to "young" is cheap, why not leave all PTEs of the folio "old"
>>>> and let the hardware update them to "young"?
>>>
>>> Because we're tracking the entire folio as a single entity.  So we're
>>> better off avoiding the extra pagefaults to update the accessed bit,
>>> which won't actually give us any information (vmscan needs to know "were
>>> any of the accessed bits set", not "how many of them were set").
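A simplified model of the folio-level question reclaim asks: whether any accessed bit is set, not how many. This is a hypothetical sketch with an assumed bit layout, not the kernel's folio_referenced().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PTE_YOUNG ((uint64_t)1 << 1)   /* hypothetical accessed bit */

/* The folio counts as referenced if ANY of its PTEs has the accessed bit
 * set; the number of young PTEs carries no extra information here. */
static bool folio_model_referenced(const uint64_t *ptes, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (ptes[i] & MODEL_PTE_YOUNG)
			return true;
	return false;
}
```

One young PTE and four young PTEs give the same answer, which is why faulting to set the remaining accessed bits of an already-young folio gains nothing.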
>> There are not necessarily extra page faults to update the accessed bit. There are three cases here:
>> 1. hardware supports the access flag and "old" to "young" is cheap (no extra fault)
>> 2. hardware supports the access flag but "old" to "young" is expensive (no extra fault)
>> 3. no hardware access flag support ("old" to "young" needs an extra page fault; expensive)
>>
>> For #2 and #3, going from "old" to "young" is expensive, so we always set PTEs "young"
>> in the page fault.
>> For #1, going from "old" to "young" is cheap, so it's OK to set PTEs "old" in the page
>> fault; the hardware will set them "young" when the memory is accessed. Actually, ARM64
>> with the hardware access bit requires PTEs to be set "old".
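The three cases above reduce to a single policy decision. This sketch uses hypothetical enum names; in the kernel, arm64 expresses the equivalent choice through arch_wants_old_prefaulted_pte(), as in the snippet quoted earlier in the thread.

```c
#include <stdbool.h>

/* The three access-flag cases (names are hypothetical). */
enum af_support {
	AF_HW_CHEAP,      /* 1: HW access flag, cheap "old" -> "young" update */
	AF_HW_EXPENSIVE,  /* 2: HW access flag, expensive "old" -> "young" update */
	AF_SOFTWARE,      /* 3: no HW access flag, "old" -> "young" costs a fault */
};

/* Returns true if prefaulted PTEs should be installed "old". Only the
 * cheap-hardware case makes that worthwhile; otherwise map them "young". */
static bool prefault_should_be_old(enum af_support s)
{
	return s == AF_HW_CHEAP;
}
```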
> 
> Your logic makes sense, but it doesn't take into account the HPA
> micro-architectural feature present in some ARM CPUs. HPA can transparently
> coalesce multiple pages into a single TLB entry when certain conditions are met
> (roughly: up to 4 pages that are physically and virtually contiguous and all within a
> 4-page natural alignment). But as Matthew says, this works out better when all
> pte attributes (including access and dirty) match. Given the reason for setting
> the prefault pages to old is so that vmscan can do a better job of finding cold
> pages, and given vmscan will now be looking for folios and not individual pages
> (I assume?), I agree with Matthew that we should make whole folios young or old.
> It will marginally increase our chances of the access and dirty bits being
> consistent across the whole 4-page block that the HW tries to coalesce. If we
> unconditionally make everything old, the hw will set accessed for the single
> page that faulted, and we therefore don't have consistency for that 4-page block.
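A rough userspace model of the coalescing conditions described above: four PTEs whose virtual and physical addresses are contiguous and naturally aligned to 4 pages, and whose non-PFN attribute bits (including the accessed bit) all match. The PTE layout (PFN from bit 12, accessed bit 1) and all names are assumptions for illustration; real HPA rules are microarchitecture-specific.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE  4096UL
#define MODEL_PTE_YOUNG  ((uint64_t)1 << 1)            /* hypothetical accessed bit */
#define MODEL_PFN_SHIFT  12                            /* hypothetical PFN position */
#define MODEL_ATTR_MASK  (((uint64_t)1 << MODEL_PFN_SHIFT) - 1)
#define BLOCK_PAGES      4

static uint64_t model_pfn(uint64_t pte)
{
	return pte >> MODEL_PFN_SHIFT;
}

/* True if the 4-PTE block mapped at vaddr could be coalesced into one
 * TLB entry under the assumed rules. */
static bool block_coalescable(const uint64_t ptes[BLOCK_PAGES],
                              unsigned long vaddr)
{
	if (vaddr % (BLOCK_PAGES * MODEL_PAGE_SIZE) != 0)  /* virtual alignment */
		return false;
	if (model_pfn(ptes[0]) % BLOCK_PAGES != 0)          /* physical alignment */
		return false;
	for (size_t i = 1; i < BLOCK_PAGES; i++) {
		if (model_pfn(ptes[i]) != model_pfn(ptes[0]) + i)
			return false;                       /* not contiguous */
		if ((ptes[i] & MODEL_ATTR_MASK) != (ptes[0] & MODEL_ATTR_MASK))
			return false;                       /* attributes differ */
	}
	return true;
}
```

If all four PTEs start old and the hardware later sets the accessed bit on only the faulting page, the attribute comparison fails, which is the inconsistency described above; mapping the whole block young avoids it.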
My concern is that the benefit of "old" PTEs on ARM64 with the hardware access bit
will be lost. The workloads cited in commit 46bdb4277f98e70d0c91f4289897ade533fe9e80
(application launch latency and direct reclaim) may show a regression with this
series. Thanks.

BTW, with the TLB merge feature, shouldn't the hardware update the access bits of all
the coalesced pages together? Otherwise, it's unavoidable that the hardware ends up
setting the access bit for only one page.

Regards
Yin, Fengwei

> 
>>
>>>
>>> Anyway, hopefully Ryan can test this and let us know if it fixes the
>>> regression he sees.
>> I strongly suspect the regression Ryan saw is not related to this but to another
>> silly mistake of mine. I will send out the testing patch soon. Thanks.
> 
> I tested a version of this where I made everything unconditionally young,
> thinking it might be the source of the perf regression, before I reported it. It
> doesn't make any difference. So I agree the regression is somewhere else.
> 
> Thanks,
> Ryan
> 
>>
>>
>> Regards
>> Yin, Fengwei
> 



Thread overview: 139+ messages
2023-03-15  5:14 [PATCH v4 00/36] New page table range API Matthew Wilcox (Oracle)
2023-03-15  5:14 ` [PATCH v4 01/36] mm: Convert page_table_check_pte_set() to page_table_check_ptes_set() Matthew Wilcox (Oracle)
2023-03-15  9:21   ` Mike Rapoport
2023-03-23 18:36   ` Pasha Tatashin
2023-05-25  2:16   ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 02/36] mm: Add generic flush_icache_pages() and documentation Matthew Wilcox (Oracle)
2023-03-15  9:27   ` Mike Rapoport
2023-05-25  2:23   ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 03/36] mm: Add folio_flush_mapping() Matthew Wilcox (Oracle)
2023-03-15  9:28   ` Mike Rapoport
2023-05-25  2:35   ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 04/36] mm: Remove ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO Matthew Wilcox (Oracle)
2023-03-15  9:28   ` Mike Rapoport
2023-05-25  2:43   ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 05/36] mm: Add default definition of set_ptes() Matthew Wilcox (Oracle)
2023-03-15  9:34   ` Mike Rapoport
2023-05-25  3:01   ` Anshuman Khandual
2023-05-25  4:06     ` Matthew Wilcox
2023-03-15  5:14 ` [PATCH v4 06/36] alpha: Implement the new page table range API Matthew Wilcox (Oracle)
2023-03-15  9:41   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 07/36] arc: " Matthew Wilcox (Oracle)
2023-03-15  9:44   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 08/36] arm: " Matthew Wilcox (Oracle)
2023-03-15  9:48   ` Mike Rapoport
2023-03-15 10:56   ` Russell King (Oracle)
2023-03-15  5:14 ` [PATCH v4 09/36] arm64: " Matthew Wilcox (Oracle)
2023-03-15  9:49   ` Mike Rapoport
2023-05-25  3:35   ` Anshuman Khandual
2023-05-25  4:05     ` Matthew Wilcox
2023-05-25  4:43       ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 10/36] csky: " Matthew Wilcox (Oracle)
2023-03-15  9:50   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 11/36] hexagon: " Matthew Wilcox (Oracle)
2023-03-15  9:54   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 12/36] ia64: " Matthew Wilcox (Oracle)
2023-03-15  9:55   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 13/36] loongarch: " Matthew Wilcox (Oracle)
2023-03-15 10:07   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 14/36] m68k: " Matthew Wilcox (Oracle)
2023-03-15  7:43   ` Geert Uytterhoeven
2023-03-16 16:32     ` Geert Uytterhoeven
2023-03-15 10:07   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 15/36] microblaze: " Matthew Wilcox (Oracle)
2023-03-15 10:07   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 16/36] mips: " Matthew Wilcox (Oracle)
2023-03-15 10:08   ` Mike Rapoport
2023-03-15 10:50   ` Thomas Bogendoerfer
2023-03-15 20:33     ` Matthew Wilcox
2023-03-17 15:29       ` Thomas Bogendoerfer
2023-03-19 18:45         ` Thomas Bogendoerfer
2023-03-19 20:16           ` Matthew Wilcox
2023-03-21 11:30             ` Thomas Bogendoerfer
2023-03-15  5:14 ` [PATCH v4 17/36] nios2: " Matthew Wilcox (Oracle)
2023-03-15 10:08   ` Mike Rapoport
2023-06-13 22:45     ` Dinh Nguyen
2023-07-10 20:18       ` Matthew Wilcox
2023-07-10 23:10         ` Dinh Nguyen
2023-03-15  5:14 ` [PATCH v4 18/36] openrisc: " Matthew Wilcox (Oracle)
2023-03-15 10:09   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 19/36] parisc: " Matthew Wilcox (Oracle)
2023-03-15 10:09   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 20/36] powerpc: " Matthew Wilcox (Oracle)
2023-03-15  9:43   ` Christophe Leroy
2023-03-15 10:18     ` Christophe Leroy
2023-03-17  3:47       ` Matthew Wilcox
2023-03-18  9:19         ` Christophe Leroy
2023-07-10 20:24           ` Matthew Wilcox
2023-07-11  4:40             ` Christophe Leroy
2023-03-15 10:09   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 21/36] riscv: " Matthew Wilcox (Oracle)
2023-03-15 10:10   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 22/36] s390: " Matthew Wilcox (Oracle)
2023-03-15 10:10   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 23/36] superh: " Matthew Wilcox (Oracle)
2023-03-15  7:22   ` John Paul Adrian Glaubitz
2023-03-15  7:36   ` John Paul Adrian Glaubitz
2023-03-15 10:10   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 24/36] sparc32: " Matthew Wilcox (Oracle)
2023-03-15 10:11   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 25/36] sparc64: " Matthew Wilcox (Oracle)
2023-03-15 10:11   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 26/36] um: " Matthew Wilcox (Oracle)
2023-03-15 10:12   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 27/36] x86: " Matthew Wilcox (Oracle)
2023-03-15 10:12   ` Mike Rapoport
2023-03-15 10:34   ` Peter Zijlstra
2023-03-15 11:16     ` Mike Rapoport
2023-03-15 11:19       ` Peter Zijlstra
2023-03-15 16:12         ` Matthew Wilcox
2023-03-15  5:14 ` [PATCH v4 28/36] xtensa: " Matthew Wilcox (Oracle)
2023-03-15 10:12   ` Mike Rapoport
2023-03-15  5:14 ` [PATCH v4 29/36] mm: Remove page_mapping_file() Matthew Wilcox (Oracle)
2023-05-25  3:50   ` Anshuman Khandual
2023-05-25  4:03     ` Matthew Wilcox
2023-05-25  4:46       ` Anshuman Khandual
2023-05-25  5:37   ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 30/36] mm: Rationalise flush_icache_pages() and flush_icache_page() Matthew Wilcox (Oracle)
2023-03-15  5:14 ` [PATCH v4 31/36] mm: Tidy up set_ptes definition Matthew Wilcox (Oracle)
2023-05-25  6:20   ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 32/36] mm: Use flush_icache_pages() in do_set_pmd() Matthew Wilcox (Oracle)
2023-05-25  6:31   ` Anshuman Khandual
2023-03-15  5:14 ` [PATCH v4 33/36] filemap: Add filemap_map_folio_range() Matthew Wilcox (Oracle)
2023-03-15  5:14 ` [PATCH v4 34/36] rmap: add folio_add_file_rmap_range() Matthew Wilcox (Oracle)
2023-03-15 13:34   ` Ryan Roberts
2023-03-15 16:08     ` Ryan Roberts
2023-03-15 22:58       ` Yin Fengwei
2023-03-16 16:27       ` Yin, Fengwei
2023-03-16 16:34         ` Ryan Roberts
2023-03-17  8:23           ` Yin, Fengwei
2023-03-17 12:46             ` Ryan Roberts
2023-03-17 13:28               ` Yin, Fengwei
2023-03-15  5:14 ` [PATCH v4 35/36] mm: Convert do_set_pte() to set_pte_range() Matthew Wilcox (Oracle)
2023-03-15 15:26   ` Ryan Roberts
2023-03-16 16:23     ` Yin, Fengwei
2023-03-16 16:38       ` Ryan Roberts
2023-03-16 16:41         ` Yin, Fengwei
2023-03-16 16:50           ` Ryan Roberts
2023-03-16 17:52         ` Matthew Wilcox
2023-03-17  1:58           ` Yin, Fengwei
2023-03-17  3:44             ` Matthew Wilcox
2023-03-17  6:33               ` Yin, Fengwei
2023-03-17  8:00                 ` Ryan Roberts
2023-03-17  8:19                   ` Yin, Fengwei [this message]
2023-03-17 13:00                     ` Ryan Roberts
2023-03-17 13:44                       ` Yin, Fengwei
2023-03-24 14:58                     ` Will Deacon
2023-03-24 15:11                       ` Matthew Wilcox
2023-03-24 17:23                         ` Will Deacon
2023-03-27  1:23                           ` Yin Fengwei
2023-03-20 13:38               ` Yin, Fengwei
2023-03-20 14:08                 ` Matthew Wilcox
2023-03-21  1:58                   ` Yin, Fengwei
2023-03-21  5:13                   ` Yin Fengwei
2023-05-30  8:07                   ` [PATCH 0/4] New page table range API fixup patches Yin Fengwei
2023-05-30  8:07                     ` [PATCH 1/4] filemap: avoid interfere with xas.xa_index Yin Fengwei
2023-05-30  8:07                     ` [PATCH 2/4] rmap: fix typo in folio_add_file_rmap_range() Yin Fengwei
2023-05-30  8:07                     ` [PATCH 3/4] mm: mark PTEs referencing the accessed folio young Yin Fengwei
2023-05-30  8:07                     ` [PATCH 4/4] filemap: Check address range in filemap_map_folio_range() Yin Fengwei
2023-03-15  5:14 ` [PATCH v4 36/36] filemap: Batch PTE mappings Matthew Wilcox (Oracle)
