Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling

From: Alistair Popple <apopple@nvidia.com>
To: Felix Kuehling <felix.kuehling@amd.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Alex Sierra <alex.sierra@amd.com>,
	jgg@nvidia.com, david@redhat.com, linux-mm@kvack.org,
	rcampbell@nvidia.com, linux-ext4@vger.kernel.org,
	linux-xfs@vger.kernel.org, amd-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, hch@lst.de, jglisse@redhat.com,
	akpm@linux-foundation.org
Subject: Re: [PATCH v1 1/3] mm: split vm_normal_pages for LRU and non-LRU handling
Date: Thu, 17 Mar 2022 13:50:37 +1100	[thread overview]
Message-ID: <87mthp8f2g.fsf@nvdebian.thelocal> (raw)
In-Reply-To: <651099d6-21ae-16a6-e500-a87002468cda@amd.com>

[-- Attachment #1: Type: text/plain, Size: 2861 bytes --]

Felix Kuehling <felix.kuehling@amd.com> writes:

> Am 2022-03-10 um 14:25 schrieb Matthew Wilcox:
>> On Thu, Mar 10, 2022 at 11:26:31AM -0600, Alex Sierra wrote:
>>> @@ -606,7 +606,7 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
>>>    * PFNMAP mappings in order to support COWable mappings.
>>>    *
>>>    */
>>> -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>>> +struct page *vm_normal_any_page(struct vm_area_struct *vma, unsigned long addr,
>>>   			    pte_t pte)
>>>   {
>>>   	unsigned long pfn = pte_pfn(pte);
>>> @@ -620,8 +620,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>>>   			return NULL;
>>>   		if (is_zero_pfn(pfn))
>>>   			return NULL;
>>> -		if (pte_devmap(pte))
>>> -			return NULL;
>>>     		print_bad_pte(vma, addr, pte, NULL);
>>>   		return NULL;
>> ... what?
>>
>> Haven't you just made it so that a devmap page always prints a bad PTE
>> message, and then returns NULL anyway?
>
> Yeah, that was stupid. :/  I think the long-term goal was to get rid of
> pte_devmap. But for now, as long as we have pte_special with pte_devmap,
> we'll need a special case to handle that like a normal page.
>
> I only see the PFN_DEV|PFN_MAP flags set in a few places: drivers/dax/device.c,
> drivers/nvdimm/pmem.c, fs/fuse/virtio_fs.c. I guess we need to test at least one
> of them for this patch series to make sure we're not breaking them.
>
>
>>
>> Surely this should be:
>>
>> 		if (pte_devmap(pte))
>> -			return NULL;
>> +			return pfn_to_page(pfn);
>>
>> or maybe
>>
>> +			goto check_pfn;
>>
>> But I don't know about that highest_memmap_pfn check.
>
> Looks to me like it should work. highest_memmap_pfn gets updated in
> memremap_pages -> pagemap_range -> move_pfn_range_to_zone ->
> memmap_init_range.

FWIW the previous version of this feature which was removed in 25b2995a35b6
("mm: remove MEMORY_DEVICE_PUBLIC support") had a similar comparison with
highest_memmap_pfn:

if (likely(pfn <= highest_memmap_pfn)) {
        struct page *page = pfn_to_page(pfn);

        if (is_device_public_page(page)) {
                if (with_public_device)
                        return page;
                return NULL;
        }
}

> Regards,
>   Felix
>
>
>>
>>> @@ -661,6 +659,22 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>>>   	return pfn_to_page(pfn);
>>>   }
>>>   +/*
>>> + * vm_normal_lru_page -- This function gets the "struct page" associated
>>> + * with a pte only for page cache and anon page. These pages are LRU handled.
>>> + */
>>> +struct page *vm_normal_lru_page(struct vm_area_struct *vma, unsigned long addr,
>>> +			    pte_t pte)
>> It seems a shame to add a new function without proper kernel-doc.
>>