All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	"zhangliang (AG)" <zhangliang5@huawei.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	wangzhigang17@huawei.com,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] mm: reuse the unshared swapcache page in do_wp_page
Date: Thu, 20 Jan 2022 21:49:13 +0100	[thread overview]
Message-ID: <7a18f74f-9dc2-f23d-4f1c-c7a9217f8317@redhat.com> (raw)
In-Reply-To: <288FB900-A688-4EDB-95C6-E63B6E0A15D1@gmail.com>

On 20.01.22 21:46, Nadav Amit wrote:
> 
> 
>> On Jan 20, 2022, at 12:37 PM, David Hildenbrand <david@redhat.com> wrote:
>>
>> On 20.01.22 21:09, David Hildenbrand wrote:
>>> On 20.01.22 21:07, Matthew Wilcox wrote:
>>>> On Thu, Jan 20, 2022 at 08:55:12PM +0100, David Hildenbrand wrote:
>>>>>>>> David, does any of it regards the lru_cache_add() reference issue that I
>>>>>>>> mentioned? [1]
>>>>
>>>>> +++ b/mm/memory.c
>>>>> @@ -3291,19 +3291,28 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>>>>>        if (PageAnon(vmf->page)) {
>>>>>                struct page *page = vmf->page;
>>>>>
>>>>> -               /* PageKsm() doesn't necessarily raise the page refcount */
>>>>> -               if (PageKsm(page) || page_count(page) != 1)
>>>>> +               /*
>>>>> +                * PageKsm() doesn't necessarily raise the page refcount.
>>>>> +                *
>>>>> +                * These checks are racy as long as we haven't locked the page;
>>>>> +                * they are a pure optimization to avoid trying to lock the page
>>>>> +                * and trying to free the swap cache when there is little hope
>>>>> +                * it will actually result in a refcount of 1.
>>>>> +                */
>>>>> +               if (PageKsm(page) || page_count(page) > 1 + PageSwapCache(page))
>>>>>                        goto copy;
>>>>>                if (!trylock_page(page))
>>>>>                        goto copy;
>>>>> -               if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) {
>>>>> +               if (PageSwapCache(page))
>>>>> +                       try_to_free_swap(page);
>>>>> +               if (PageKsm(page) || page_count(page) != 1) {
>>>>>                        unlock_page(page);
>>>>>                        goto copy;
>>>>>                }
>>>>>                /*
>>>>> -                * Ok, we've got the only map reference, and the only
>>>>> -                * page count reference, and the page is locked,
>>>>> -                * it's dark out, and we're wearing sunglasses. Hit it.
>>>>> +                * Ok, we've got the only page reference from our mapping
>>>>> +                * and the page is locked, it's dark out, and we're wearing
>>>>> +                * sunglasses. Hit it.
>>>>>                 */
>>>>>                unlock_page(page);
>>>>>                wp_page_reuse(vmf);
>>>>>
>>>>>
>>>>> I added some vmstats that monitor various paths. After one run of
>>>>> 	./forceswap 2 1000000 1
>>>>> I'm left with a rough delta (including some noise) of
>>>>> 	anon_wp_copy_count 1799
>>>>> 	anon_wp_copy_count_early 1
>>>>> 	anon_wp_copy_lock 983396
>>>>> 	anon_wp_reuse 0
>>>>>
>>>>> The relevant part of your reproducer is
>>>>>
>>>>> 	for (i = 0; i < nops; i++) {
>>>>> 		if (madvise((void *)p, PAGE_SIZE * npages, MADV_PAGEOUT)) {
>>>>> 			perror("madvise");
>>>>> 			exit(-1);
>>>>> 		}
>>>>>
>>>>> 		for (j = 0; j < npages; j++) {
>>>>> 			c = p[j * PAGE_SIZE];
>>>>> 			c++;
>>>>> 			time -= rdtscp();
>>>>> 			p[j * PAGE_SIZE] = c;
>>>>> 			time += rdtscp();
>>>>> 		}
>>>>> 	}
>>>>>
>>>>> For this specific reproducer at least, the page lock seems to be the thingy that prohibits
>>>>> reuse if I interpret the numbers correctly. We pass the initial page_count() check.
>>>>>
>>>>> Haven't looked into the details, and I would be curious how that performs with actual
>>>>> workloads, if we can reproduce similar behavior.
>>>>
>>>> I don't see how that patch addresses the lru issue.  Wouldn't we need
>>>> something like ...
>>>>
>>>> 	if (!PageLRU(page))
>>>> 		lru_add_drain_all();
>>>>
>>
>> lru_add_drain_all() takes a mutex ... best we can do I guess is drain
>> the local CPU using lru_add_drain(). I'll go play with it and see what
>> breaks :)
>>
> 
> I did hack something similar and it solved the problem, but I felt it is
> a hack. If the thread is scheduled on another core, or if the write fault
> is triggered by another thread it wouldn’t work.

Yes, it will not match easily. One question would be how often it would
help in practice and if it would be worth the price.

> 
> If you look for a real-world workload that behaves similarly, you can try
> memcached with memory pressure and low latency device (I used 
> pmem-emulated). This is the workload in which I encountered the issue
> first.

Yes, I agree. So PageAnonExclusive is our best bet ... hopefully.

-- 
Thanks,

David / dhildenb


  reply	other threads:[~2022-01-20 20:49 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-13 14:03 [PATCH] mm: reuse the unshared swapcache page in do_wp_page Liang Zhang
2022-01-13 14:39 ` Matthew Wilcox
2022-01-13 14:46   ` David Hildenbrand
2022-01-13 15:02     ` Matthew Wilcox
2022-01-13 15:04       ` David Hildenbrand
2022-01-13 16:37   ` Linus Torvalds
2022-01-13 16:48     ` David Hildenbrand
2022-01-13 17:14       ` Linus Torvalds
2022-01-13 17:25         ` David Hildenbrand
2022-01-13 17:44           ` Linus Torvalds
2022-01-13 17:55             ` David Hildenbrand
2022-01-13 18:55               ` Linus Torvalds
2022-01-13 21:07             ` Matthew Wilcox
2022-01-13 22:21               ` Linus Torvalds
2022-01-14  5:00       ` zhangliang (AG)
2022-01-14 11:23         ` David Hildenbrand
2022-01-17  2:11           ` zhangliang (AG)
2022-01-17 12:58             ` David Hildenbrand
2022-01-17 13:31               ` zhangliang (AG)
2022-01-20 14:15                 ` David Hildenbrand
2022-01-20 14:39                   ` Matthew Wilcox
2022-01-20 15:26                     ` David Hildenbrand
2022-01-20 15:36                       ` Matthew Wilcox
2022-01-20 15:39                         ` David Hildenbrand
2022-01-20 15:45                           ` Matthew Wilcox
2022-01-20 15:51                             ` David Hildenbrand
2022-01-20 16:09                               ` Matthew Wilcox
2022-01-20 16:35                                 ` David Hildenbrand
2022-01-20 15:37                       ` Linus Torvalds
2022-01-20 15:46                         ` David Hildenbrand
2022-01-20 17:22                           ` Linus Torvalds
2022-01-20 17:49                             ` David Hildenbrand
2022-01-20 17:48                   ` Nadav Amit
2022-01-20 18:00                     ` David Hildenbrand
2022-01-20 18:11                       ` Nadav Amit
2022-01-20 18:19                         ` David Hildenbrand
2022-01-20 19:55                         ` David Hildenbrand
2022-01-20 20:07                           ` Matthew Wilcox
2022-01-20 20:09                             ` David Hildenbrand
2022-01-20 20:37                               ` David Hildenbrand
2022-01-20 20:46                                 ` Nadav Amit
2022-01-20 20:49                                   ` David Hildenbrand [this message]
2022-01-21  9:01                                     ` David Hildenbrand
2022-01-21 17:43                                       ` Nadav Amit
2022-01-20 20:18                           ` David Hildenbrand
2022-01-14  3:29   ` zhangliang (AG)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7a18f74f-9dc2-f23d-4f1c-c7a9217f8317@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nadav.amit@gmail.com \
    --cc=torvalds@linux-foundation.org \
    --cc=wangzhigang17@huawei.com \
    --cc=willy@infradead.org \
    --cc=zhangliang5@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.