From: Christian Borntraeger <borntraeger@de.ibm.com>
To: Peter Xu <peterx@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>,
linux-kernel@vger.kernel.org,
Gerald Schaefer <gerald.schaefer@de.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
linux-s390@vger.kernel.org
Subject: Re: [PATCH 19/25] mm/s390: Use mm_fault_accounting()
Date: Wed, 17 Jun 2020 18:14:52 +0200
Message-ID: <8bd8dcf6-f2f0-d44e-9bf8-6fd4fe299aa9@de.ibm.com>
In-Reply-To: <20200617160617.GD76766@xz-x1>
On 17.06.20 18:06, Peter Xu wrote:
> Hi, Christian,
>
> On Wed, Jun 17, 2020 at 08:19:29AM +0200, Christian Borntraeger wrote:
>>
>>
>> On 16.06.20 18:35, Peter Xu wrote:
>>> Hi, Alexander,
>>>
>>> On Tue, Jun 16, 2020 at 05:59:33PM +0200, Alexander Gordeev wrote:
>>>>> @@ -489,21 +489,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>>>>> if (unlikely(fault & VM_FAULT_ERROR))
>>>>> goto out_up;
>>>>>
>>>>> - /*
>>>>> - * Major/minor page fault accounting is only done on the
>>>>> - * initial attempt. If we go through a retry, it is extremely
>>>>> - * likely that the page will be found in page cache at that point.
>>>>> - */
>>>>> if (flags & FAULT_FLAG_ALLOW_RETRY) {
>>>>> - if (fault & VM_FAULT_MAJOR) {
>>>>> - tsk->maj_flt++;
>>>>> - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
>>>>> - regs, address);
>>>>> - } else {
>>>>> - tsk->min_flt++;
>>>>> - perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
>>>>> - regs, address);
>>>>> - }
>>>>> if (fault & VM_FAULT_RETRY) {
>>>>> if (IS_ENABLED(CONFIG_PGSTE) && gmap &&
>>>>> (flags & FAULT_FLAG_RETRY_NOWAIT)) {
>
> [1]
>
>>>>
>>>> Seems like the call to mm_fault_accounting() will be missed if
>>>> we entered here with FAULT_FLAG_RETRY_NOWAIT flag set, since it
>>>> jumps to "out_up"...
>>>
>>> This is true as a functional change. However that also means that we've got a
>>> VM_FAULT_RETRY, which hints that this fault has been requested to retry rather
>>> than handled correctly (for instance, due to some try_lock failed during the
>>> fault process).
>>>
>>> To me, that case should not be counted as a page fault at all? Or we might get
>>> the same duplicated accounting when the page fault retried from a higher stack.
>>>
>>> Thanks
>>
>> This case below (the one with the gmap) is the KVM case for doing a so called
>> pseudo page fault to our guests. (we notify our guests about major host page
>> faults and let it reschedule to something else instead of halting the vcpu).
>> This is being resolved with either gup or fixup_user_fault asynchronously by
>> KVM code (this can also be sync when the guest does not match some conditions)
>> We do not change the counters in that code as far as I can tell so we should
>> continue to do it here.
>>
>> (see arch/s390/kvm/kvm-s390.c
>> static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
>> {
>> [...]
>> } else if (current->thread.gmap_pfault) {
>> trace_kvm_s390_major_guest_pfault(vcpu);
>> current->thread.gmap_pfault = 0;
>> if (kvm_arch_setup_async_pf(vcpu))
>> return 0;
>> return kvm_arch_fault_in_page(vcpu, current->thread.gmap_addr, 1);
>> }
>
> Please correct me if I'm wrong... but I still think what this patch does is the
> right thing to do.
>
> Note again that IMHO when reached [1] above it means the page fault is not
> handled correctly so we need to fallback to KVM async page fault, then we
> shouldn't increment the accountings until it's finally handled correctly. That
> final accounting should be done in the async pf path in gup code where the page
> fault is handled:
>
> kvm_arch_fault_in_page
> gmap_fault
> fixup_user_fault
>
> Where in fixup_user_fault() we have:
>
> if (tsk) {
> if (major)
> tsk->maj_flt++;
> else
> tsk->min_flt++;
> }
>
Right, that case does work. It's the case where we do not inject a pseudo page
fault and instead fall back to synchronous fault-in.

What about the other case:

kvm_setup_async_pf
  -> workqueue
    async_pf_execute
      get_user_pages_remote

Does get_user_pages_remote() do the accounting as well? I can't see that.
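
To make the question concrete, here is a minimal userspace sketch of the two paths. It is not kernel code: struct toy_task and the toy_* functions are made-up names, and the async worker having no task to charge is an assumption for illustration, which is exactly the point that needs checking in get_user_pages_remote(). Only the "if (tsk)" accounting shape is taken from the fixup_user_fault() snippet quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-task fault counters; not the kernel's task_struct. */
struct toy_task {
	unsigned long maj_flt;
	unsigned long min_flt;
};

/* Same shape as the quoted fixup_user_fault() snippet: account only if a task is given. */
static void toy_account_fault(struct toy_task *tsk, bool major)
{
	if (!tsk)
		return;
	if (major)
		tsk->maj_flt++;
	else
		tsk->min_flt++;
}

/* Synchronous fallback: the faulting task is known, so it gets charged. */
static void toy_sync_fault_in(struct toy_task *faulting_task, bool major)
{
	toy_account_fault(faulting_task, major);
}

/* Hypothetical async worker: ASSUMED to have no task to charge (NULL). */
static void toy_async_fault_in(bool major)
{
	toy_account_fault(NULL, major);
}
```

If the assumption holds, a major fault resolved through toy_sync_fault_in() bumps maj_flt, while the same fault resolved through toy_async_fault_in() leaves both counters untouched, so dropping the accounting at [1] would lose that fault entirely.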