x86/mm: fix return value of p[um]dp_set_access_flags
diff mbox series

Message ID 20190919082549.3895-1-richardw.yang@linux.intel.com
State New, archived
Headers show
Series
  • x86/mm: fix return value of p[um]dp_set_access_flags
Related show

Commit Message

Wei Yang Sept. 19, 2019, 8:25 a.m. UTC
Function p[um]dp_set_access_flags is used with update_mmu_cache_p[um]d
and the return value from p[um]dp_set_access_flags indicates whether it
is necessary to do the cache update.

From current code logic, only when changed && dirty, related page table
entry would be updated. It is not necessary to update cache when the
real page table entry is not changed.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
---
 arch/x86/mm/pgtable.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Dave Hansen Sept. 19, 2019, 5:25 p.m. UTC | #1
On 9/19/19 1:25 AM, Wei Yang wrote:
> Function p[um]dp_set_access_flags is used with update_mmu_cache_p[um]d
> and the return value from p[um]dp_set_access_flags indicates whether it
> is necessary to do the cache update.

If this change is correct, why was it not applied to
ptep_set_access_flags()?  That function has the same form.

BTW, I rather dislike the 'dirty' variable name.  It seems to be
indicating whether the fault was a write fault or not and whether we
*expect* the dirty bit to be set in 'entry'.

> From current code logic, only when changed && dirty, related page table
> entry would be updated. It is not necessary to update cache when the
> real page table entry is not changed.

This logic doesn't really hold up, though.

If we are only writing accessed and/or dirty bits, then we *never* need
to flush.  The flush might avoid a stale TLB entry causing an extra page
walk by the hardware, but it's probably not ever worth the cost of the
flush.

Unless there's something weird happening with paravirt, I can't ever see
a good reason to flush the TLB when just setting accessed/dirty bits on
bare-metal.

This seems like a place where a debugging check to validate that only
accessed/dirty bits are only being set would be a good idea.
Wei Yang Sept. 20, 2019, 2:18 a.m. UTC | #2
On Thu, Sep 19, 2019 at 10:25:05AM -0700, Dave Hansen wrote:
>On 9/19/19 1:25 AM, Wei Yang wrote:
>> Function p[um]dp_set_access_flags is used with update_mmu_cache_p[um]d
>> and the return value from p[um]dp_set_access_flags indicates whether it
>> is necessary to do the cache update.
>
>If this change is correct, why was it not applied to
>ptep_set_access_flags()?  That function has the same form.
>

Thanks, you are right, I missed this point.

>BTW, I rather dislike the 'dirty' variable name.  It seems to be
>indicating whether the fault was a write fault or not and whether we
>*expect* the dirty bit to be set in 'entry'.
>

I found some comments for ptep_set_access_flags in OLD code, which says

/*
 * We only update the dirty/accessed state if we set
 * the dirty bit by hand in the kernel, since the hardware
 * will do the accessed bit for us, and we don't want to
 * race with other CPU's that might be updating the dirty
 * bit at the same time.
 */

I guess this is reason for the naming. Looks currently we use this value in
some other way.

>> From current code logic, only when changed && dirty, related page table
>> entry would be updated. It is not necessary to update cache when the
>> real page table entry is not changed.
>
>This logic doesn't really hold up, though.
>
>If we are only writing accessed and/or dirty bits, then we *never* need
>to flush.  The flush might avoid a stale TLB entry causing an extra page
>walk by the hardware, but it's probably not ever worth the cost of the
>flush.
>
>Unless there's something weird happening with paravirt, I can't ever see
>a good reason to flush the TLB when just setting accessed/dirty bits on
>bare-metal.
>
>This seems like a place where a debugging check to validate that only
>accessed/dirty bits are only being set would be a good idea.

The function update_mmu_cache_pmd is not for x86, IMHO, since it is empty on
x86.

I took a look into the definition on powerpc, which is defined in
arch/powerpc/mm/book3s64/pgtable.c, which preload the content of this HPTE. So
the cache update here is not TLB flush on x86.

And then I compare the definition of ptep_set_access_flags on x86 and powerpc.
On powerpc, which is defined in arch/powerpc/mm/pgtable.c, update the pte
without checking *dirty*. This logic make sense to me. So I am confused with
why on x86 we need to check both *changed* and *dirty*. 

Then I searched the history, and found commit 8dab5241d06bf may explain
something.

The commit log says:

    This patch reworks ptep_set_access_flags() semantics, implementations and
    callers so that it's now responsible for returning whether an update is
    necessary or not (basically whether the PTE actually changed).

If my understanding is correct, the return value should indicate whether the
PTE actually changed.

But the change introduced in commit 8dab5241d06bf is not doing the exact
thing on x86. Since we only change PTE only both *dirty* and *change*, just
return *changed* seems not match the semantics. 

Last but not least, since update_mmu_cache_pmd is empty, even return value is
not correct, it doesn't break anything.

Hope my understanding is correct.
Dave Hansen Sept. 20, 2019, 4:16 p.m. UTC | #3
On 9/19/19 7:18 PM, Wei Yang wrote:
> Last but not least, since update_mmu_cache_pmd is empty, even return
> value is not correct, it doesn't break anything.

In other words, this patch has no functional effect and does not provide
a "fix".  What's the point of patching this stuff if it has no effect?
Wei Yang Sept. 23, 2019, 6 a.m. UTC | #4
On Fri, Sep 20, 2019 at 09:16:12AM -0700, Dave Hansen wrote:
>On 9/19/19 7:18 PM, Wei Yang wrote:
>> Last but not least, since update_mmu_cache_pmd is empty, even return
>> value is not correct, it doesn't break anything.
>
>In other words, this patch has no functional effect and does not provide
>a "fix".  What's the point of patching this stuff if it has no effect?

It correct the function semantics. The return value of these function doesn't
meet the requirement, which will be misleading and we still need to dig in the
history to find out the correct answer.

In case we would have a valid cache update mechanism, this would introduce a
problem.

So I suggest to fix the return value to reflect the true meaning.

Patch
diff mbox series

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 44816ff6411f..ba910f8ab43a 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -509,9 +509,10 @@  int pmdp_set_access_flags(struct vm_area_struct *vma,
 		 * #PF is architecturally guaranteed to do that and in the
 		 * worst-case we'll generate a spurious fault.
 		 */
+		return true;
 	}
 
-	return changed;
+	return false;
 }
 
 int pudp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
@@ -529,9 +530,10 @@  int pudp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 		 * #PF is architecturally guaranteed to do that and in the
 		 * worst-case we'll generate a spurious fault.
 		 */
+		return true;
 	}
 
-	return changed;
+	return false;
 }
 #endif