linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] x86/fault: move might_sleep() out of mmap read lock
@ 2021-11-19  6:58 Muchun Song
  2021-11-19 15:04 ` Dave Hansen
  0 siblings, 1 reply; 5+ messages in thread
From: Muchun Song @ 2021-11-19  6:58 UTC (permalink / raw)
  To: dave.hansen, luto, peterz, tglx, mingo, bp, hpa
  Cc: x86, linux-kernel, duanxiongchun, zhengqi.arch, Muchun Song

The mmap lock is supposed to be a contended lock sometimes, scheduling
to other task with holding mmap read lock does not seems to be a wise
choice. So move it to the front of mmap_read_trylock(). Although
mmap_read_lock() implies a might_sleep(), I think redundant check is
not a problem since this task is about to sleep and it is not a hot
path.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 arch/x86/mm/fault.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 4bfed53e210e..22fd1dfafa3d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1323,6 +1323,8 @@ void do_user_addr_fault(struct pt_regs *regs,
 	}
 #endif
 
+	might_sleep();
+
 	/*
 	 * Kernel-mode access to the user address space should only occur
 	 * on well-defined single instructions listed in the exception
@@ -1346,13 +1348,6 @@ void do_user_addr_fault(struct pt_regs *regs,
 		}
 retry:
 		mmap_read_lock(mm);
-	} else {
-		/*
-		 * The above down_read_trylock() might have succeeded in
-		 * which case we'll have missed the might_sleep() from
-		 * down_read():
-		 */
-		might_sleep();
 	}
 
 	vma = find_vma(mm, address);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] x86/fault: move might_sleep() out of mmap read lock
  2021-11-19  6:58 [RFC PATCH] x86/fault: move might_sleep() out of mmap read lock Muchun Song
@ 2021-11-19 15:04 ` Dave Hansen
  2021-11-22  6:59   ` Muchun Song
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Hansen @ 2021-11-19 15:04 UTC (permalink / raw)
  To: Muchun Song, dave.hansen, luto, peterz, tglx, mingo, bp, hpa
  Cc: x86, linux-kernel, duanxiongchun, zhengqi.arch

On 11/18/21 10:58 PM, Muchun Song wrote:
> The mmap lock is supposed to be a contended lock sometimes, scheduling
> to other task with holding mmap read lock does not seems to be a wise
> choice. So move it to the front of mmap_read_trylock(). Although
> mmap_read_lock() implies a might_sleep(), I think redundant check is
> not a problem since this task is about to sleep and it is not a hot
> path.

This justification doesn't really do it for me.  There are a billion
ways to sleep in the fault path while the mmap lock is held.  Adding one
more cond_resched() doesn't seem like it would do much.

In other words, I don't think there's a performance justification here.

That said...

> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 4bfed53e210e..22fd1dfafa3d 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1323,6 +1323,8 @@ void do_user_addr_fault(struct pt_regs *regs,
>  	}
>  #endif
>  
> +	might_sleep();
> +
>  	/*
>  	 * Kernel-mode access to the user address space should only occur
>  	 * on well-defined single instructions listed in the exception
> @@ -1346,13 +1348,6 @@ void do_user_addr_fault(struct pt_regs *regs,
>  		}
>  retry:
>  		mmap_read_lock(mm);
> -	} else {
> -		/*
> -		 * The above down_read_trylock() might have succeeded in
> -		 * which case we'll have missed the might_sleep() from
> -		 * down_read():
> -		 */
> -		might_sleep();
>  	}
>  
>  	vma = find_vma(mm, address);

The comment is stale, which isn't great.  The might_sleep() is already
in the fast path.  So, moving it up above makes a lot of sense just in
terms of simplicity.

The might_sleep() does need a comment, though, about what it's _doing_
there.

So, right idea, good result, but for the wrong reasons.

If you want to revise the justification and add a comment, I think this
is something we can take.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] x86/fault: move might_sleep() out of mmap read lock
  2021-11-19 15:04 ` Dave Hansen
@ 2021-11-22  6:59   ` Muchun Song
  2021-11-25 10:45     ` Thomas Gleixner
  0 siblings, 1 reply; 5+ messages in thread
From: Muchun Song @ 2021-11-22  6:59 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Dave Hansen, Andrew Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Peter Anvin, X86 ML, LKML,
	Xiongchun duan, Qi Zheng

On Fri, Nov 19, 2021 at 11:04 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 11/18/21 10:58 PM, Muchun Song wrote:
> > The mmap lock is supposed to be a contended lock sometimes, scheduling
> > to other task with holding mmap read lock does not seems to be a wise
> > choice. So move it to the front of mmap_read_trylock(). Although
> > mmap_read_lock() implies a might_sleep(), I think redundant check is
> > not a problem since this task is about to sleep and it is not a hot
> > path.
>
> This justification doesn't really do it for me.  There are a billion
> ways to sleep in the fault path while the mmap lock is held.  Adding one
> more cond_resched() doesn't seem like it would do much.

I agree with you that there are lots of ways to sleep in the
fault path. Just try my best to not sleep with mmap lock.

>
> In other words, I don't think there's a performance justification here.
>
> That said...
>
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index 4bfed53e210e..22fd1dfafa3d 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -1323,6 +1323,8 @@ void do_user_addr_fault(struct pt_regs *regs,
> >       }
> >  #endif
> >
> > +     might_sleep();
> > +
> >       /*
> >        * Kernel-mode access to the user address space should only occur
> >        * on well-defined single instructions listed in the exception
> > @@ -1346,13 +1348,6 @@ void do_user_addr_fault(struct pt_regs *regs,
> >               }
> >  retry:
> >               mmap_read_lock(mm);
> > -     } else {
> > -             /*
> > -              * The above down_read_trylock() might have succeeded in
> > -              * which case we'll have missed the might_sleep() from
> > -              * down_read():
> > -              */
> > -             might_sleep();
> >       }
> >
> >       vma = find_vma(mm, address);
>
> The comment is stale, which isn't great.  The might_sleep() is already
> in the fast path.  So, moving it up above makes a lot of sense just in
> terms of simplicity.

Without this patch, I didn't see the might_sleep() in the fast path. What
am I missing here?

Thanks.

>
> The might_sleep() does need a comment, though, about what it's _doing_
> there.
>
> So, right idea, good result, but for the wrong reasons.
>
> If you want to revise the justification and add a comment, I think this
> is something we can take.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] x86/fault: move might_sleep() out of mmap read lock
  2021-11-22  6:59   ` Muchun Song
@ 2021-11-25 10:45     ` Thomas Gleixner
  2021-11-26  5:54       ` [External] " Muchun Song
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2021-11-25 10:45 UTC (permalink / raw)
  To: Muchun Song, Dave Hansen
  Cc: Dave Hansen, Andrew Lutomirski, Peter Zijlstra, Ingo Molnar,
	Borislav Petkov, Peter Anvin, X86 ML, LKML, Xiongchun duan,
	Qi Zheng

Muchun, Dave!

On Mon, Nov 22 2021 at 14:59, Muchun Song wrote:
> On Fri, Nov 19, 2021 at 11:04 PM Dave Hansen <dave.hansen@intel.com> wrote:
>> >
>> > +     might_sleep();
>> > +
>> >       /*
>> >        * Kernel-mode access to the user address space should only occur
>> >        * on well-defined single instructions listed in the exception
>> > @@ -1346,13 +1348,6 @@ void do_user_addr_fault(struct pt_regs *regs,
>> >               }
>> >  retry:
>> >               mmap_read_lock(mm);
>> > -     } else {
>> > -             /*
>> > -              * The above down_read_trylock() might have succeeded in
>> > -              * which case we'll have missed the might_sleep() from
>> > -              * down_read():
>> > -              */
>> > -             might_sleep();
>> >       }
>> >
>> >       vma = find_vma(mm, address);
>>
>> The comment is stale, which isn't great.  The might_sleep() is already
>> in the fast path.  So, moving it up above makes a lot of sense just in
>> terms of simplicity.

I don't think so. The point is:

	if (unlikely(!mmap_read_trylock(mm))) {
		if (!user_mode(regs) && !search_exception_tables(regs->ip)) {
			/*
			 * Fault from code in kernel from
			 * which we do not expect faults.
			 */
			bad_area_nosemaphore(regs, error_code, address);
			return;
		}

Moving it up will make the might_sleep() splat more important than an
unexpected fault when the unexpected fault happens in e.g. a preemption
disabled region. That's wrong because the important information in this
case is not the might sleep splat. The important information is the
fault itself.

But moving it up is even more wrong for spurious faults which are
correctly handled in that case via:

     bad_area_nosemaphore()
       __bad_area_nosemaphore()
         kernelmode_fixup_or_oops()
            handle(AMD erratum #91)
              is_prefetch()

So if such a spurious fault happens in a condition which would trigger
the might_sleep() splat then moving might_sleep() before the trylock()
will cause false positives. So, no. It's going to stay where it is.

> Without this patch, I didn't see the might_sleep() in the fast path. What
> am I missing here?

I have no idea what you are doing. If the trylock() succeeds and the
fault happened in e.g. a preemption disabled region then the
might_sleep() in the else path will trigger no matter what.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [External] Re: [RFC PATCH] x86/fault: move might_sleep() out of mmap read lock
  2021-11-25 10:45     ` Thomas Gleixner
@ 2021-11-26  5:54       ` Muchun Song
  0 siblings, 0 replies; 5+ messages in thread
From: Muchun Song @ 2021-11-26  5:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dave Hansen, Dave Hansen, Andrew Lutomirski, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, Peter Anvin, X86 ML, LKML,
	Xiongchun duan, Qi Zheng

On Thu, Nov 25, 2021 at 6:45 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Muchun, Dave!
>
> On Mon, Nov 22 2021 at 14:59, Muchun Song wrote:
> > On Fri, Nov 19, 2021 at 11:04 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >> >
> >> > +     might_sleep();
> >> > +
> >> >       /*
> >> >        * Kernel-mode access to the user address space should only occur
> >> >        * on well-defined single instructions listed in the exception
> >> > @@ -1346,13 +1348,6 @@ void do_user_addr_fault(struct pt_regs *regs,
> >> >               }
> >> >  retry:
> >> >               mmap_read_lock(mm);
> >> > -     } else {
> >> > -             /*
> >> > -              * The above down_read_trylock() might have succeeded in
> >> > -              * which case we'll have missed the might_sleep() from
> >> > -              * down_read():
> >> > -              */
> >> > -             might_sleep();
> >> >       }
> >> >
> >> >       vma = find_vma(mm, address);
> >>
> >> The comment is stale, which isn't great.  The might_sleep() is already
> >> in the fast path.  So, moving it up above makes a lot of sense just in
> >> terms of simplicity.
>
> I don't think so. The point is:
>
>         if (unlikely(!mmap_read_trylock(mm))) {
>                 if (!user_mode(regs) && !search_exception_tables(regs->ip)) {
>                         /*
>                          * Fault from code in kernel from
>                          * which we do not expect faults.
>                          */
>                         bad_area_nosemaphore(regs, error_code, address);
>                         return;
>                 }
>
> Moving it up will make the might_sleep() splat more important than an
> unexpected fault when the unexpected fault happens in e.g. a preemption
> disabled region. That's wrong because the important information in this
> case is not the might sleep splat. The important information is the
> fault itself.
>
> But moving it up is even more wrong for spurious faults which are
> correctly handled in that case via:
>
>      bad_area_nosemaphore()
>        __bad_area_nosemaphore()
>          kernelmode_fixup_or_oops()
>             handle(AMD erratum #91)
>               is_prefetch()
>
> So if such a spurious fault happens in a condition which would trigger
> the might_sleep() splat then moving might_sleep() before the trylock()
> will cause false positives. So, no. It's going to stay where it is.

Got it. I didn't realize those cases. Thank you for your patient
explanation.


>
> > Without this patch, I didn't see the might_sleep() in the fast path. What
> > am I missing here?
>
> I have no idea what you are doing. If the trylock() succeeds and the
> fault happened in e.g. a preemption disabled region then the
> might_sleep() in the else path will trigger no matter what.
>
> Thanks,
>
>         tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-11-26  5:57 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-19  6:58 [RFC PATCH] x86/fault: move might_sleep() out of mmap read lock Muchun Song
2021-11-19 15:04 ` Dave Hansen
2021-11-22  6:59   ` Muchun Song
2021-11-25 10:45     ` Thomas Gleixner
2021-11-26  5:54       ` [External] " Muchun Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).