All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@amacapital.net>
To: Ingo Molnar <mingo@kernel.org>
Cc: fweisbec@redhat.com, Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>, X86 ML <x86@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Rik van Riel <riel@redhat.com>,
	williams@redhat.com
Subject: Re: [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry
Date: Thu, 7 May 2015 05:22:40 -0700	[thread overview]
Message-ID: <CALCETrXtd2PBLBhAdKPTxuMbE9pUSvsDLpBJnd2srJg5hk=LhQ@mail.gmail.com> (raw)
In-Reply-To: <20150507104845.GB14924@gmail.com>

On May 7, 2015 4:18 PM, "Ingo Molnar" <mingo@kernel.org> wrote:
>
>
> * Rik van Riel <riel@redhat.com> wrote:
>
> > > If, on the other hand, you're just going to remotely sample the
> > > in-memory context, that sounds good.
> >
> > It's the latter.
> >
> > If you look at /proc/<pid>/{stack,syscall,wchan} and other files,
> > you will see we already have ways to determine, from in memory
> > content, where a program is running at a certain point in time.
> >
> > In fact, the timer interrupt based accounting does a similar thing.
> > It has a task examine its own in-memory state to figure out what it
> > was doing before the timer interrupt happened.
> >
> > The kernel side stack pointer is probably enough to tell us whether
> > a task is active in kernel space, on an irq stack, or (maybe) in
> > user space. Not convinced about the latter, we may need to look at
> > the same state the RCU code keeps track of to see what mode a task
> > is in...
> >
> > I am looking at the code to see what locks we need to grab.
> >
> > I suspect the runqueue lock may be enough, to ensure that the task
> > struct, and stack do not go away while we are looking at them.
>
> That will be enough, especially if you get to the task reference via
> rq->curr.
>
> > We cannot take the lock_trace(task) from irq context, and we
> > probably do not need to anyway, since we do not care about a precise
> > stack trace for the task.
>
> So one worry with this and similar approaches of statistically
> detecting user mode would be the fact that on the way out to
> user-space we don't really destroy the previous call trace - we just
> pop off the stack (non-destructively), restore RIPs and are gone.
>
> We'll need that percpu flag I suspect.
>
> And once we have the flag, we can get rid of the per syscall RCU
> callback as well, relatively easily: with CMPXCHG (in
> synchronize_rcu()!) we can reliably sample whether a CPU is in user
> mode right now, while the syscall entry/exit path does not use any
> atomics, we can just use a simple MOV.
>
> Once we observe 'user mode', then we have observed quiescent state and
> synchronize_rcu() can continue. If we've observed kernel mode we can
> frob the remote task's TIF_ flags to make it go into a quiescent state
> publishing routine on syscall-return.
>

How does that work?

If the exit code writes the per-cpu flag and then checks TIF_whatever,
we need a barrier to avoid a race where we end up in user mode without
seeing the flag.

I think the right solution is to accept that race and just have the
RCU code send an IPI (or check again) if it sees too long of a period
elapse in kernel mode.

I think the flag should be a counter, though.  That way a workload
that makes lots of syscalls will be quickly detected as going through
quiescent states even if it's never actually observed in user mode.

> The only hard requirement of this scheme from the RCU synchronization
> POV is that all kernel contexts that may touch RCU state need to flip
> this flag reliably to 'kernel mode': i.e. all irq handlers, traps,
> NMIs and all syscall variants need to do this.
>
> But once it's there, it's really neat.
>

We already have to do this with the current code.  I went through and
checked all of the IST entries a couple versions ago.

I think we need to clean up the current garbage asm first, though.  See:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/entry&id=d22f1dca4c7c93fdd1ce754e38d71d1961c0f9ac

(Very much unfinished, and it should probably be split up, but AFAICT
it works.  Don't hold your breath for a real version.)

--Andy

> Thanks,
>
>         Ingo

  parent reply	other threads:[~2015-05-07 12:23 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-30 21:23 [PATCH 0/3] reduce nohz_full syscall overhead by 10% riel
2015-04-30 21:23 ` [PATCH 1/3] reduce indentation in __acct_update_integrals riel
2015-04-30 21:23 ` [PATCH 2/3] remove local_irq_save from __acct_update_integrals riel
2015-04-30 21:23 ` [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry riel
2015-04-30 21:56   ` Andy Lutomirski
2015-05-01  6:40   ` Ingo Molnar
2015-05-01 15:20     ` Rik van Riel
2015-05-01 15:59       ` Ingo Molnar
2015-05-01 16:03         ` Andy Lutomirski
2015-05-01 16:21           ` Ingo Molnar
2015-05-01 16:26             ` Rik van Riel
2015-05-01 16:34               ` Ingo Molnar
2015-05-01 18:05                 ` Rik van Riel
2015-05-01 18:40                   ` Ingo Molnar
2015-05-01 19:11                     ` Rik van Riel
2015-05-01 19:37                       ` Andy Lutomirski
2015-05-02  5:27                         ` Ingo Molnar
2015-05-02 18:27                           ` Rik van Riel
2015-05-03 18:41                           ` Andy Lutomirski
2015-05-07 10:35                             ` Ingo Molnar
2015-05-04  9:26                           ` Paolo Bonzini
2015-05-04 13:30                             ` Rik van Riel
2015-05-04 14:06                             ` Rik van Riel
2015-05-04 14:19                             ` Rik van Riel
2015-05-04 15:59                             ` question about RCU dynticks_nesting Rik van Riel
2015-05-04 18:39                               ` Paul E. McKenney
2015-05-04 19:39                                 ` Rik van Riel
2015-05-04 20:02                                   ` Paul E. McKenney
2015-05-04 20:13                                     ` Rik van Riel
2015-05-04 20:38                                       ` Paul E. McKenney
2015-05-04 20:53                                         ` Rik van Riel
2015-05-05  5:54                                           ` Paul E. McKenney
2015-05-06  1:49                                             ` Mike Galbraith
2015-05-06  3:44                                               ` Mike Galbraith
2015-05-06  6:06                                                 ` Paul E. McKenney
2015-05-06  6:52                                                   ` Mike Galbraith
2015-05-06  7:01                                                     ` Mike Galbraith
2015-05-07  0:59                                           ` Frederic Weisbecker
2015-05-07 15:44                                             ` Rik van Riel
2015-05-04 19:00                               ` Rik van Riel
2015-05-04 19:39                                 ` Paul E. McKenney
2015-05-04 19:59                                   ` Rik van Riel
2015-05-04 20:40                                     ` Paul E. McKenney
2015-05-05 10:53                                   ` Peter Zijlstra
2015-05-05 12:34                                     ` Paul E. McKenney
2015-05-05 13:00                                       ` Peter Zijlstra
2015-05-05 18:35                                         ` Paul E. McKenney
2015-05-05 21:09                                           ` Rik van Riel
2015-05-06  5:41                                             ` Paul E. McKenney
2015-05-05 10:48                                 ` Peter Zijlstra
2015-05-05 10:51                                   ` Peter Zijlstra
2015-05-05 12:30                                     ` Paul E. McKenney
2015-05-02  4:06                   ` [PATCH 3/3] context_tracking,x86: remove extraneous irq disable & enable from context tracking on syscall entry Mike Galbraith
2015-05-01 16:37             ` Ingo Molnar
2015-05-01 16:40               ` Rik van Riel
2015-05-01 16:45                 ` Ingo Molnar
2015-05-01 16:54                   ` Rik van Riel
2015-05-01 17:12                     ` Ingo Molnar
2015-05-01 17:22                       ` Rik van Riel
2015-05-01 17:59                         ` Ingo Molnar
2015-05-01 16:22           ` Rik van Riel
2015-05-01 16:27             ` Ingo Molnar
2015-05-03 13:23       ` Mike Galbraith
2015-05-03 17:30         ` Rik van Riel
2015-05-03 18:24           ` Andy Lutomirski
2015-05-03 18:52             ` Rik van Riel
2015-05-07 10:48               ` Ingo Molnar
2015-05-07 12:18                 ` Frederic Weisbecker
2015-05-07 12:29                   ` Ingo Molnar
2015-05-07 15:47                     ` Rik van Riel
2015-05-08  7:58                       ` Ingo Molnar
2015-05-07 12:22                 ` Andy Lutomirski [this message]
2015-05-07 12:44                   ` Ingo Molnar
2015-05-07 12:49                     ` Ingo Molnar
2015-05-08  6:17                       ` Paul E. McKenney
2015-05-07 12:52                     ` Andy Lutomirski
2015-05-07 15:08                       ` Ingo Molnar
2015-05-07 17:47                         ` Andy Lutomirski
2015-05-08  6:37                           ` Ingo Molnar
2015-05-08 10:59                             ` Andy Lutomirski
2015-05-08 11:27                               ` Ingo Molnar
2015-05-08 12:56                                 ` Andy Lutomirski
2015-05-08 13:27                                   ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALCETrXtd2PBLBhAdKPTxuMbE9pUSvsDLpBJnd2srJg5hk=LhQ@mail.gmail.com' \
    --to=luto@amacapital.net \
    --cc=fweisbec@redhat.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=umgwanakikbuti@gmail.com \
    --cc=williams@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.