From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ingo Molnar
Subject: Re: [PATCH 4/6] kvm tools: Add rwlock wrapper
Date: Sun, 29 May 2011 09:35:50 +0200
Message-ID: <20110529073550.GA21254@elte.hu>
References: <20110526180518.GA3572@elte.hu> <4DDE97CE.4000302@redhat.com>
 <20110526202531.GA2765@elte.hu> <20110526230508.GA15983@Krystal>
 <20110527102533.GA24608@elte.hu> <20110527110729.GA26920@elte.hu>
 <4DE13AF0.2080001@redhat.com> <20110528183259.GA15019@elte.hu>
 <4DE1EA93.6040401@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Mathieu Desnoyers, Pekka Enberg, Sasha Levin, john@jfloren.net,
 kvm@vger.kernel.org, asias.hejun@gmail.com, gorcunov@gmail.com,
 prasadjoshi124@gmail.com, "Paul E. McKenney"
To: Avi Kivity
Return-path:
Received: from mx3.mail.elte.hu ([157.181.1.138]:32949 "EHLO mx3.mail.elte.hu"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750814Ab1E2HgF
 (ORCPT ); Sun, 29 May 2011 03:36:05 -0400
Content-Disposition: inline
In-Reply-To: <4DE1EA93.6040401@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

* Avi Kivity wrote:

> On 05/28/2011 09:32 PM, Ingo Molnar wrote:
> >* Avi Kivity wrote:
> >
> >> > So if you set a notification signal via fcntl(F_SETOWN) on the
> >> > scheduler context switch event fd, the user-space RCU code will
> >> > get a signal on every context switch.
> >>
> >> Context switches are completely uninteresting for userspace rcu:
> >>
> >>   rcu_read_lock();
> >>   ---> context switch
> >>
> >> have we learned anything from that? no. User code is always
> >> preemptible and migratable. If rcu_read_lock() prevented migration
> >> somehow, then we'd know that a context switch means we've started a
> >> grace period for this thread. But it doesn't, so we don't.
> >
> >Well, in the next mail i mentioned that we can do migration events as
> >well, which would be useful: instead of having to keep track of
> >nr_tasks RCU grace periods we could simplify it down to nr_cpus.
>
> I don't see how a migration event helps. It is completely
> transparent from the task's point of view.

It's not transparent at all if you index RCU data structures by the
current CPU index, which the kernel implementation does.

Doing that has the advantage of being much more cache-compressed than
the TID index, and also having better worst-case grace period latency
properties than a TID index.

> > But if we indexed by the TID then we wouldnt need any scheduler
> > bindings at all - this is the simpler approach.
>
> Yes, and it maps 1:1 to the kernel implementation (cpu = task).

No, the kernel indexes grace period tracking (and the write-completion
queues) by CPU index.

> >> What's needed are explicit notifications about grace periods. For
> >> the vcpu threads, calling KVM_VCPU_RUN seems like a good point.
> >> For I/O threads, completion of processing of an event is also a
> >> good point.
> >
> > Grace period notifications are needed too, obviously.
>
> I'd think they're sufficient, no? Is something else needed?

I think you are missing the fact that in the kernel we index RCU data
structures by CPU number:

static void rcu_preempt_qs(int cpu)
{
        struct rcu_data *rdp = &per_cpu(rcu_preempt_data, cpu);
        ...

static void rcu_preempt_note_context_switch(int cpu)
{
        struct task_struct *t = current;
        unsigned long flags;
        struct rcu_data *rdp;
        struct rcu_node *rnp;

        if (t->rcu_read_lock_nesting &&
            (t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {

                /* Possibly blocking in an RCU read-side critical section. */
                rdp = per_cpu_ptr(rcu_preempt_state.rda, cpu);
                ...

Which could be changed over to be per task in user-space by treating
the TID as a 'virtual CPU' equivalent.

This probably lengthens worst-case rcu_sync() latencies rather
significantly though - possibly turning urcu into a
stop_machine_run() equivalent in the worst-case. (but i could be
wrong about this last bit)

Thanks,

	Ingo
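To illustrate the 'TID as virtual CPU' direction described above, a
minimal quiescent-state-based sketch might look like the code below.
This is only a sketch under assumptions: the names (urcu_task_state,
rcu_register_thread, rcu_quiescent_state, the fixed-size task table)
are hypothetical and are not taken from liburcu or from the kernel
tree - they just show grace period tracking indexed per task instead
of per CPU.

/*
 * Hypothetical per-task ("TID as virtual CPU") grace period tracking
 * for a user-space RCU flavour.  Each registered thread gets one slot,
 * the way the kernel gives each CPU one rcu_data entry.
 */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define MAX_TASKS	128

struct urcu_task_state {
	atomic_ulong	seen_gp;	/* last global grace period this task observed */
};

static struct urcu_task_state	task_state[MAX_TASKS];	/* one slot per registered TID */
static atomic_int		nr_tasks;
static atomic_ulong		global_gp = 1;
static pthread_mutex_t		gp_lock = PTHREAD_MUTEX_INITIALIZER;

static __thread int my_slot = -1;

/* Register the calling thread as a "virtual CPU". */
static void rcu_register_thread(void)
{
	my_slot = atomic_fetch_add(&nr_tasks, 1);
	atomic_store(&task_state[my_slot].seen_gp, atomic_load(&global_gp));
}

/* Reader side: announce a quiescent state for this task. */
static void rcu_quiescent_state(void)
{
	atomic_store(&task_state[my_slot].seen_gp, atomic_load(&global_gp));
}

/*
 * Writer side: start a new grace period, then wait until every
 * registered task has announced a quiescent state after it began.
 */
static void rcu_sync(void)
{
	unsigned long gp;
	int i;

	pthread_mutex_lock(&gp_lock);
	gp = atomic_fetch_add(&global_gp, 1) + 1;

	for (i = 0; i < atomic_load(&nr_tasks); i++) {
		while (atomic_load(&task_state[i].seen_gp) < gp)
			sched_yield();	/* wait for this task, not this CPU */
	}
	pthread_mutex_unlock(&gp_lock);
}

The writer loop above walks nr_tasks entries rather than nr_cpus, and
has to wait for each of them in turn - which is where the worst-case
rcu_sync() latency concern mentioned above comes from.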