From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Wu, Feng" <feng.wu@intel.com>
Subject: Re: [PATCH v8 15/17] vmx: VT-d posted-interrupt core
 logic handling
Date: Wed, 28 Oct 2015 02:40:02 +0000
Message-ID: <E959C4978C3B6342920538CF579893F00AE0F6FE@SHSMSX104.ccr.corp.intel.com>
References: <1444640103-4685-1-git-send-email-feng.wu@intel.com>
	<1444640103-4685-16-git-send-email-feng.wu@intel.com>
	<1445870370.2717.103.camel@citrix.com>
	<E959C4978C3B6342920538CF579893F00AE08CB9@SHSMSX104.ccr.corp.intel.com>
	<1445948204.2937.130.camel@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <1445948204.2937.130.camel@citrix.com>
Content-Language: en-US
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Dario Faggioli <dario.faggioli@citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Cc: "Tian, Kevin" <kevin.tian@intel.com>, "Wu, Feng" <feng.wu@intel.com>, George Dunlap <george.dunlap@eu.citrix.com>, Andrew Cooper <andrew.cooper3@citrix.com>, Jan Beulich <jbeulich@suse.com>, Keir Fraser <keir@xen.org>
List-Id: xen-devel@lists.xenproject.org


> -----Original Message-----
> From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> Sent: Tuesday, October 27, 2015 8:17 PM
> To: Wu, Feng <feng.wu@intel.com>; xen-devel@lists.xen.org
> Cc: Tian, Kevin <kevin.tian@intel.com>; Keir Fraser <keir@xen.org>; George
> Dunlap <george.dunlap@eu.citrix.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Jan Beulich <jbeulich@suse.com>
> Subject: Re: [Xen-devel] [PATCH v8 15/17] vmx: VT-d posted-interrupt core logic
> handling
> 
> On Tue, 2015-10-27 at 05:19 +0000, Wu, Feng wrote:
> > > -----Original Message-----
> > > From: Dario Faggioli [mailto:dario.faggioli@citrix.com]
> > >
> 
> > This is similar to patch v7 and before, which did the vcpu blocking
> > during context switch, and it seemed during the discussion that you
> > guys prefer doing the vcpu blocking outside the context switch.
> >
> I know, that's why I'm not 100% sure of the path to take (I think I
> made that clear).
> 
> On one hand, I'm close to convincing myself that it's "just" a rollback
> of the blocking, which is something we do already, when we clear the
> flags. On the other hand, it's two hooks, which is worse than one, IMO,
> especially if one is a 'cancel' hook. :-(
> 
> > >
> > > At the time, I "voted against" this design, because it seemed we
> > > could manage to handle interrupts ('regular' and posted) happening
> > > during blocking in one unified way, and with _only_
> > > arch_vcpu_block(). If that is no longer the case (and it's not, as
> > > we're adding more hooks, and the need to call the second is a
> > > special case being introduced by PI), it may be worth reconsidering
> > > things...
> > >
> > > So, all in all, I don't know. As said, I don't like this
> > > cancellation hook because it's one more hook and because --while I
> > > see why it's useful in this specific case-- I don't like having it
> > > in generic code (in schedule.c), and even less having it called in
> > > two places (vcpu_block() and do_poll()). However, if others (Jan and
> > > George, I guess) are not equally concerned about it, I can live
> > > with it.
> > >
> > If I understand it correctly, this block cancel method was suggested
> > by George; please refer to the attached email. George, what is your
> > opinion about it? It is better to agree on a clear solution before I
> > post another version. Thanks a lot!
> >
> Sure.
> 
> Thanks for mentioning and attaching the email.
> 
> So, bear with me a bit: do you mind explaining (possibly again, in
> which case, sorry) why we need, for instance in vcpu_block(), to call
> the hook as early as you're calling it and not later?

No problem, it is my responsibility to explain this to you guys, so
you can give a better review! :)

> 
> I mean, what's the problem with something like this:
> 
> void vcpu_block(void)
> {
>     struct vcpu *v = current;
> 
>     set_bit(_VPF_blocked, &v->pause_flags);
> 
>     /* Check for events /after/ blocking: avoids wakeup waiting race. */
>     if ( local_events_need_delivery() )
>     {
>         clear_bit(_VPF_blocked, &v->pause_flags);
>     }
>     else
>     {
>  -->    arch_vcpu_block(v);
>         TRACE_2D(TRC_SCHED_BLOCK, v->domain->domain_id, v->vcpu_id);
>         raise_softirq(SCHEDULE_SOFTIRQ);
>     }
> }
> 
> ?

Here is the story. In arch_vcpu_block(), we change the state of the
PI descriptor and put the vCPU on the pCPU's blocked list. However,
while updating the posted-interrupt descriptor for the blocked vCPU,
we need to check whether 'ON' is set. If it is, we cannot block the
vCPU, since a notification event has come in. So in patch v7 and
before, we did it this way (in v7, the following blocking-related
code ran in the context switch):

+        v->arch.hvm_vmx.pi_block_cpu = v->processor;
+
+        spin_lock(&per_cpu(pi_blocked_vcpu_lock, v->arch.hvm_vmx.pi_block_cpu));
+        list_add_tail(&v->arch.hvm_vmx.pi_blocked_vcpu_list,
+                      &per_cpu(pi_blocked_vcpu, v->arch.hvm_vmx.pi_block_cpu));
+        spin_unlock(&per_cpu(pi_blocked_vcpu_lock,
+                    v->arch.hvm_vmx.pi_block_cpu));
+
+        do {
+            old.control = new.control = pi_desc->control;
+
+            /* Should not block the vCPU if an interrupt was posted for it. */
+            if ( pi_test_on(&old) )
+            {
+                spin_unlock_irqrestore(&v->arch.hvm_vmx.pi_lock, flags);
+                vcpu_unblock(v);
+                return;
+            }
+
+            /*
+             * Change the 'NDST' field to v->arch.hvm_vmx.pi_block_cpu,
+             * so when external interrupts from assigned devices happen,
+             * wakeup notification event will go to
+             * v->arch.hvm_vmx.pi_block_cpu, then in pi_wakeup_interrupt()
+             * we can find the vCPU in the right list to wake up.
+             */
+            dest = cpu_physical_id(v->arch.hvm_vmx.pi_block_cpu);
+
+            if ( x2apic_enabled )
+                new.ndst = dest;
+            else
+                new.ndst = MASK_INSR(dest, PI_xAPIC_NDST_MASK);
+
+            pi_clear_sn(&new);
+            new.nv = pi_wakeup_vector;
+        } while ( cmpxchg(&pi_desc->control, old.control, new.control) !=
+                  old.control );
+    }

However, calling vcpu_unblock(v) in the context switch does not seem
to be a good idea, so we moved the above code into an arch_vcpu_block
hook. George thinks it is not a good idea to check 'ON' while updating
the PI status; instead, we should roll back afterwards if needed. So
here comes the current solution in v8: remove the 'ON' check from
vmx_vcpu_block() and reuse local_events_need_delivery() in
vcpu_block(); if it returns true, call arch_vcpu_block_cancel() to
roll back the previous blocking operations.
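
To make the v8 flow above easier to discuss, here is a toy,
self-contained C model of it. This is only a sketch, not the patch:
all toy_* names and the struct fields are hypothetical stand-ins, the
'ON' bit is modelled as a plain bool, and the position of the hook
relative to setting the blocked flag is my assumption for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real Xen state; not the real layout. */
enum { TOY_NOTIFY_VECTOR = 1, TOY_WAKEUP_VECTOR = 2 };

struct toy_vcpu {
    bool blocked;          /* models _VPF_blocked in pause_flags */
    bool on_blocked_list;  /* models membership of the pCPU blocked list */
    bool pi_on;            /* models 'ON' in the PI descriptor */
    int  nv;               /* models the notification vector (NV) */
};

/* Models the arch hook: queue the vCPU and switch NV, no 'ON' check. */
static void toy_arch_vcpu_block(struct toy_vcpu *v)
{
    v->on_blocked_list = true;
    v->nv = TOY_WAKEUP_VECTOR;
}

/* Models the cancel hook: undo exactly what the block hook did. */
static void toy_arch_vcpu_block_cancel(struct toy_vcpu *v)
{
    v->on_blocked_list = false;
    v->nv = TOY_NOTIFY_VECTOR;
}

/* Models local_events_need_delivery(); here it only looks at 'ON'. */
static bool toy_events_need_delivery(const struct toy_vcpu *v)
{
    return v->pi_on;
}

/* Returns true if the vCPU ends up blocked, false if it was rolled back. */
static bool toy_vcpu_block(struct toy_vcpu *v)
{
    assert(v != NULL);

    v->blocked = true;
    toy_arch_vcpu_block(v);

    /* Check for events after blocking, as vcpu_block() does. */
    if ( toy_events_need_delivery(v) )
    {
        toy_arch_vcpu_block_cancel(v);
        v->blocked = false;
        return false;
    }

    return true;
}
```

The point of the model is that the generic check stays the only place
where we decide whether to block, and the cancel hook just restores
what the block hook changed.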

Back to your suggestion above: if we put arch_vcpu_block() inside
the else part of vcpu_block(), we still need to check whether 'ON' is
set while updating the PI status, and if it is, we also need to roll
back the operations done in vmx_vcpu_block().
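
As a rough illustration of that point, here is a toy C11 model of a
hook called from the else branch. It is a sketch under assumptions:
the bit layout, the toy_* names and the bool standing in for the
blocked list are all hypothetical, and C11 atomics stand in for Xen's
cmpxchg(). An interrupt can still be posted between the generic check
and the descriptor update, so the hook has to look at 'ON' itself and
undo the list insertion when it is set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define TOY_PI_ON        (1u << 0)
#define TOY_PI_NV_SHIFT  8
#define TOY_PI_NV_MASK   (0xffu << TOY_PI_NV_SHIFT)

struct toy_pi_vcpu {
    _Atomic uint32_t control;  /* models the PI descriptor control word */
    bool on_blocked_list;      /* models membership of the blocked list */
};

/*
 * Models a block hook run in the else branch of vcpu_block().
 * Returns true if the vCPU was set up for blocking, false if 'ON' was
 * seen and the list insertion had to be rolled back.
 */
static bool toy_block_in_else_branch(struct toy_pi_vcpu *v, uint32_t wakeup_nv)
{
    uint32_t old, new;

    assert(v != NULL);

    v->on_blocked_list = true;

    old = atomic_load(&v->control);
    do {
        /* An interrupt was posted after the generic check: roll back. */
        if ( old & TOY_PI_ON )
        {
            v->on_blocked_list = false;
            return false;
        }

        /* Switch NV to the wakeup vector, leaving other bits untouched. */
        new = (old & ~TOY_PI_NV_MASK) | (wakeup_nv << TOY_PI_NV_SHIFT);
    } while ( !atomic_compare_exchange_weak(&v->control, &old, new) );

    return true;
}
```

So moving the hook later does not remove the rollback; it only moves
it from the generic code into the arch hook.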

> 
> In fact, George said this in the mail you mention:
> "We shouldn't need to actually clear SN [in the arch_block hook]; SN
> should already be clear because the vcpu should be currently running.
> And if it's just been running, then NDST should also already be the
> correct pcpu."

Yes, this looks correct to me too.

> 
> And that seems correct to me. So, the difference looks to me to be
> "only" the NV, and whether or not the vcpu will be in a blocked list
> already. The latter, seems something we can easily compensate for (and
> you're doing it already, AFAICT); the former, I'm not sure whether it
> could be an issue or not.
> 
> What am I missing?

Does the above explanation make sense to you? If you have any
concerns, feel free to ask me! :)

> 
> Note that this is "just" to understand and form an opinion. Sorry again
> if what I asked has been analyzed already, but I don't remember
> anything like that, and I'm not super-familiar with these interrupt
> things. :-/
> Still in that email, there is something about the possibility of having
> to disable the interrupts. I guess that didn't end up to be necessary?

Yes, after more thought, maybe we don't need to disable interrupts if
we handle things carefully. Let's set this aside for the moment. :-/

Thanks,
Feng

> 
> Thanks and Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)