From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [Xen-devel] [PATCH] xen-evtchn: Bind dyn evtchn:qemu-dm interrupt
 to next online VCPU
To: Anoob Soman <anoob.soman@citrix.com>, xen-devel@lists.xenproject.org,
        linux-kernel@vger.kernel.org
References: <1496414988-12878-1-git-send-email-anoob.soman@citrix.com>
 <363cb97a-7dc1-ae4f-da93-30e7658cef00@oracle.com>
 <5a5d9355-34fc-57aa-825c-81123f6bb74e@citrix.com>
 <c8640680-b179-a6d6-cbb9-825d4bf0017d@oracle.com>
 <0c2a3a4b-e442-8c8c-6a71-6f9972ff29fc@citrix.com>
Cc: jgross@suse.com
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Message-ID: <f6ae8dd3-0a17-bb62-596f-3512f0ab102b@oracle.com>
Date: Mon, 5 Jun 2017 11:32:44 -0400
In-Reply-To: <0c2a3a4b-e442-8c8c-6a71-6f9972ff29fc@citrix.com>

On 06/05/2017 10:49 AM, Anoob Soman wrote:
> On 05/06/17 15:10, Boris Ostrovsky wrote:
>>> The reason for percpu instead of global was to avoid locking. We can
>>> have a global variable (last_cpu) without locking, but the value of
>>> last_cpu won't be consistent without locks. Moreover, since
>>> irq_affinity is also used in calculating which CPU to bind to, whether
>>> the variable is percpu or global wouldn't really matter, as the result
>>> (selected_cpu) is likely to be effectively random anyway (because
>>> different IRQs can have different affinity). What do you guys suggest?
>> Doesn't initial affinity (which is what we expect here since irqbalance
>> has not run yet) typically cover all guest VCPUs?
>
> Yes, initial affinity covers all online VCPUs. But there is a small
> chance that the initial affinity might change before
> evtchn_bind_interdom_next_vcpu is called. For example, I could run a
> script to change IRQ affinity just as the IRQ's sysfs entry appears. This
> is why I thought it would be sensible (based on your suggestion) to
> include irq_affinity when calculating the next VCPU. If you think
> changing irq_affinity between request_irq() and
> evtchn_bind_interdom_next_vcpu is virtually impossible, then we can
> drop affinity and just use cpu_online_mask.

I believe we do need to take affinity into consideration even if the
chance that it is non-default is small.

I am not opposed to having bind_last_selected_cpu percpu; I just wanted
to understand the reason better. Additional locking would be a downside
of a global, so if you feel that percpu is worth it, I won't object.

>
>>>
>>> I think we would still require spin_lock(). The lock protects irq_desc.
>> If you are trying to protect affinity then it may well change after you
>> drop the lock.
>>
>> In fact, don't you have a race here? If we offline a VCPU we will (by
>> way of cpu_disable_common()->fixup_irqs()) update affinity to reflect
>> that a CPU is gone and there is a chance that xen_rebind_evtchn_to_cpu()
>> will happen after that.
>>
>> So, contrary to what I said earlier ;-) not only do you need the lock,
>> but you should hold it across xen_rebind_evtchn_to_cpu() call. Does this
>> make sense?
>
> Yes, you are correct. .irq_set_affinity pretty much does the same thing.
>
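> The code will now look like this:
> raw_spin_lock_irqsave(&desc->lock, flags);
> selected_cpu = this_cpu_read(bind_last_selected_cpu);
> selected_cpu = <next CPU in (affinity & cpu_online_mask) after selected_cpu, wrapping>;
> this_cpu_write(bind_last_selected_cpu, selected_cpu);
> xen_rebind_evtchn_to_cpu(evtchn, selected_cpu);
> raw_spin_unlock_irqrestore(&desc->lock, flags);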

(BTW, I just noticed: you don't need to initialize desc.)

-boris