From mboxrd@z Thu Jan  1 00:00:00 1970
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: [PATCH v10 11/20] x86/VPMU: Interface for setting
 PMU mode and flags
Date: Thu, 11 Sep 2014 10:12:45 -0400
Message-ID: <5411ADDD.2040003@oracle.com>
References: <1409802080-6160-1-git-send-email-boris.ostrovsky@oracle.com>
	<1409802080-6160-12-git-send-email-boris.ostrovsky@oracle.com>
	<541084C8020000780003366A@mail.emea.novell.com>
	<54108C50.7030500@oracle.com>
	<541160E90200007800033A7B@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <541160E90200007800033A7B@mail.emea.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>
Cc: tim@xen.org, kevin.tian@intel.com, keir@xen.org, suravee.suthikulpanit@amd.com, andrew.cooper3@citrix.com, eddie.dong@intel.com, xen-devel@lists.xen.org, Aravind.Gopalakrishnan@amd.com, jun.nakajima@intel.com
List-Id: xen-devel@lists.xenproject.org

On 09/11/2014 02:44 AM, Jan Beulich wrote:
>>>> On 10.09.14 at 19:37, <boris.ostrovsky@oracle.com> wrote:
>> On 09/10/2014 11:05 AM, Jan Beulich wrote:
>>>>>> On 04.09.14 at 05:41, <boris.ostrovsky@oracle.com> wrote:
>>>> +static int
>>>> +vpmu_force_context_switch(XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>>>> +{
>>>> +    unsigned i, j, allbutself_num, tasknum, mycpu;
>>>> +    static s_time_t start;
>>>> +    static struct tasklet **sync_task;
>>>> +    struct vcpu *curr_vcpu = current;
>>>> +    static struct vcpu *sync_vcpu;
>>>> +    int ret = 0;
>>>> +
>>>> +    tasknum = allbutself_num = num_online_cpus() - 1;
>>>> +
>>>> +    if ( sync_task ) /* if set, we are in hypercall continuation */
>>>> +    {
>>>> +        if ( (sync_vcpu != NULL) && (sync_vcpu != curr_vcpu) )
>>>> +            /* We are not the original caller */
>>>> +            return -EAGAIN;
>>>> +        goto cont_wait;
>>>> +    }
>>>> +
>>>> +    sync_task = xmalloc_array(struct tasklet *, allbutself_num);
>>>> +    if ( !sync_task )
>>>> +    {
>>>> +        printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
>>>> +        return -ENOMEM;
>>>> +    }
>>>> +
>>>> +    for ( tasknum = 0; tasknum < allbutself_num; tasknum++ )
>>>> +    {
>>>> +        sync_task[tasknum] = xmalloc(struct tasklet);
>>>> +        if ( sync_task[tasknum] == NULL )
>>>> +        {
>>>> +            printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
>>>> +            ret = -ENOMEM;
>>>> +            goto out;
>>>> +        }
>>>> +        tasklet_init(sync_task[tasknum], vpmu_sched_checkin, 0);
>>>> +    }
>>>> +
>>>> +    atomic_set(&vpmu_sched_counter, 0);
>>>> +    sync_vcpu = curr_vcpu;
>>>> +
>>>> +    j = 0;
>>>> +    mycpu = smp_processor_id();
>>>> +    for_each_online_cpu( i )
>>>> +    {
>>>> +        if ( i != mycpu )
>>>> +            tasklet_schedule_on_cpu(sync_task[j++], i);
>>>> +    }
>>>> +
>>>> +    vpmu_save(curr_vcpu);
>>>> +
>>>> +    start = NOW();
>>>> +
>>>> + cont_wait:
>>>> +    /*
>>>> +     * Note that we may fail here if a CPU is hot-(un)plugged while we are
>>>> +     * waiting. We will then time out.
>>>> +     */
>>>> +    while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
>>>> +    {
>>>> +        /* Give up after 5 seconds */
>>>> +        if ( NOW() > start + SECONDS(5) )
>>>> +        {
>>>> +            printk(XENLOG_WARNING
>>>> +                   "vpmu_force_context_switch: failed to sync\n");
>>>> +            ret = -EBUSY;
>>>> +            break;
>>>> +        }
>>>> +        cpu_relax();
>>>> +        if ( hypercall_preempt_check() )
>>>> +            return hypercall_create_continuation(
>>>> +                __HYPERVISOR_xenpmu_op, "ih", XENPMU_mode_set, arg);
>>>> +    }
>>> I wouldn't complain about this not being synchronized with CPU
>>> hotplug if there wasn't this hypercall continuation and relatively
>>> long timeout. Much of the state you latch in static variables will
>>> cause this operation to time out if in between a CPU got brought
>>> down.
>> It seemed to me that if we were to correctly deal with CPU hotplug it
>> would add a bit too much complexity to the code. So I felt that letting
>> the operation timeout would be a better way out.
> Then please at least add a code comment making this explicit to
> future readers.

Is the comment above the 'while' loop not sufficient?

> Otoh I can't see much complexity in e.g. just
> making hot unplug attempts fail with -EAGAIN when the
> operation here is still in progress. Of course it then needs to be
> made sure that even if for some reason the continuation never
> happens (because e.g. the guest gets stuck in an interrupt
> handler), the state would get cleared after the chosen timeout.

Yes (and the fact that I'd need to add a notifier in VPMU code or 
something like that). It's all doable but I just didn't think it's worth it.


>
>>> And as already alluded to, all this looks rather fragile anyway,
>>> even if I can't immediately spot any problems with it anymore.
>> The continuation is really a carry-over from earlier patch version when
>> I had double loops over domain and VCPUs to explicitly unload VPMUs. At
>> that time Andrew pointed out that these loops may take really long time
>> and so I added continuations.
>>
>> Now that I changed that after realizing that having each PCPU go through
>> a context switch is sufficient perhaps I don't need it any longer. Is
>> the worst case scenario of being stuck here for 5 seconds (chosen
>> somewhat arbitrarily) acceptable without continuation?
> 5 seconds is _way_ too long for doing this without continuation.

Then I am also adding back your other comment from this thread

 > > +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
 > > +{
 > > +    int ret = -EINVAL;
 > > +    xen_pmu_params_t pmu_params;
 > > +
 > > +    switch ( op )
 > > +    {
 > > +    case XENPMU_mode_set:
 > > +    {
 > > +        static DEFINE_SPINLOCK(xenpmu_mode_lock);
 > > +        uint32_t current_mode;
 > > +
 > > +        if ( !is_control_domain(current->domain) )
 > > +            return -EPERM;
 > > +
 > > +        if ( copy_from_guest(&pmu_params, arg, 1) )
 > > +            return -EFAULT;
 > > +
 > > +        if ( pmu_params.val & ~XENPMU_MODE_SELF )
 > > +            return -EINVAL;
 > > +
 > > +        /*
 > > +         * Return error if someone else is in the middle of changing mode ---
 > > +         * this is most likely indication of two system administrators
 > > +         * working against each other
 > > +         */
 > > +        if ( !spin_trylock(&xenpmu_mode_lock) )
 > > +            return -EAGAIN;
 >
 > So what happens if you can't take the lock in a continuation? If
 > returning -EAGAIN in that case is not a problem, what do you
 > need the continuation for in the first place?

EAGAIN in this case means that the caller was not able to initiate the
operation. A continuation allows the caller to finish an operation that
is already in progress.
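
To make the distinction concrete, here is a minimal user-space sketch of
the semantics I have in mind. It is not the patch code: struct caller,
mode_set() and cpu_checkin() are hypothetical names made up for
illustration, a return value of 1 stands in for
hypercall_create_continuation(), and the timeout and locking are left
out entirely.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in Xen. */
struct caller { int id; };

static const struct caller *op_owner; /* NULL when no operation is in flight */
static int checked_in;                /* CPUs that have checked in so far */
static int expected;                  /* CPUs the owner is waiting for */

/* Stand-in for one remote CPU passing through its context switch. */
static void cpu_checkin(void)
{
    checked_in++;
}

/*
 * Start or resume a mode change on behalf of 'c'.
 * Returns 0 when the operation has completed, 1 when the caller should
 * re-enter (the continuation case), and -EAGAIN when a different caller
 * already owns an operation in progress.
 */
static int mode_set(const struct caller *c, int ncpus)
{
    if (op_owner != NULL && op_owner != c)
        return -EAGAIN;        /* could not initiate: someone else owns it */

    if (op_owner == NULL) {    /* first entry: initiate the operation */
        op_owner = c;
        checked_in = 0;
        expected = ncpus - 1;  /* every CPU but our own */
    }

    if (checked_in != expected)
        return 1;              /* same caller comes back later to finish */

    op_owner = NULL;           /* done: release for the next caller */
    return 0;
}
```

With two callers on a three-CPU system, the first call by caller A
returns 1 (continue), a call by caller B in the meantime returns
-EAGAIN, and once both remote CPUs have checked in, A's re-entry
returns 0 and B is then free to start its own operation.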

-boris