From mboxrd@z Thu Jan  1 00:00:00 1970
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: [PATCH v10 11/20] x86/VPMU: Interface for setting
 PMU mode and flags
Date: Wed, 10 Sep 2014 13:37:20 -0400
Message-ID: <54108C50.7030500@oracle.com>
References: <1409802080-6160-1-git-send-email-boris.ostrovsky@oracle.com>
	<1409802080-6160-12-git-send-email-boris.ostrovsky@oracle.com>
	<541084C8020000780003366A@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <541084C8020000780003366A@mail.emea.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>
Cc: tim@xen.org, kevin.tian@intel.com, keir@xen.org, suravee.suthikulpanit@amd.com, andrew.cooper3@citrix.com, eddie.dong@intel.com, xen-devel@lists.xen.org, Aravind.Gopalakrishnan@amd.com, jun.nakajima@intel.com
List-Id: xen-devel@lists.xenproject.org

On 09/10/2014 11:05 AM, Jan Beulich wrote:
>>>> On 04.09.14 at 05:41, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -1499,8 +1499,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>   
>>       if ( is_hvm_vcpu(prev) )
>>       {
>> -        if (prev != next)
>> -            vpmu_save(prev);
>> +        vpmu_switch_from(prev, next);
> It escapes me why you move the "prev != next" check here ...

No need to, indeed.

>
>> @@ -1543,9 +1542,9 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>>                              !is_hardware_domain(next->domain));
>>       }
>>   
>> -    if (is_hvm_vcpu(next) && (prev != next) )
>> +    if ( is_hvm_vcpu(prev) )
>>           /* Must be done with interrupts enabled */
>> -        vpmu_load(next);
>> +        vpmu_switch_to(prev, next);
> ... and here into the wrapper functions.
>
>> +static int
>> +vpmu_force_context_switch(XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>> +{
>> +    unsigned i, j, allbutself_num, tasknum, mycpu;
>> +    static s_time_t start;
>> +    static struct tasklet **sync_task;
>> +    struct vcpu *curr_vcpu = current;
>> +    static struct vcpu *sync_vcpu;
>> +    int ret = 0;
>> +
>> +    tasknum = allbutself_num = num_online_cpus() - 1;
>> +
>> +    if ( sync_task ) /* if set, we are in hypercall continuation */
>> +    {
>> +        if ( (sync_vcpu != NULL) && (sync_vcpu != curr_vcpu) )
>> +            /* We are not the original caller */
>> +            return -EAGAIN;
>> +        goto cont_wait;
>> +    }
>> +
>> +    sync_task = xmalloc_array(struct tasklet *, allbutself_num);
>> +    if ( !sync_task )
>> +    {
>> +        printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
>> +        return -ENOMEM;
>> +    }
>> +
>> +    for ( tasknum = 0; tasknum < allbutself_num; tasknum++ )
>> +    {
>> +        sync_task[tasknum] = xmalloc(struct tasklet);
>> +        if ( sync_task[tasknum] == NULL )
>> +        {
>> +            printk(XENLOG_WARNING "vpmu_force_context_switch: out of memory\n");
>> +            ret = -ENOMEM;
>> +            goto out;
>> +        }
>> +        tasklet_init(sync_task[tasknum], vpmu_sched_checkin, 0);
>> +    }
>> +
>> +    atomic_set(&vpmu_sched_counter, 0);
>> +    sync_vcpu = curr_vcpu;
>> +
>> +    j = 0;
>> +    mycpu = smp_processor_id();
>> +    for_each_online_cpu( i )
>> +    {
>> +        if ( i != mycpu )
>> +            tasklet_schedule_on_cpu(sync_task[j++], i);
>> +    }
>> +
>> +    vpmu_save(curr_vcpu);
>> +
>> +    start = NOW();
>> +
>> + cont_wait:
>> +    /*
>> +     * Note that we may fail here if a CPU is hot-(un)plugged while we are
>> +     * waiting. We will then time out.
>> +     */
>> +    while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
>> +    {
>> +        /* Give up after 5 seconds */
>> +        if ( NOW() > start + SECONDS(5) )
>> +        {
>> +            printk(XENLOG_WARNING
>> +                   "vpmu_force_context_switch: failed to sync\n");
>> +            ret = -EBUSY;
>> +            break;
>> +        }
>> +        cpu_relax();
>> +        if ( hypercall_preempt_check() )
>> +            return hypercall_create_continuation(
>> +                __HYPERVISOR_xenpmu_op, "ih", XENPMU_mode_set, arg);
>> +    }
> I wouldn't complain about this not being synchronized with CPU
> hotplug if there wasn't this hypercall continuation and relatively
> long timeout. Much of the state you latch in static variables will
> cause this operation to time out if in between a CPU got brought
> down.

It seemed to me that if we were to correctly deal with CPU hotplug it 
would add a bit too much complexity to the code. So I felt that letting 
the operation timeout would be a better way out.

>
> And as already alluded to, all this looks rather fragile anyway,
> even if I can't immediately spot any problems with it anymore.

The continuation is really a carry-over from earlier patch version when 
I had double loops over domain and VCPUs to explicitly unload VPMUs. At 
that time Andrew pointed out that these loops may take really long time 
and so I added continuations.

Now that I changed that after realizing that having each PCPU go through 
a context switch is sufficient perhaps I don't need it any longer. Is 
the worst case scenario of being stuck here for 5 seconds (chosen 
somewhat arbitrary) acceptable without continuation?


-boris

>
>> +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
>> +{
>> +    int ret = -EINVAL;
>> +    xen_pmu_params_t pmu_params;
>> +
>> +    switch ( op )
>> +    {
>> +    case XENPMU_mode_set:
>> +    {
>> +        static DEFINE_SPINLOCK(xenpmu_mode_lock);
>> +        uint32_t current_mode;
>> +
>> +        if ( !is_control_domain(current->domain) )
>> +            return -EPERM;
>> +
>> +        if ( copy_from_guest(&pmu_params, arg, 1) )
>> +            return -EFAULT;
>> +
>> +        if ( pmu_params.val & ~XENPMU_MODE_SELF )
>> +            return -EINVAL;
>> +
>> +        /*
>> +         * Return error is someone else is in the middle of changing mode ---
>> +         * this is most likely indication of two system administrators
>> +         * working against each other
>> +         */
>> +        if ( !spin_trylock(&xenpmu_mode_lock) )
>> +            return -EAGAIN;
> So what happens if you can't take the lock in a continuation? If
> returning -EAGAIN in that case is not a problem, what do you
> need the continuation for in the first place?
>
> Jan