From: Andrii Anisov <andrii.anisov@gmail.com>
To: Julien Grall <julien.grall@arm.com>, xen-devel@lists.xenproject.org
Cc: Tim Deegan <tim@xen.org>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Andrii Anisov <andrii_anisov@epam.com>, Wei Liu <wl@xen.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Dario Faggioli <dfaggioli@suse.com>,
	Jan Beulich <jbeulich@suse.com>
Subject: Re: [Xen-devel] [RFC 1/9] schedule: Introduce per-pcpu time accounting
Date: Wed, 6 Nov 2019 13:24:51 +0200
Message-ID: <f3767489-e46a-3830-8b3c-0b637b71e0b8@gmail.com>
In-Reply-To: <8c74cacb-ff73-eddc-626c-f6fa862cf5a6@arm.com>

Hello Julien,

On 28.10.19 16:28, Julien Grall wrote:
> It would be good to get a review from the scheduler maintainers (Dario, George) to make sure they are happy with the suggested states here.

I would not say I'm completely happy with this set of states myself. I'd like to discuss this topic with the scheduler maintainers, also because they may have a different view coming from the x86 world.

>> Introduce per-pcpu time accounting what includes the following states:
> 
> I think we need a very detailed description of each state. Otherwise it will be hard to know how to categorize it.

I agree that we need a very detailed description of each state. Please ask questions if something is unclear or doubtful. I guess we could come up with something better after a Q&A process.

> 
>>
>> TACC_HYP - the pcpu executes hypervisor code like softirq processing
>>             (including scheduling), tasklets and context switches
> 
> IMHO, "like" is too weak here. What exactly do you plan to introduce?

I think this should be everything the hypervisor does except hypercall processing and IO emulation (which is TACC_GSYNC).

> 
> For instance, on Arm, you consider that leave_hypervisor_tail() is part of TACC_HYP. This function will include some handling for synchronous trap.

I guess you are talking about `p2m_flush_vm`. I have doubts here, and I am open to suggestions.


>> TACC_GUEST - the pcpu executes guests code
> 
> Looking at the arm64 code, you are executing some hypervisor code here. I agree it is impossible not to run any hypervisor code with TACC_GUEST, but I think this should be clarified in the documentation.

Do you mean adding a few words about some hypervisor code still running near the actual context switch from/to the guest (entry/return_from_trap)?

> 
>> TACC_IDLE - the low-power state of the pcpu
> 
> Did you intend to mean "idle vCPU" is in use?

No. I meant what is written.
Currently, the idle vcpu does hypervisor work (e.g. tasklets) in addition to entering the low-power mode. IMO we have to separate them.

> 
>> TACC_IRQ - the pcpu performs interrupts processing, without separation to
>>             guest or hypervisor interrupts
>> TACC_GSYNC - the pcpu executes hypervisor code to process synchronous trap
>>               from the guest. E.g. hypercall processing or io emulation.
>>
>> Currently, the only reenterant state is TACC_IRQ. It is assumed, no changes
>> to state other than TACC_IRQ could happen until we return from nested
>> interrupts. IRQ time is accounted in a distinct way comparing to other states.
> 
> s/comparing/compare/

OK.

> 
>> It is acumulated between other states transition moments, and is substracted
> 
> s/acumulated/accumulated/ s/substracted/subtracted/

OK.

> 
>> from the old state on states transion calculation.
[1]
> 
> s/transion/transition/

OK.

> 
>>
>> Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
>> ---
>>   xen/common/schedule.c   | 81 +++++++++++++++++++++++++++++++++++++++++++++++++
>>   xen/include/xen/sched.h | 27 +++++++++++++++++
>>   2 files changed, 108 insertions(+)
>>
>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>> index 7b71581..6dd6603 100644
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -1539,6 +1539,87 @@ static void schedule(void)
>>       context_switch(prev, next);
>>   }
>> +DEFINE_PER_CPU(struct tacc, tacc);
>> +
>> +static void tacc_state_change(enum TACC_STATES new_state)
> 
> This should never be called with the TACC_IRQ, right?

Yes. Actually, tacc->state should never be TACC_IRQ.
Because TACC_IRQ is reentrant, it is handled through tacc->irq_cnt and tacc->irq_enter_time.
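
A minimal standalone sketch of this accounting, modelled on the functions quoted in this mail. The hand-advanced `fake_now` clock standing in for Xen's NOW(), the lowercase enum name, and passing the `struct tacc` pointer explicitly instead of using per-cpu data are assumptions made so that it builds outside the hypervisor; IRQ masking is left out.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s_time_t;

/* Hypothetical stand-in for Xen's NOW(): a clock advanced by hand. */
static s_time_t fake_now;
#define NOW() (fake_now)

enum tacc_state {
    TACC_HYP, TACC_GUEST, TACC_IDLE, TACC_IRQ, TACC_GSYNC, TACC_STATES_MAX
};

struct tacc {
    s_time_t state_time[TACC_STATES_MAX];
    s_time_t state_entry_time;
    enum tacc_state state;      /* never holds TACC_IRQ */
    s_time_t irq_enter_time;    /* entry time of the outermost interrupt */
    s_time_t irq_time;          /* IRQ time not yet folded into state_time */
    int irq_cnt;                /* interrupt nesting depth */
};

static void tacc_irq_enter(struct tacc *t)
{
    assert(t->irq_cnt >= 0);
    if (t->irq_cnt == 0)        /* only the outermost entry takes a timestamp */
        t->irq_enter_time = NOW();
    t->irq_cnt++;
}

static void tacc_irq_exit(struct tacc *t)
{
    assert(t->irq_cnt > 0);
    if (t->irq_cnt == 1) {      /* leaving the outermost interrupt */
        t->irq_time = NOW() - t->irq_enter_time;
        t->irq_enter_time = 0;
    }
    t->irq_cnt--;
}

static void tacc_state_change(struct tacc *t, enum tacc_state new_state)
{
    s_time_t now = NOW();
    s_time_t delta = now - t->state_entry_time;

    assert(new_state != TACC_IRQ);   /* IRQ is tracked via irq_cnt only */
    assert(new_state != t->state);

    /* IRQ time is subtracted from the old state and booked separately. */
    t->state_time[t->state] += delta - t->irq_time;
    t->state_time[TACC_IRQ] += t->irq_time;
    t->irq_time = 0;
    t->state = new_state;
    t->state_entry_time = now;
}
```

With a nested interrupt spanning 10 time units inside a 30-unit stay in TACC_HYP, this books 20 units to TACC_HYP and 10 to TACC_IRQ, while tacc->state itself stays TACC_HYP throughout the interrupt.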

> 
>> +{
>> +    s_time_t now, delta;
>> +    struct tacc* tacc = &this_cpu(tacc);
>> +    unsigned long flags;
>> +
>> +    local_irq_save(flags);
>> +
>> +    now = NOW();
>> +    delta = now - tacc->state_entry_time;
>> +
>> +    /* We do not expect states reenterability (at least through this function)*/
>> +    ASSERT(new_state != tacc->state);
>> +
>> +    tacc->state_time[tacc->state] += delta - tacc->irq_time;
>> +    tacc->state_time[TACC_IRQ] += tacc->irq_time;
>> +    tacc->irq_time = 0;
>> +    tacc->state = new_state;
>> +    tacc->state_entry_time = now;
>> +
>> +    local_irq_restore(flags);
>> +}
>> +
>> +void tacc_hyp(int place)
> 
> Place is never used except for your commented printk. So what's the goal for it?

`place` is just a piece of debugging code, as is the printk. I kept it here because this series is very much an RFC; it can be removed.

> Also, is it really necessary to provide a helper for each state? Couldn't we just introduce one function handling all the states?

I'd like to call these from assembler without parameters. But I have no strong opinion here.
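
The two options can be combined. As a rough sketch only: one function does the work, and thin parameterless wrappers remain as entry points for assembly. The name `tacc_set_state` and the plain `current_state` variable are hypothetical; the real code would call tacc_state_change() on the per-cpu struct.

```c
#include <assert.h>

enum tacc_state {
    TACC_HYP, TACC_GUEST, TACC_IDLE, TACC_IRQ, TACC_GSYNC, TACC_STATES_MAX
};

/* Hypothetical stand-in for the per-cpu tacc->state field. */
static enum tacc_state current_state = TACC_HYP;

/* Single helper handling every state (hypothetical name). */
void tacc_set_state(enum tacc_state new_state)
{
    assert(new_state != TACC_IRQ);  /* IRQ goes through irq_enter/irq_exit */
    current_state = new_state;
}

/*
 * Parameterless wrappers: an assembly caller can branch to these directly
 * (e.g. "bl tacc_guest" on arm64) without loading an argument register.
 */
void tacc_hyp(void)   { tacc_set_state(TACC_HYP); }
void tacc_guest(void) { tacc_set_state(TACC_GUEST); }
void tacc_idle(void)  { tacc_set_state(TACC_IDLE); }
void tacc_gsync(void) { tacc_set_state(TACC_GSYNC); }
```

C callers could then use the single helper directly, and only the entry/exit assembly would depend on the wrappers.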
  
>> +{
>> +//    printk("\ttacc_hyp %u, place %d\n", smp_processor_id(), place);
>> +    tacc_state_change(TACC_HYP);
>> +}
>> +
>> +void tacc_guest(int place)
>> +{
>> +//    printk("\ttacc_guest %u, place %d\n", smp_processor_id(), place);
>> +    tacc_state_change(TACC_GUEST);
>> +}
>> +
>> +void tacc_idle(int place)
>> +{
>> +//    printk("\tidle cpu %u, place %d\n", smp_processor_id(), place);
>> +    tacc_state_change(TACC_IDLE);
>> +}
>> +
>> +void tacc_gsync(int place)
>> +{
>> +//    printk("\ttacc_gsync %u, place %d\n", smp_processor_id(), place);
>> +    tacc_state_change(TACC_GSYNC);
>> +}
>> +
>> +void tacc_irq_enter(int place)
>> +{
>> +    struct tacc* tacc = &this_cpu(tacc);
>> +
>> +//    printk("\ttacc_irq_enter %u, place %d, cnt %d\n", smp_processor_id(), place, this_cpu(tacc).irq_cnt);
>> +    ASSERT(!local_irq_is_enabled());
>> +    ASSERT(tacc->irq_cnt >= 0);
>> +
>> +    if ( tacc->irq_cnt == 0 )
>> +    {
>> +        tacc->irq_enter_time = NOW();
>> +    }
>> +
>> +    tacc->irq_cnt++;
>> +}
>> +
>> +void tacc_irq_exit(int place)
>> +{
>> +    struct tacc* tacc = &this_cpu(tacc);
>> +
>> +//    printk("\ttacc_irq_exit %u, place %d, cnt %d\n", smp_processor_id(), place, tacc->irq_cnt);
>> +    ASSERT(!local_irq_is_enabled());
>> +    ASSERT(tacc->irq_cnt > 0);
>> +    if ( tacc->irq_cnt == 1 )
>> +    {
>> +        tacc->irq_time = NOW() - tacc->irq_enter_time;
> 
> If I understand correctly, you will use irq_time to update TACC_IRQ in tacc_state_change(). It may be possible to receive another interrupt before the state is changed (e.g. HYP -> GUEST). This means only the time for the last IRQ received would be accounted.

I disable IRQs during the state change. Shouldn't that protect it?
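
To make the scenario concrete, here is a minimal standalone sketch. It assumes two interrupts that arrive one after another, both completing before any state change. All names (`irq_acc`, `irq_exit_assign`, `irq_exit_accumulate`, `run_two_irqs`) are hypothetical, and timestamps are passed in explicitly instead of calling NOW(). The first exit variant assigns irq_time as the patch does; the second accumulates it, which is one possible alternative rather than an agreed fix.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s_time_t;

struct irq_acc {
    s_time_t irq_enter_time;
    s_time_t irq_time;   /* pending IRQ time until the next state change */
    int irq_cnt;
};

static void irq_enter(struct irq_acc *a, s_time_t now)
{
    if (a->irq_cnt++ == 0)
        a->irq_enter_time = now;
}

/* As in the patch: the outermost exit overwrites irq_time. */
static void irq_exit_assign(struct irq_acc *a, s_time_t now)
{
    assert(a->irq_cnt > 0);
    if (--a->irq_cnt == 0)
        a->irq_time = now - a->irq_enter_time;
}

/* Possible alternative: the outermost exit adds to irq_time. */
static void irq_exit_accumulate(struct irq_acc *a, s_time_t now)
{
    assert(a->irq_cnt > 0);
    if (--a->irq_cnt == 0)
        a->irq_time += now - a->irq_enter_time;
}

/*
 * Two back-to-back interrupts lasting 5 and 3 time units, with no state
 * change in between. Returns the IRQ time pending afterwards.
 */
static s_time_t run_two_irqs(void (*irq_exit)(struct irq_acc *, s_time_t))
{
    struct irq_acc a = {0};

    irq_enter(&a, 10); irq_exit(&a, 15);
    irq_enter(&a, 20); irq_exit(&a, 23);
    return a.irq_time;
}
```

In this model the assigning variant leaves only the duration of the second interrupt pending, while the accumulating variant keeps the sum of both. Masking interrupts inside tacc_state_change() does not come into play here, because neither interrupt overlaps with a state change.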

> 
>> +        tacc->irq_enter_time = 0;
>> +    }
>> +
>> +    tacc->irq_cnt--;
>> +}
>> +
>>   void context_saved(struct vcpu *prev)
>>   {
>>       /* Clear running flag /after/ writing context to memory. */
>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>> index e3601c1..04a8724 100644
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -1002,6 +1002,33 @@ extern void dump_runq(unsigned char key);
>>   void arch_do_physinfo(struct xen_sysctl_physinfo *pi);
>> +enum TACC_STATES {
> 
> We don't tend to use all uppercases for enum name.

OK.

> 
>> +    TACC_HYP = 0,
> 
> An enum begins at 0 and increments by one each time, so there is no need to hardcode the numbers.
> 
> Also, looking at the code, I think you rely on the first state to be TACC_HYP. Am I correct?

TACC_HYP is expected to be the initial state of the PCPU.
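
A small sketch of how that expectation could be made explicit while dropping the hardcoded numbers. It assumes the per-pcpu data starts out zeroed, uses C11 `_Static_assert` where Xen code would more likely use BUILD_BUG_ON(), and the names `tacc_state`, `tacc_min` and `boot_tacc` are hypothetical. The per-state comments paraphrase the commit message.

```c
#include <assert.h>

enum tacc_state {
    TACC_HYP,    /* hypervisor work: softirqs (incl. scheduling), tasklets,
                    context switches */
    TACC_GUEST,  /* guest code */
    TACC_IDLE,   /* low-power state of the pcpu */
    TACC_IRQ,    /* interrupt processing, guest or hypervisor alike */
    TACC_GSYNC,  /* synchronous guest traps: hypercalls, IO emulation */
    TACC_STATES_MAX
};

/* Spell out the reliance on zero-initialisation instead of "= 0". */
_Static_assert(TACC_HYP == 0,
               "zero-initialised per-pcpu data must start in TACC_HYP");

/* Reduced stand-in for struct tacc, just enough to show the initial state. */
struct tacc_min {
    enum tacc_state state;
};

/* Static storage duration, so the object is zero-initialised. */
static struct tacc_min boot_tacc;
```

If someone later reorders the enumerators, the build fails instead of the initial state silently changing.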

> 
>> +    TACC_GUEST = 1,
>> +    TACC_IDLE = 2,
>> +    TACC_IRQ = 3,
>> +    TACC_GSYNC = 4,
>> +    TACC_STATES_MAX
>> +};
> 
> It would be good to document all the states in the header as well.

OK.

> 
>> +
>> +struct tacc
> 
> Please document the structure.

OK.

> 
>> +{
>> +    s_time_t state_time[TACC_STATES_MAX];
>> +    s_time_t state_entry_time;
>> +    int state;
> 
> This should be the enum you used above here.

Yep.

>> +
>> +    s_time_t guest_time;
> 
> This is not used.

Yep, will drop it.

> 
>> +
>> +    s_time_t irq_enter_time;
>> +    s_time_t irq_time;
>> +    int irq_cnt;
> Why do you need this to be signed?

For the assertion (ASSERT(tacc->irq_cnt >= 0) in tacc_irq_enter()).
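
For comparison, a sketch of what an unsigned counter would look like; the names `irq_nest`, `nest_enter` and `nest_exit` are hypothetical. The `>= 0` check becomes vacuous, but the check before the decrement still catches an unbalanced exit.

```c
#include <assert.h>

struct irq_nest {
    unsigned int irq_cnt;   /* nesting depth, cannot go negative */
};

static void nest_enter(struct irq_nest *n)
{
    n->irq_cnt++;
}

static void nest_exit(struct irq_nest *n)
{
    /* Checked before decrementing, so an underflow is caught here. */
    assert(n->irq_cnt > 0);
    n->irq_cnt--;
}
```

The signed variant additionally detects a counter corrupted to a negative value on entry, which is the trade-off between the two.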
  
>> +};
>> +
>> +DECLARE_PER_CPU(struct tacc, tacc);
>> +
>> +void tacc_hyp(int place);
>> +void tacc_idle(int place);
>> +
>>   #endif /* __SCHED_H__ */
>>   /*
>>
> 
> Cheers,
>

-- 
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
