* Xen on ARM IRQ latency and scheduler overhead
@ 2017-02-10  0:54 Stefano Stabellini
  2017-02-10  8:40 ` Dario Faggioli
  2017-02-17 18:40 ` Dario Faggioli
  0 siblings, 2 replies; 19+ messages in thread
From: Stefano Stabellini @ 2017-02-10  0:54 UTC (permalink / raw)
  To: xen-devel
  Cc: george.dunlap, edgar.iglesias, dario.faggioli, sstabellini, julien.grall

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3018 bytes --]

Hi all,

I have run some IRQ latency measurements on Xen on ARM on a Xilinx
ZynqMP board (four Cortex A53 cores, GICv2).

Dom0 has 1 vcpu pinned to cpu0, DomU has 1 vcpu pinned to cpu2.
Dom0 is Ubuntu. DomU is an ad-hoc baremetal app to measure interrupt
latency: https://github.com/edgarigl/tbm
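
In case it helps reproduce the setup, the pinning is done in the usual
way (the domU values below are just an example, not the exact config I
used):

    # Xen command line: a single dom0 vcpu, pinned to pcpu0
    dom0_max_vcpus=1 dom0_vcpus_pin

    # domU config file
    vcpus = 1
    cpus = "2"          # hard affinity to pcpu2

    # or, equivalently, after the domain has been created
    xl vcpu-pin <domU> 0 2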

I modified the app to use the phys_timer instead of the virt_timer.  You
can build it with:

make CFG=configs/xen-guest-irq-latency.cfg 

I modified Xen to export the phys_timer to guests, see the very hacky
patch attached. This way, the phys_timer interrupt should behave like
any conventional device interrupt assigned to a guest.

These are the results, in nanosec:

                        AVG     MIN     MAX     WARM MAX

NODEBUG no WFI          1890    1800    3170    2070
NODEBUG WFI             4850    4810    7030    4980
NODEBUG no WFI credit2  2217    2090    3420    2650
NODEBUG WFI credit2     8080    7890    10320   8300

DEBUG no WFI            2252    2080    3320    2650
DEBUG WFI               6500    6140    8520    8130
DEBUG WFI, credit2      8050    7870    10680   8450

DEBUG means Xen DEBUG build.
WARM MAX is the maximum latency, excluding the first few interrupts,
which warm the caches.
WFI is the ARM and ARM64 sleep instruction; Xen traps and emulates it
by calling vcpu_block.
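
For reference, the trap boils down to something like this in
xen/arch/arm/traps.c (a simplified sketch from memory, not the literal
code):

    case HSR_EC_WFI_WFE:
        if ( hsr.wfi_wfe.ti )
            /* WFE: just yield the pcpu */
            vcpu_yield();
        else
            /* WFI: block the vcpu until an event/interrupt is pending */
            vcpu_block_unless_event_pending(current);
        advance_pc(regs, hsr);
        break;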

As you can see, depending on whether the guest issues a WFI or not while
waiting for interrupts, the results change significantly. Interestingly,
credit2 does worse than credit1 in this area.

Trying to figure out where those 3000-4000ns of difference between the
WFI and non-WFI cases come from, I wrote a patch to zero the latency
introduced by xen/arch/arm/domain.c:schedule_tail. That saves about
1000ns. There are no other arch specific context switch functions worth
optimizing.

We are down to 2000-3000ns. Then, I started investigating the scheduler.
I measured how long it takes to run "vcpu_unblock": 1050ns, which is
significant. I don't know what is causing the remaining 1000-2000ns, but
I bet on another scheduler function. Do you have any suggestions on
which one?


Assuming that the problem is indeed the scheduler, one workaround that
we could introduce today would be to call vcpu_yield instead of
vcpu_block on guest WFI, so that the expensive vcpu_unblock is not
needed on wakeup. This change makes things significantly better:

                                     AVG     MIN     MAX     WARM MAX
DEBUG WFI (yield, no block)          2900    2190    5130    5130
DEBUG WFI (yield, no block) credit2  3514    2280    6180    5430

Is that a reasonable change to make? Would it cause significantly more
power consumption in Xen (because xen/arch/arm/domain.c:idle_loop might
not be called anymore)?
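
To be clear, the change amounts, roughly, to this in the WFI trap path
shown above (a sketch, not the actual patch):

    -        vcpu_block_unless_event_pending(current);
    +        vcpu_yield();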


If we wanted to zero the difference between the WFI and non-WFI cases,
would we need a new scheduler? A simple "noop scheduler" that statically
assigns vcpus to pcpus, one by one, until they run out, and then returns
an error? Or do we need more extensive modifications to
xen/common/schedule.c? Any other ideas?
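
To make the "noop scheduler" idea concrete, its core would be little
more than the following (a very rough sketch against the sched_ops
hooks in xen/include/xen/sched-if.h; locking, bookkeeping and the
actual vcpu-to-pcpu assignment at vcpu insertion time are all left
out):

    /* the one vcpu statically assigned to this pcpu, if any */
    static DEFINE_PER_CPU(struct vcpu *, noop_vcpu);

    static struct task_slice noop_schedule(const struct scheduler *ops,
                                           s_time_t now,
                                           bool_t tasklet_work_scheduled)
    {
        struct vcpu *v = this_cpu(noop_vcpu);
        struct task_slice ret = { .time = -1, .migrated = 0 };

        /* nothing to decide: run the assigned vcpu if it is runnable
         * (and there is no tasklet work), otherwise run idle */
        if ( v && vcpu_runnable(v) && !tasklet_work_scheduled )
            ret.task = v;
        else
            ret.task = idle_vcpu[smp_processor_id()];

        return ret;
    }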


Thanks,

Stefano

[-- Attachment #2: Type: TEXT/PLAIN, Size: 3239 bytes --]

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 7e43691..f5ff69b 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -663,6 +663,7 @@ void arch_domain_destroy(struct domain *d)
     /* IOMMU page table is shared with P2M, always call
      * iommu_domain_destroy() before p2m_teardown().
      */
+    WRITE_SYSREG32(0, CNTP_CTL_EL0);
     iommu_domain_destroy(d);
     p2m_teardown(d);
     domain_vgic_free(d);
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index a5348f2..5c8b621 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -47,7 +47,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
 
 static void gic_update_one_lr(struct vcpu *v, int i);
 
-static const struct gic_hw_operations *gic_hw_ops;
+const struct gic_hw_operations *gic_hw_ops;
 
 void register_gic_ops(const struct gic_hw_operations *ops)
 {
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index dd62ba6..9a4e50d 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -184,6 +184,7 @@ int request_irq(unsigned int irq, unsigned int irqflags,
 }
 
 /* Dispatch an interrupt */
+extern const struct gic_hw_operations *gic_hw_ops;
 void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq)
 {
     struct irq_desc *desc = irq_to_desc(irq);
@@ -202,6 +203,12 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq)
     irq_enter();
 
     spin_lock(&desc->lock);
+
+    if (irq == 30) {
+        set_bit(_IRQ_GUEST, &desc->status);
+        desc->handler = gic_hw_ops->gic_guest_irq_type;
+    }
+
     desc->handler->ack(desc);
 
     if ( !desc->action )
@@ -224,7 +231,23 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq)
          * The irq cannot be a PPI, we only support delivery of SPIs to
          * guests.
 	 */
-        vgic_vcpu_inject_spi(info->d, info->virq);
+        if (irq != 30)
+            vgic_vcpu_inject_spi(info->d, info->virq);
+        else {
+            struct domain *d;
+            
+            for_each_domain ( d )
+            {
+                struct pending_irq *p;
+                
+                if (d->domain_id == 0 || is_idle_domain(d))
+                    continue;
+                p = irq_to_pending(d->vcpu[0], 30);
+                p->desc = desc;
+                vgic_vcpu_inject_irq(d->vcpu[0], 30);
+                break;
+            }
+        }
         goto out_no_end;
     }
 
diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index 7dae28b..0249631 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -297,9 +300,9 @@ void init_timer_interrupt(void)
     /* Sensible defaults */
     WRITE_SYSREG64(0, CNTVOFF_EL2);     /* No VM-specific offset */
     /* Do not let the VMs program the physical timer, only read the physical counter */
-    WRITE_SYSREG32(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
     WRITE_SYSREG32(0, CNTP_CTL_EL0);    /* Physical timer disabled */
     WRITE_SYSREG32(0, CNTHP_CTL_EL2);   /* Hypervisor's timer disabled */
+    WRITE_SYSREG32(CNTHCTL_EL2_EL1PCTEN|CNTHCTL_EL2_EL1PCEN, CNTHCTL_EL2);
     isb();
 
     request_irq(timer_irq[TIMER_HYP_PPI], 0, timer_interrupt,


* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-10  0:54 Xen on ARM IRQ latency and scheduler overhead Stefano Stabellini
@ 2017-02-10  8:40 ` Dario Faggioli
  2017-02-10 18:32   ` Stefano Stabellini
  2017-02-17 18:40 ` Dario Faggioli
  1 sibling, 1 reply; 19+ messages in thread
From: Dario Faggioli @ 2017-02-10  8:40 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: george.dunlap, edgar.iglesias, julien.grall


[-- Attachment #1.1: Type: text/plain, Size: 6562 bytes --]

On Thu, 2017-02-09 at 16:54 -0800, Stefano Stabellini wrote:
> Hi all,
> 
Hi,

> I have run some IRQ latency measurements on Xen on ARM on a Xilinx
> ZynqMP board (four Cortex A53 cores, GICv2).
> 
> Dom0 has 1 vcpu pinned to cpu0, DomU has 1 vcpu pinned to cpu2.
> Dom0 is Ubuntu. DomU is an ad-hoc baremetal app to measure interrupt
> latency: https://github.com/edgarigl/tbm
> 
Right, interesting use case. I'm glad to see there's some interest in
it, and am happy to help investigate and try to make things better.

> I modified the app to use the phys_timer instead of the virt_timer. 
> You
> can build it with:
> 
> make CFG=configs/xen-guest-irq-latency.cfg 
> 
Ok, do you (or anyone) mind explaining in a little more detail what the
app tries to measure and how it does that?

As a matter of fact, I'm quite familiar with the scenario (I've spent a
lot of time playing with cyclictest,
https://rt.wiki.kernel.org/index.php/Cyclictest ) but I don't
immediately understand the way the timer is programmed, what is
supposed to be in the various variables/registers, what 'freq' actually
is, etc.

> These are the results, in nanosec:
> 
>                         AVG     MIN     MAX     WARM MAX
> 
> NODEBUG no WFI          1890    1800    3170    2070
> NODEBUG WFI             4850    4810    7030    4980
> NODEBUG no WFI credit2  2217    2090    3420    2650
> NODEBUG WFI credit2     8080    7890    10320   8300
> 
> DEBUG no WFI            2252    2080    3320    2650
> DEBUG WFI               6500    6140    8520    8130
> DEBUG WFI, credit2      8050    7870    10680   8450
> 
> DEBUG means Xen DEBUG build.
>
Mmm, Credit2 (with WFI) behaves almost the same (and even a bit better
in some cases) with debug enabled, while in Credit1 debug yes or no
makes quite a bit of difference, AFAICT, especially in the WFI case.

That looks a bit strange, as I'd have expected the effect to be similar
(there are actually quite a few debug checks in Credit2, maybe even more
than in Credit1).

> WARM MAX is the maximum latency, taking out the first few interrupts
> to
> warm the caches.
> WFI is the ARM and ARM64 sleeping instruction, trapped and emulated
> by
> Xen by calling vcpu_block.
> 
> As you can see, depending on whether the guest issues a WFI or not
> while
> waiting for interrupts, the results change significantly.
> Interestingly,
> credit2 does worse than credit1 in this area.
> 
This is with current staging, right? If yes: with Credit1, on ARM, the
scheduler tick is never stopped, unlike what we do on x86. This means
the system is, in general, "more awake" than with Credit2, which does
not have a periodic tick (and FWIW, also "more awake" than Credit1 on
x86, as far as the scheduler is concerned, at least).

Whether or not this significantly impacts your measurements, I don't
know, as it depends on a bunch of factors. What we know is that it has
enough impact to trigger the RCU bug Julien discovered (in a different
scenario, I know), so I would not rule it out.

I can try sending a quick patch for disabling the tick when a CPU is
idle, but I'd need your help in testing it.

> Trying to figure out where those 3000-4000ns of difference between
> the
> WFI and non-WFI cases come from, I wrote a patch to zero the latency
> introduced by xen/arch/arm/domain.c:schedule_tail. That saves about
> 1000ns. There are no other arch specific context switch functions
> worth
> optimizing.
> 
Yeah. It would be interesting to see a trace, but we still don't have
that for ARM. :-(

> We are down to 2000-3000ns. Then, I started investigating the
> scheduler.
> I measured how long it takes to run "vcpu_unblock": 1050ns, which is
> significant. 
>
How did you measure that, if I can ask?

> I don't know what is causing the remaining 1000-2000ns, but
> I bet on another scheduler function. Do you have any suggestions on
> which one?
> 
Well, when a vcpu is woken up, it is put in a runqueue, and a pCPU is
poked to go get and run it. The other thing you may want to try to
measure is how much time passes between when the vCPU becomes runnable
and is added to the runqueue, and when it is actually put to run.

Again, this would be visible in tracing. :-/

> Assuming that the problem is indeed the scheduler, one workaround
> that
> we could introduce today would be to avoid calling vcpu_unblock on
> guest
> WFI and call vcpu_yield instead. This change makes things
> significantly
> better:
> 
>                                      AVG     MIN     MAX     WARM MAX
> DEBUG WFI (yield, no block)          2900    2190    5130    5130
> DEBUG WFI (yield, no block) credit2  3514    2280    6180    5430
> 
> Is that a reasonable change to make? Would it cause significantly
> more
> power consumption in Xen (because xen/arch/arm/domain.c:idle_loop
> might
> not be called anymore)?
> 
Exactly. So, I think that, as Linux has 'idle=poll', it is conceivable
to have something similar in Xen, and if we do, I guess it can be
implemented as you suggest.

But, no, I don't think this is satisfying as a default, not before
trying to figure out what is going on, and whether we can improve things
in other ways.

> If we wanted to zero the difference between the WFI and non-WFI
> cases,
> would we need a new scheduler? A simple "noop scheduler" that
> statically
> assigns vcpus to pcpus, one by one, until they run out, then return
> error? 
>
Well, writing such a scheduler would at least be useful as a reference.
As in, the latency that you measure with it is the minimum possible
latency the scheduler is responsible for, and we can compare that with
what we get from 'regular' schedulers.

As a matter of fact, it may also turn out to be useful for a couple of
other issues/reasons, so I may indeed give this a go.

But it would not be much more useful than that, IMO.

> Or do we need more extensive modifications to
> xen/common/schedule.c? Any other ideas?
> 
Not yet. :-/

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-10  8:40 ` Dario Faggioli
@ 2017-02-10 18:32   ` Stefano Stabellini
  2017-02-16 12:20     ` Dario Faggioli
  0 siblings, 1 reply; 19+ messages in thread
From: Stefano Stabellini @ 2017-02-10 18:32 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: george.dunlap, edgar.iglesias, julien.grall, Stefano Stabellini,
	xen-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 8524 bytes --]

On Fri, 10 Feb 2017, Dario Faggioli wrote:
> On Thu, 2017-02-09 at 16:54 -0800, Stefano Stabellini wrote:
> > Hi all,
> > 
> Hi,
> 
> > I have run some IRQ latency measurements on Xen on ARM on a Xilinx
> > ZynqMP board (four Cortex A53 cores, GICv2).
> > 
> > Dom0 has 1 vcpu pinned to cpu0, DomU has 1 vcpu pinned to cpu2.
> > Dom0 is Ubuntu. DomU is an ad-hoc baremetal app to measure interrupt
> > latency: https://github.com/edgarigl/tbm
> > 
> Right, interesting use case. I'm glad to see there's some interest in
> it, and am happy to help investigating, and trying to make things
> better.

Thank you!


> > I modified the app to use the phys_timer instead of the virt_timer. 
> > You
> > can build it with:
> > 
> > make CFG=configs/xen-guest-irq-latency.cfg 
> > 
> Ok, do you (or anyone) mind explaining in a little bit more details
> what the app tries to measure and how it does that.

Give a look at app/xen/guest_irq_latency/apu.c:

https://github.com/edgarigl/tbm/blob/master/app/xen/guest_irq_latency/apu.c

This is my version which uses the phys_timer (instead of the virt_timer):

https://github.com/sstabellini/tbm/blob/phys-timer/app/xen/guest_irq_latency/apu.c

Edgar can jump in to add more info if needed (he is the author of the
app), but as you can see from the code, the app is very simple. It sets
a timer event in the future, then, after receiving the event, it checks
the current time and compares it with the deadline.
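
Condensed, and paraphrasing the helper names (so don't take them
literally), the measurement loop is something like:

    /* program a timer event in the future */
    deadline = read_cntpct() + ticks_ahead;
    write_cntp_cval(deadline);
    enable_timer_irq();
    wfi();                      /* or busy-wait, in the "no WFI" runs */

    /* in the interrupt handler */
    now = read_cntpct();
    latency_ns = (now - deadline) * freq_k;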


> As a matter of fact, I'm quite familiar with the scenario (I've spent a
> lot of time playing with cyclictest,
> https://rt.wiki.kernel.org/index.php/Cyclictest ) but I don't
> immediately understand the way the timer is programmed, what is
> supposed to be in the various variables/registers, what 'freq' actually
> is, etc.

The timer is programmed by writing the compare value to the cntp_cval
system register, see a64_write_timer_cval. The counter is read by
reading the cntpct system register, see
arch-aarch64/aarch64-excp.c:aarch64_irq. freq is the frequency of the
timer (which is lower than the cpu frequency). freq_k is the
multiplication factor to convert timer counter values into nanoseconds;
on my platform it's 10.

If you want more info on the timer, give a look at "Generic Timer" in
the ARM Architecture Reference Manual.
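
(For the numbers above: the generic timer on this board runs at 99998
KHz, so one counter tick is 10^9 / 99998000 ~= 10 ns, which is where
freq_k = 10 comes from.)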


> > These are the results, in nanosec:
> > 
> >                         AVG     MIN     MAX     WARM MAX
> > 
> > NODEBUG no WFI          1890    1800    3170    2070
> > NODEBUG WFI             4850    4810    7030    4980
> > NODEBUG no WFI credit2  2217    2090    3420    2650
> > NODEBUG WFI credit2     8080    7890    10320   8300
> > 
> > DEBUG no WFI            2252    2080    3320    2650
> > DEBUG WFI               6500    6140    8520    8130
> > DEBUG WFI, credit2      8050    7870    10680   8450
> > 
> > DEBUG means Xen DEBUG build.
> >
> Mmm, and Credit2 (with WFI) behave almost the same (and even a bit
> better in some cases) with debug enabled. While in Credit1, debug yes
> or no makes quite a few difference, AFAICT, especially in the WFI case.
> 
> That looks a bit strange, as I'd have expected the effect to be similar
> (there's actually quite a bit of debug checks in Credit2, maybe even
> more than in Credit1).
> 
> > WARM MAX is the maximum latency, taking out the first few interrupts
> > to
> > warm the caches.
> > WFI is the ARM and ARM64 sleeping instruction, trapped and emulated
> > by
> > Xen by calling vcpu_block.
> > 
> > As you can see, depending on whether the guest issues a WFI or not
> > while
> > waiting for interrupts, the results change significantly.
> > Interestingly,
> > credit2 does worse than credit1 in this area.
> > 
> This is with current staging right? 

That's right.


> If yes, in Credit1, you on ARM
> never stop the scheduler tick, like we do in x86. This means the system
> is, in general, "more awake" than Credit2, which does not have a
> periodic tick (and FWIW, also "more awake" of Credit1 in x86, as far as
> the scheduler is concerned, at least).
> 
> Whether or not this impact significantly your measurements, I don't
> know, as it depends on a bunch of factors. What we know is that this
> has enough impact to trigger the RCU bug Julien discovered (in a
> different scenario, I know), so I would not rule it out.
> 
> I can try sending a quick patch for disabling the tick when a CPU is
> idle, but I'd need your help in testing it.

That might be useful; however, if I understand this right, we don't
actually want a periodic timer in Xen just to make the system more
responsive, do we?


> > Trying to figure out where those 3000-4000ns of difference between
> > the
> > WFI and non-WFI cases come from, I wrote a patch to zero the latency
> > introduced by xen/arch/arm/domain.c:schedule_tail. That saves about
> > 1000ns. There are no other arch specific context switch functions
> > worth
> > optimizing.
> > 
> Yeah. It would be interesting to see a trace, but we still don't have
> that for ARM. :-(

indeed


> > We are down to 2000-3000ns. Then, I started investigating the
> > scheduler.
> > I measured how long it takes to run "vcpu_unblock": 1050ns, which is
> > significant. 
> >
> How you measured, if I can ask.

Simple. I added a timer counter read before and after the function call:

    uint64_t n1 = 0, n2 = 0;
    n1 = READ_SYSREG64(CNTPCT_EL0);

    function_call_to_measure();

    n2 = READ_SYSREG64(CNTPCT_EL0);
    printk("DEBUG %s %d ns=%lu\n",__func__,__LINE__,(n2-n1)*10);

Where 10 is the calculated freq_k for the platform I have.



> > I don't know what is causing the remaining 1000-2000ns, but
> > I bet on another scheduler function. Do you have any suggestions on
> > which one?
> > 
> Well, when a vcpu is woken up, it is put in a runqueue, and a pCPU is
> poked to go get and run it. The other thing you may want to try to
> measure is how much time passes between when the vCPU becomes runnable
> and is added to the runqueue, and when it is actually put to run.
> 
> Again, this would be visible in tracing. :-/

I could do that if you tell me where to add the two
'READ_SYSREG64(CNTPCT_EL0)' reads.
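
Something like the following is what I would try, unless you have a
better idea (wake_stamp being a new, debug-only field I'd add to struct
vcpu; this obviously only builds on ARM):

    /* xen/common/schedule.c:vcpu_wake(), near the top */
    v->wake_stamp = READ_SYSREG64(CNTPCT_EL0);

    /* xen/arch/arm/domain.c:context_switch(), once 'next' is known */
    if ( next->wake_stamp )
    {
        printk("DEBUG wake-to-run ns=%lu\n",
               (READ_SYSREG64(CNTPCT_EL0) - next->wake_stamp) * 10);
        next->wake_stamp = 0;
    }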


> > Assuming that the problem is indeed the scheduler, one workaround
> > that
> > we could introduce today would be to avoid calling vcpu_unblock on
> > guest
> > WFI and call vcpu_yield instead. This change makes things
> > significantly
> > better:
> > 
> >                                      AVG     MIN     MAX     WARM MAX
> > DEBUG WFI (yield, no block)          2900    2190    5130    5130
> > DEBUG WFI (yield, no block) credit2  3514    2280    6180    5430
> > 
> > Is that a reasonable change to make? Would it cause significantly
> > more
> > power consumption in Xen (because xen/arch/arm/domain.c:idle_loop
> > might
> > not be called anymore)?
> > 
> Exactly. So, I think that, as Linux has 'idle=poll', it is conceivable
> to have something similar in Xen, and if we do, I guess it can be
> implemented as you suggest.
> 
> But, no, I don't think this is satisfying as default, not before trying
> to figure out what is going on, and if we can improve things in other
> ways.

OK. Should I write a patch for that? I guess it would be arm specific
initially. What do you think would be a good name for the option?


> > If we wanted to zero the difference between the WFI and non-WFI
> > cases,
> > would we need a new scheduler? A simple "noop scheduler" that
> > statically
> > assigns vcpus to pcpus, one by one, until they run out, then return
> > error? 
> >
> Well, writing such a scheduler would at least be useful as reference.
> As in, the latency that you measure on it, is the minimum possible
> latency the scheduler is responsible for, and we can compare that with
> what we get from 'regular' schedulers.
> 
> As a matter of fact, it may also turn out useful for a couple of other
> issues/reason, so I may indeed give this a go.

Thank you! If you write it, I'll help you test it on ARM64 :-)


> But it would not be much more useful than that, IMO.

Why? Actually I know of several potential users of Xen on ARM interested
exactly in this use-case. They only have a statically defined number of
guests, with a total number of vcpus lower than or equal to the number
of pcpus in the system. Wouldn't a scheduler like that help in this
scenario?


* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-10 18:32   ` Stefano Stabellini
@ 2017-02-16 12:20     ` Dario Faggioli
  2017-02-16 19:52       ` Stefano Stabellini
  0 siblings, 1 reply; 19+ messages in thread
From: Dario Faggioli @ 2017-02-16 12:20 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: george.dunlap, edgar.iglesias, julien.grall, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 6535 bytes --]

On Fri, 2017-02-10 at 10:32 -0800, Stefano Stabellini wrote:
> On Fri, 10 Feb 2017, Dario Faggioli wrote:
> > Right, interesting use case. I'm glad to see there's some interest
> > in
> > it, and am happy to help investigating, and trying to make things
> > better.
> 
> Thank you!
> 
Hey, FYI, I am looking into this. It's just that I've got a couple of
other things on my plate right now.

> > Ok, do you (or anyone) mind explaining in a little bit more details
> > what the app tries to measure and how it does that.
> 
> Give a look at app/xen/guest_irq_latency/apu.c:
> 
> https://github.com/edgarigl/tbm/blob/master/app/xen/guest_irq_latency/apu.c
> 
> This is my version which uses the phys_timer (instead of the
> virt_timer):
> 
> https://github.com/sstabellini/tbm/blob/phys-timer/app/xen/guest_irq_latency/apu.c
> 
Yep, I did look at those.

> Edgar can jump in to add more info if needed (he is the author of the
> app), but as you can see from the code, the app is very simple. It
> sets
> a timer event in the future, then, after receiving the event, it
> checks
> the current time and compare it with the deadline.
> 
Right, and you check the current time with:

  now = aarch64_irq_get_stamp(el);

which I guess is compatible with the values you use for the counter.

> > > These are the results, in nanosec:
> > > 
> > >                         AVG     MIN     MAX     WARM MAX
> > > 
> > > NODEBUG no WFI          1890    1800    3170    2070
> > > NODEBUG WFI             4850    4810    7030    4980
> > > NODEBUG no WFI credit2  2217    2090    3420    2650
> > > NODEBUG WFI credit2     8080    7890    10320   8300
> > > 
> > > DEBUG no WFI            2252    2080    3320    2650
> > > DEBUG WFI               6500    6140    8520    8130
> > > DEBUG WFI, credit2      8050    7870    10680   8450
> > > 
> > > DEBUG means Xen DEBUG build.
> > > 
[...]
> > > As you can see, depending on whether the guest issues a WFI or
> > > not
> > > while
> > > waiting for interrupts, the results change significantly.
> > > Interestingly,
> > > credit2 does worse than credit1 in this area.
> > > 
> > This is with current staging right? 
> 
> That's right.
> 
So, when you have the chance, can I see the output of

 xl debug-key r
 xl dmesg

Both under Credit1 and Credit2?

> > I can try sending a quick patch for disabling the tick when a CPU
> > is
> > idle, but I'd need your help in testing it.
> 
> That might be useful, however, if I understand this right, we don't
> actually want a periodic timer in Xen just to make the system more
> responsive, do we?
> 
IMO, no. I'd call that a hack, and I don't think we should go that
route.

Not until we have figured out, and squeezed as much as possible, all the
other sources of latency, and that has proven not to be enough, at
least.

I'll send the patch.

> > > Assuming that the problem is indeed the scheduler, one workaround
> > > that
> > > we could introduce today would be to avoid calling vcpu_unblock
> > > on
> > > guest
> > > WFI and call vcpu_yield instead. This change makes things
> > > significantly
> > > better:
> > > 
> > >                                      AVG     MIN     MAX     WARM
> > > MAX
> > > DEBUG WFI (yield, no block)          2900    2190    5130    5130
> > > DEBUG WFI (yield, no block) credit2  3514    2280    6180    5430
> > > 
> > > Is that a reasonable change to make? Would it cause significantly
> > > more
> > > power consumption in Xen (because xen/arch/arm/domain.c:idle_loop
> > > might
> > > not be called anymore)?
> > > 
> > Exactly. So, I think that, as Linux has 'idle=poll', it is
> > conceivable
> > to have something similar in Xen, and if we do, I guess it can be
> > implemented as you suggest.
> > 
> > But, no, I don't think this is satisfying as default, not before
> > trying
> > to figure out what is going on, and if we can improve things in
> > other
> > ways.
> 
> OK. Should I write a patch for that? I guess it would be arm specific
> initially. What do you think it would be a good name for the option?
> 
Well, I think such an option may be useful on other arches too, but we'd
better measure/verify that first. Therefore, I'd be OK with this being
implemented only on ARM for now.

As for the name, I actually like 'idle='; as values, what about 'sleep'
or 'block' for the current default, and sticking to 'poll' for the new
behavior you'll implement? Or do you think it is at risk of being
confused with Linux?

An alternative would be something like 'wfi=[sleep,idle]' or
'wfi=[block,poll]', but that is ARM specific, and it'd mean we would
need another option for making x86 behave similarly.


> > But it would not be much more useful than that, IMO.
> 
> Why? Actually I know of several potential users of Xen on ARM
> interested
> exactly in this use-case. They only have a statically defined number
> of
> guests with a total amount of vcpu lower or equal to the number of
> pcpu
> in the system. Wouldn't a scheduler like that help in this scenario?
>
What I'm saying is that it would be rather inflexible. In the sense that
it won't be possible to have statically pinned and dynamically moving
vcpus in the same guest, it would be hard to control which vcpu is
statically assigned to which pcpu, making a domain statically assigned
would mean moving it to another cpupool (which is the only way to use a
different scheduler, right now, in Xen), and things like this.

I know there are static use cases... But I'm not entirely sure how
static they really are, and whether they, in the end, will really like
such a degree of inflexibility.

But anyway, indeed I can give you a scheduler that, provided it lives in
a cpupool with M pcpus, as soon as a new domain with n vcpus is moved
inside the pool, statically assigns each of its vcpus v0,v1,...,vn-1
(with n <= M) to a pcpu, and always sticks with that. And we'll see what
will happen! :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-16 12:20     ` Dario Faggioli
@ 2017-02-16 19:52       ` Stefano Stabellini
  2017-02-16 23:07         ` Stefano Stabellini
  0 siblings, 1 reply; 19+ messages in thread
From: Stefano Stabellini @ 2017-02-16 19:52 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: george.dunlap, edgar.iglesias, julien.grall, Stefano Stabellini,
	xen-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 7030 bytes --]

On Thu, 16 Feb 2017, Dario Faggioli wrote:
> On Fri, 2017-02-10 at 10:32 -0800, Stefano Stabellini wrote:
> > On Fri, 10 Feb 2017, Dario Faggioli wrote:
> > > Right, interesting use case. I'm glad to see there's some interest
> > > in
> > > it, and am happy to help investigating, and trying to make things
> > > better.
> > 
> > Thank you!
> > 
> Hey, FYI, I am looking into this. It's just that I've got a couple of
> other things in my plate right now.

OK


> > > Ok, do you (or anyone) mind explaining in a little bit more details
> > > what the app tries to measure and how it does that.
> > 
> > Give a look at app/xen/guest_irq_latency/apu.c:
> > 
> > https://github.com/edgarigl/tbm/blob/master/app/xen/guest_irq_latency/apu.c
> > 
> > This is my version which uses the phys_timer (instead of the
> > virt_timer):
> > 
> > https://github.com/sstabellini/tbm/blob/phys-timer/app/xen/guest_irq_latency/apu.c
> > 
> Yep, I did look at those.
> 
> > Edgar can jump in to add more info if needed (he is the author of the
> > app), but as you can see from the code, the app is very simple. It
> > sets
> > a timer event in the future, then, after receiving the event, it
> > checks
> > the current time and compare it with the deadline.
> > 
> Right, and you check the current time with:
> 
>   now = aarch64_irq_get_stamp(el);
> 
> which I guess is compatible with the values you use for the counter.

Yes


> > > > These are the results, in nanosec:
> > > > 
> > > >                         AVG     MIN     MAX     WARM MAX
> > > > 
> > > > NODEBUG no WFI          1890    1800    3170    2070
> > > > NODEBUG WFI             4850    4810    7030    4980
> > > > NODEBUG no WFI credit2  2217    2090    3420    2650
> > > > NODEBUG WFI credit2     8080    7890    10320   8300
> > > > 
> > > > DEBUG no WFI            2252    2080    3320    2650
> > > > DEBUG WFI               6500    6140    8520    8130
> > > > DEBUG WFI, credit2      8050    7870    10680   8450
> > > > 
> > > > DEBUG means Xen DEBUG build.
> > > > 
> [...]
> > > > As you can see, depending on whether the guest issues a WFI or
> > > > not
> > > > while
> > > > waiting for interrupts, the results change significantly.
> > > > Interestingly,
> > > > credit2 does worse than credit1 in this area.
> > > > 
> > > This is with current staging right? 
> > 
> > That's right.
> > 
> So, when you have the chance, can I see the output of
> 
>  xl debug-key r
>  xl dmesg
> 
> Both under Credit1 and Credit2?

I'll see what I can do.


> > > I can try sending a quick patch for disabling the tick when a CPU
> > > is
> > > idle, but I'd need your help in testing it.
> > 
> > That might be useful, however, if I understand this right, we don't
> > actually want a periodic timer in Xen just to make the system more
> > responsive, do we?
> > 
> IMO, no. I'd call that an hack, and don't think we should go that
> route.
> 
> Not until we have figured out and squeezed as much as possible all the
> other sources of latency, and that has proven not to be enough, at
> least.
> 
> I'll send the patch.
> 
> > > > Assuming that the problem is indeed the scheduler, one workaround
> > > > that
> > > > we could introduce today would be to avoid calling vcpu_unblock
> > > > on
> > > > guest
> > > > WFI and call vcpu_yield instead. This change makes things
> > > > significantly
> > > > better:
> > > > 
> > > >                                      AVG     MIN     MAX     WARM
> > > > MAX
> > > > DEBUG WFI (yield, no block)          2900    2190    5130    5130
> > > > DEBUG WFI (yield, no block) credit2  3514    2280    6180    5430
> > > > 
> > > > Is that a reasonable change to make? Would it cause significantly
> > > > more
> > > > power consumption in Xen (because xen/arch/arm/domain.c:idle_loop
> > > > might
> > > > not be called anymore)?
> > > > 
> > > Exactly. So, I think that, as Linux has 'idle=poll', it is
> > > conceivable
> > > to have something similar in Xen, and if we do, I guess it can be
> > > implemented as you suggest.
> > > 
> > > But, no, I don't think this is satisfying as default, not before
> > > trying
> > > to figure out what is going on, and if we can improve things in
> > > other
> > > ways.
> > 
> > OK. Should I write a patch for that? I guess it would be arm specific
> > initially. What do you think it would be a good name for the option?
> > 
> Well, I think such an option may be useful on other arches too, but we
> better measure/verify that before. Therefore, I'd be ok for this to be
> only implemented on ARM for now.
> 
> As per the name, I actually like the 'idle=', and as values, what about
> 'sleep' or 'block' for the current default, and stick to 'poll' for the
> new behavior you'll implement? Or do you think it is at risk of
> confusion with Linux?
> 
> An alternative would be something like 'wfi=[sleep,idle]', or
> 'wfi=[block,poll]', but that is ARM specific, and it'd mean we will
> need another option for making x86 behave similarly.

That's a good idea. vwfi=[sleep,idle] looks like the right thing to
introduce, given that the option would be ARM only at the moment and
that it's the virtual wfi behavior, not the physical wfi behavior, that
we are changing.
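
Roughly, I'm thinking of something along these lines (just a sketch,
using the existing custom_param() mechanism):

    /* vwfi=sleep (default: block on WFI) or vwfi=idle (yield on WFI) */
    static bool_t __read_mostly vwfi_idle;

    static void __init parse_vwfi(const char *s)
    {
        if ( !strcmp(s, "idle") )
            vwfi_idle = 1;
    }
    custom_param("vwfi", parse_vwfi);

    /* and in the WFI trap path */
    if ( vwfi_idle )
        vcpu_yield();
    else
        vcpu_block_unless_event_pending(current);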

 
> > > But it would not be much more useful than that, IMO.
> > 
> > Why? Actually I know of several potential users of Xen on ARM
> > interested
> > exactly in this use-case. They only have a statically defined number
> > of
> > guests with a total amount of vcpu lower or equal to the number of
> > pcpu
> > in the system. Wouldn't a scheduler like that help in this scenario?
> >
> What I'm saying is that would be rather inflexible. In the sense that
> it won't be possible to have statically pinned and dynamically moving
> vcpus in the same guest, it would be hard to control what vcpu is
> statically assigned to what pcpu, making a domain statically assigned
> would mean move it to another cpupool (which is the only way to use a
> different scheduler, right now, in Xen), and things like this.
> 
> I know there are static use cases... But I'm not entirely sure how
> static they really are, and whether they, in the end, will really like
> such degree of inflexibility.

They are _very_ static :-)
Think about the board on a mechanical robot or a drone. VMs are only
created at host boot and never again. In fact we are planning to
introduce a feature in Xen to be able to create a few VMs directly from
the hypervisor, to skip the tools in Dom0 for these cases.


> But anyway, indeed I can give you a scheduler that, provided it leaves
> in a cpupool with M pcpus, a soon as a new domain with n vcpus is moved
> inside the pool, statically assign its n0,n1...,nk,k<=M vcpus to a
> pcpu, and always stick with that. And we'll see what will happen! :-)

I am looking forward to it.


* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-16 19:52       ` Stefano Stabellini
@ 2017-02-16 23:07         ` Stefano Stabellini
  2017-02-17 11:02           ` Dario Faggioli
  0 siblings, 1 reply; 19+ messages in thread
From: Stefano Stabellini @ 2017-02-16 23:07 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: george.dunlap, edgar.iglesias, Dario Faggioli, julien.grall, xen-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1326 bytes --]

On Thu, 16 Feb 2017, Stefano Stabellini wrote:
> > > > >                         AVG     MIN     MAX     WARM MAX
> > > > > 
> > > > > NODEBUG no WFI          1890    1800    3170    2070
> > > > > NODEBUG WFI             4850    4810    7030    4980
> > > > > NODEBUG no WFI credit2  2217    2090    3420    2650
> > > > > NODEBUG WFI credit2     8080    7890    10320   8300
> > > > > 
> > > > > DEBUG no WFI            2252    2080    3320    2650
> > > > > DEBUG WFI               6500    6140    8520    8130
> > > > > DEBUG WFI, credit2      8050    7870    10680   8450
> > > > > 
> > > > > DEBUG means Xen DEBUG build.
> > > > > 
> > [...]
> > > > > As you can see, depending on whether the guest issues a WFI or
> > > > > not
> > > > > while
> > > > > waiting for interrupts, the results change significantly.
> > > > > Interestingly,
> > > > > credit2 does worse than credit1 in this area.
> > > > > 
> > > > This is with current staging right? 
> > > 
> > > That's right.
> > > 
> > So, when you have the chance, can I see the output of
> > 
> >  xl debug-key r
> >  xl dmesg
> > 
> > Both under Credit1 and Credit2?
> 
> I'll see what I can do.

See attached.

[-- Attachment #2: Type: TEXT/PLAIN, Size: 35358 bytes --]

(XEN) Xen version 4.9-unstable (sstabellini@) (aarch64-linux-gnu-gcc (Linaro GCC 2014.05) 4.9.1 20140422 (prerelease)) debug=y  Thu Feb 16 14:39:21 PST 2017
(XEN) Latest ChangeSet: Thu Feb 16 14:39:16 2017 -0800 git:4e0ef4d
(XEN) Processor: 410fd034: "ARM Limited", variant: 0x0, part 0xd03, rev 0x4
(XEN) 64-bit Execution:
(XEN)   Processor Features: 0000000000002222 0000000000000000
(XEN)     Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32
(XEN)     Extensions: FloatingPoint AdvancedSIMD
(XEN)   Debug Features: 0000000010305106 0000000000000000
(XEN)   Auxiliary Features: 0000000000000000 0000000000000000
(XEN)   Memory Model Features: 0000000000001122 0000000000000000
(XEN)   ISA Features:  0000000000011120 0000000000000000
(XEN) 32-bit Execution:
(XEN)   Processor Features: 00000131:00011011
(XEN)     Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle
(XEN)     Extensions: GenericTimer Security
(XEN)   Debug Features: 03010066
(XEN)   Auxiliary Features: 00000000
(XEN)   Memory Model Features: 10201105 40000000 01260000 02102211
(XEN)  ISA Features: 02101110 13112111 21232042 01112131 00011142 00011121
(XEN) Using PSCI-1.0 for SMP bringup
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 99998 KHz
(XEN) GICv2 initialization:
(XEN)         gic_dist_addr=00000000f9010000
(XEN)         gic_cpu_addr=00000000f9020000
(XEN)         gic_hyp_addr=00000000f9040000
(XEN)         gic_vcpu_addr=00000000f9060000
(XEN)         gic_maintenance_irq=25
(XEN) GICv2: Adjusting CPU interface base to 0xf902f000
(XEN) GICv2: 192 lines, 4 cpus, secure (IID 0200143b).
(XEN) GICv2: WARNING: CPU0: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU0: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU0: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered.
(XEN) Could not find scheduler: credit1
(XEN) Using 'SMP Credit Scheduler' (credit)
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Allocated console ring of 32 KiB.
(XEN) Bringing up CPU1
(XEN) GICv2: WARNING: CPU1: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU1: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU1: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered.
(XEN) CPU 1 booted.
(XEN) Bringing up CPU2
(XEN) GICv2: WARNING: CPU2: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU2: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU2: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered.
(XEN) CPU 2 booted.
(XEN) Bringing up CPU3
(XEN) GICv2: WARNING: CPU3: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU3: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU3: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered.
(XEN) CPU 3 booted.
(XEN) Brought up 4 CPUs
(XEN) P2M: 40-bit IPA with 40-bit PA and 8-bit VMID
(XEN) P2M: 3 levels with order-1 root, VTCR 0x80023558
(XEN) smmu: /amba/smmu@fd800000: probing hardware configuration...
(XEN) smmu: /amba/smmu@fd800000: SMMUv2 with:
(XEN) smmu: /amba/smmu@fd800000:        stage 2 translation
(XEN) smmu: /amba/smmu@fd800000:        stream matching with 48 register groups, mask 0x7fff
(XEN) smmu: /amba/smmu@fd800000:        16 context banks (0 stage-2 only)
(XEN) smmu: /amba/smmu@fd800000:        Stage-2: 40-bit IPA -> 48-bit PA
(XEN) smmu: /amba/smmu@fd800000: registered 26 master devices
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) build-id: 3162bd9a1aaa9749ea1c5a484a7062f60ea73550
(XEN) alternatives: Patching with alt table 00000000400f3c38 -> 00000000400f3fec
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading kernel from boot module @ 0000000000080000
(XEN) Loading ramdisk from boot module @ 00000000060af000
(XEN) Allocating 1:1 mappings totalling 1024MB for dom0:
(XEN) BANK[0] 0x00000040000000-0x00000080000000 (1024MB)
(XEN) Grant table range: 0x0000087fe00000-0x0000087fe61000
(XEN) smmu: /amba/smmu@fd800000: d0: p2maddr 0x000000087ff68000
(XEN) Loading zImage from 0000000000080000 to 0000000040080000-0000000043180000
(XEN) Loading dom0 initrd from 00000000060af000 to 0x0000000048200000-0x000000004a150400
(XEN) Allocating PPI 16 for event channel interrupt
(XEN) Loading dom0 DTB to 0x0000000048000000-0x0000000048006c4a
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 296kB init memory.
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.6.0-xilinx (xbrbbot@xcosswbld01) (gcc version 5.2.1 20151005 (Linaro GCC 5.2-2015.11-2) ) #1 SMP Thu Dec 15 18:01:07 MST 2016
[    0.000000] Boot CPU: AArch64 Processor [410fd034]
[    0.000000] earlycon: xen0 at I/O port 0x0 (options '')
[    0.000000] bootconsole [xen0] enabled
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: UEFI not found.
[    0.000000] cma: Reserved 128 MiB at 0x0000000078000000
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv0.2 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: Trusted OS migration not required
[    0.000000] Xen 4.9 support found
[    0.000000] percpu: Embedded 20 pages/cpu @ffffffc037f73000 s42240 r8192 d31488 u81920
[    0.000000] Detected VIPT I-cache on CPU0
[    0.000000] CPU features: enabling workaround for ARM erratum 845719
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 258560
[    0.000000] Kernel command line: console=hvc0 earlycon=xen earlyprintk=xen maxcpus=1
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.000000] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.000000] software IO TLB [mem 0x72e00000-0x76e00000] (64MB) mapped at [ffffffc032e00000-ffffffc036dfffff]
[    0.000000] Memory: 790568K/1048576K available (8136K kernel code, 561K rwdata, 2860K rodata, 460K init, 374K bss, 126936K reserved, 131072K cma-reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     modules : 0xffffff8000000000 - 0xffffff8008000000   (   128 MB)
[    0.000000]     vmalloc : 0xffffff8008000000 - 0xffffffbdffff0000   (   247 GB)
[    0.000000]       .text : 0xffffff8008080000 - 0xffffff8008870000   (  8128 KB)
[    0.000000]     .rodata : 0xffffff8008870000 - 0xffffff8008b3f000   (  2876 KB)
[    0.000000]       .init : 0xffffff8008b3f000 - 0xffffff8008bb2000   (   460 KB)
[    0.000000]       .data : 0xffffff8008bb2000 - 0xffffff8008c3e400   (   561 KB)
[    0.000000]     vmemmap : 0xffffffbe00000000 - 0xffffffbfc0000000   (     7 GB maximum)
[    0.000000]               0xffffffbe00000000 - 0xffffffbe00e00000   (    14 MB actual)
[    0.000000]     fixed   : 0xffffffbffe7fd000 - 0xffffffbffec00000   (  4108 KB)
[    0.000000]     PCI I/O : 0xffffffbffee00000 - 0xffffffbfffe00000   (    16 MB)
[    0.000000]     memory  : 0xffffffc000000000 - 0xffffffc040000000   (  1024 MB)
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  Build-time adjustment of leaf fanout to 64.
[    0.000000]  RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=4.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=4
[    0.000000] NR_IRQS:64 nr_irqs:64 0
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER4
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER8
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER12
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER16
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER20
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER0
[    0.000000] Architected cp15 timer(s) running at 99.99MHz (virt).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171015c6fb, max_idle_ns: 440795203665 ns
[    0.000004] sched_clock: 56 bits at 99MHz, resolution 10ns, wraps every 4398046511101ns
[    0.008279] Console: colour dummy device 80x25
[    0.012382] console [hvc0] enabled
[    0.012382] console [hvc0] enabled
[    0.019204] bootconsole [xen0] disabled
[    0.019204] bootconsole [xen0] disabled
[    0.026913] Calibrating delay loop (skipped), value calculated using timer frequency.. 199.99 BogoMIPS (lpj=399995)
[    0.037362] pid_max: default: 32768 minimum: 301
[    0.042141] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[    0.048788] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[    0.056539] ASID allocator initialised with 65536 entries
[    0.061530] xen:grant_table: Grant tables using version 1 layout
[    0.067499] Grant table initialized
[    0.071037] xen:events: Using FIFO-based ABI
[    0.075349] Xen: initializing cpu0
[    0.079399] zynqmp_plat_init: pm node not found
[    0.083456] EFI services will not be available.
[    0.088061] Brought up 1 CPUs
[    0.090992] SMP: Total of 1 processors activated.
[    0.095763] CPU: All CPU(s) started at EL1
[    0.099923] alternatives: patching kernel code
[    0.104878] devtmpfs: initialized
[    0.111860] DMI not present or invalid.
[    0.111979] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.124334] xor: measuring software checksum speed
[    0.166394]    8regs     :  2177.000 MB/sec
[    0.206457]    8regs_prefetch:  1934.000 MB/sec
[    0.246520]    32regs    :  2412.000 MB/sec
[    0.286585]    32regs_prefetch:  2178.000 MB/sec
[    0.286626] xor: using function: 32regs (2412.000 MB/sec)
[    0.292173] NET: Registered protocol family 16
[    0.296514] vdso: 2 pages (1 code @ ffffff8008876000, 1 data @ ffffff8008bb8000)
[    0.303066] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[    0.310704] DMA: preallocated 256 KiB pool for atomic allocations
[    0.316088] xen:swiotlb_xen: Warning: only able to allocate 4 MB for software IO TLB
[    0.324491] software IO TLB [mem 0x72000000-0x72400000] (4MB) mapped at [ffffffc032000000-ffffffc0323fffff]
[    0.334332] ARM CCI_400_r1 PMU driver probed
[    0.337722] /amba/dma@fd500000: could not find phandle
[    0.343190] /amba/dma@fd510000: could not find phandle
[    0.348368] /amba/dma@fd520000: could not find phandle
[    0.353562] /amba/dma@fd530000: could not find phandle
[    0.358746] /amba/dma@fd540000: could not find phandle
[    0.363946] /amba/dma@fd550000: could not find phandle
[    0.369129] /amba/dma@fd560000: could not find phandle
[    0.374321] /amba/dma@fd570000: could not find phandle
[    0.379625] /amba/dma@ffa80000: could not find phandle
[    0.384703] /amba/dma@ffa90000: could not find phandle
[    0.389886] /amba/dma@ffaa0000: could not find phandle
[    0.395083] /amba/dma@ffab0000: could not find phandle
[    0.400270] /amba/dma@ffac0000: could not find phandle
[    0.405464] /amba/dma@ffad0000: could not find phandle
[    0.410651] /amba/dma@ffae0000: could not find phandle
[    0.415844] /amba/dma@ffaf0000: could not find phandle
[    0.421126] /amba/ethernet@ff0e0000: could not find phandle
[    0.428928] /amba/spi@ff0f0000: could not find phandle
[    0.432102] /amba/sdhci@ff160000: could not find phandle
[    0.437225] /amba/sdhci@ff170000: could not find phandle
[    0.442638] /amba/usb@fe200000: could not find phandle
[    0.467070] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    0.536192] raid6: int64x1  gen()   409 MB/s
[    0.604225] raid6: int64x1  xor()   430 MB/s
[    0.672351] raid6: int64x2  gen()   617 MB/s
[    0.740502] raid6: int64x2  xor()   559 MB/s
[    0.808626] raid6: int64x4  gen()   907 MB/s
[    0.876750] raid6: int64x4  xor()   685 MB/s
[    0.944895] raid6: int64x8  gen()  1056 MB/s
[    1.012962] raid6: int64x8  xor()   683 MB/s
[    1.081131] raid6: neonx1   gen()   672 MB/s
[    1.149234] raid6: neonx1   xor()   702 MB/s
[    1.217382] raid6: neonx2   gen()  1044 MB/s
[    1.285484] raid6: neonx2   xor()   925 MB/s
[    1.353620] raid6: neonx4   gen()  1385 MB/s
[    1.421726] raid6: neonx4   xor()  1079 MB/s
[    1.489843] raid6: neonx8   gen()  1556 MB/s
[    1.557977] raid6: neonx8   xor()  1131 MB/s
[    1.558015] raid6: using algorithm neonx8 gen() 1556 MB/s
[    1.562155] raid6: .... xor() 1131 MB/s, rmw enabled
[    1.567169] raid6: using intx1 recovery algorithm
[    1.572162] ACPI: Interpreter disabled.
[    1.575851] xen:balloon: Initialising balloon driver
[    1.581157] xen_balloon: Initialising balloon driver
[    1.586566] SCSI subsystem initialized
[    1.589843] usbcore: registered new interface driver usbfs
[    1.595239] usbcore: registered new interface driver hub
[    1.600594] usbcore: registered new device driver usb
[    1.605749] media: Linux media interface: v0.10
[    1.610279] Linux video capture interface: v2.00
[    1.614952] pps_core: LinuxPPS API ver. 1 registered
[    1.619939] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    1.629124] PTP clock support registered
[    1.633131] EDAC MC: Ver: 3.0.0
[    1.637530] dmi: Firmware registration failed.
[    1.640838] FPGA manager framework
[    1.644297] Advanced Linux Sound Architecture Driver Initialized.
[    1.651805] clocksource: Switched to clocksource arch_sys_counter
[    1.656639] VFS: Disk quotas dquot_6.6.0
[    1.660555] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    1.667640] pnp: PnP ACPI: disabled
[    1.677722] NET: Registered protocol family 2
[    1.678115] TCP established hash table entries: 8192 (order: 4, 65536 bytes)
[    1.683729] TCP bind hash table entries: 8192 (order: 5, 131072 bytes)
[    1.690352] TCP: Hash tables configured (established 8192 bind 8192)
[    1.696668] UDP hash table entries: 512 (order: 2, 16384 bytes)
[    1.702620] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
[    1.709187] NET: Registered protocol family 1
[    1.714143] RPC: Registered named UNIX socket transport module.
[    1.719368] RPC: Registered udp transport module.
[    1.724118] RPC: Registered tcp transport module.
[    1.728875] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    1.735507] Trying to unpack rootfs image as initramfs...
[    1.829211] Freeing initrd memory: 32064K (ffffffc008200000 - ffffffc00a150000)
[    1.831837] futex hash table entries: 1024 (order: 5, 131072 bytes)
[    1.837381] audit: initializing netlink subsys (disabled)
[    1.842785] audit: type=2000 audit(1.716:1): initialized
[    1.849215] workingset: timestamp_bits=60 max_order=18 bucket_order=0
[    1.855369] NFS: Registering the id_resolver key type
[    1.859735] Key type id_resolver registered
[    1.863946] Key type id_legacy registered
[    1.868024] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[    1.874794] jffs2: version 2.2. (NAND) (SUMMARY)  © 2001-2006 Red Hat, Inc.
[    1.890483] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 247)
[    1.892368] io scheduler noop registered
[    1.896338] io scheduler deadline registered
[    1.900671] io scheduler cfq registered (default)
[    1.905913] nwl-pcie fd0e0000.pcie: Link is DOWN
[    1.910114] PCI host bridge /amba/pcie@fd0e0000 ranges:
[    1.915355]   No bus range found for /amba/pcie@fd0e0000, using [bus 00-ff]
[    1.922364]   MEM 0xe0000000..0xefffffff -> 0xe0000000
[    1.927547]   MEM 0x600000000..0x7ffffffff -> 0x600000000
[    1.933079] nwl-pcie fd0e0000.pcie: PCI host bridge to bus 0000:00
[    1.939225] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.944764] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xefffffff]
[    1.951682] pci_bus 0000:00: root bus resource [mem 0x600000000-0x7ffffffff pref]
(XEN) physdev.c:17:d0v0 PHYSDEVOP cmd=25: not implemented
(XEN) physdev.c:17:d0v0 PHYSDEVOP cmd=15: not implemented
[    1.969418] pci 0000:00:00.0: Failed to add - passthrough or MSI/MSI-X might fail!
[    1.977117] pci 0000:00:00.0: PCI bridge to [bus 01-0c]
[    1.983421] xilinx-dpdma fd4c0000.dma: Xilinx DPDMA engine is probed
[    1.988983] xilinx-zynqmp-dma fd500000.dma: ZynqMP DMA driver Probe success
[    1.995839] xilinx-zynqmp-dma fd510000.dma: ZynqMP DMA driver Probe success
[    2.002833] xilinx-zynqmp-dma fd520000.dma: ZynqMP DMA driver Probe success
[    2.009840] xilinx-zynqmp-dma fd530000.dma: ZynqMP DMA driver Probe success
[    2.016839] xilinx-zynqmp-dma fd540000.dma: ZynqMP DMA driver Probe success
[    2.023862] xilinx-zynqmp-dma fd550000.dma: ZynqMP DMA driver Probe success
[    2.030855] xilinx-zynqmp-dma fd560000.dma: ZynqMP DMA driver Probe success
[    2.037859] xilinx-zynqmp-dma fd570000.dma: ZynqMP DMA driver Probe success
[    2.044870] xilinx-zynqmp-dma ffa80000.dma: ZynqMP DMA driver Probe success
[    2.051885] xilinx-zynqmp-dma ffa90000.dma: ZynqMP DMA driver Probe success
[    2.058876] xilinx-zynqmp-dma ffaa0000.dma: ZynqMP DMA driver Probe success
[    2.065892] xilinx-zynqmp-dma ffab0000.dma: ZynqMP DMA driver Probe success
[    2.072891] xilinx-zynqmp-dma ffac0000.dma: ZynqMP DMA driver Probe success
[    2.079906] xilinx-zynqmp-dma ffad0000.dma: ZynqMP DMA driver Probe success
[    2.086911] xilinx-zynqmp-dma ffae0000.dma: ZynqMP DMA driver Probe success
[    2.093913] xilinx-zynqmp-dma ffaf0000.dma: ZynqMP DMA driver Probe success
[    2.102037] xen:xen_evtchn: Event-channel device installed
[    2.153318] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    2.155458] ff010000.serial: ttyPS0 at MMIO 0xff010000 (irq = 215, base_baud = 6250000) is a xuartps
[    2.164196] [drm] Initialized drm 1.1.0 20060810
[    2.168225] Unable to detect cache hierarchy from DT for CPU 0
[    2.178951] brd: module loaded
[    2.182686] loop: module loaded
[    2.182731] Invalid max_queues (4), will use default max: 1.
[    2.186734] ahci-ceva fd0c0000.ahci: AHCI 0001.0301 32 slots 2 ports 6 Gbps 0x3 impl platform mode
[    2.195012] ahci-ceva fd0c0000.ahci: flags: 64bit ncq sntf pm clo only pmp fbs pio slum part ccc sds apst 
[    2.206047] scsi host0: ahci-ceva
[    2.208400] scsi host1: ahci-ceva
[    2.211527] ata1: SATA max UDMA/133 mmio [mem 0xfd0c0000-0xfd0c1fff] port 0x100 irq 212
[    2.219494] ata2: SATA max UDMA/133 mmio [mem 0xfd0c0000-0xfd0c1fff] port 0x180 irq 212
[    2.227654] mtdoops: mtd device (mtddev=name/number) must be supplied
[    2.234416] zynqmp-qspi ff0f0000.spi: rx bus width not found
[    2.240531] m25p80 spi0.0: found n25q512a, expected m25p80
[    2.245489] m25p80 spi0.0: n25q512a (131072 Kbytes)
[    2.250216] 4 ofpart partitions found on MTD device spi0.0
[    2.255731] Creating 4 MTD partitions on "spi0.0":
[    2.260573] 0x000000000000-0x000001e00000 : "boot"
[    2.265964] 0x000001e00000-0x000001e40000 : "bootenv"
[    2.271076] 0x000001e40000-0x000004240000 : "kernel"
[    2.276114] 0x000004240000-0x000008000000 : "spare"
[    2.282518] libphy: Fixed MDIO Bus: probed
[    2.285069] tun: Universal TUN/TAP device driver, 1.6
[    2.289714] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[    2.299512] CAN device driver interface
[    2.300696] gpiod_set_value: invalid GPIO
[    2.304298] libphy: MACB_mii_bus: probed
[    2.311376] macb ff0e0000.ethernet eth0: Cadence GEM rev 0x50070106 at 0xff0e0000 irq 25 (00:0a:35:00:22:01)
[    2.317773] TI DP83867 ff0e0000.etherne:0c: attached PHY driver [TI DP83867] (mii_bus:phy_addr=ff0e0000.etherne:0c, irq=-1)
[    2.329427] xen_netfront: Initialising Xen virtual ethernet driver
[    2.335218] usbcore: registered new interface driver asix
[    2.340620] usbcore: registered new interface driver ax88179_178a
[    2.346749] usbcore: registered new interface driver cdc_ether
[    2.352629] usbcore: registered new interface driver net1080
[    2.358339] usbcore: registered new interface driver cdc_subset
[    2.364306] usbcore: registered new interface driver zaurus
[    2.369953] usbcore: registered new interface driver cdc_ncm
[    2.377229] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    2.382206] ehci-pci: EHCI PCI platform driver
[    2.386955] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[    2.392245] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
[    2.400870] xhci-hcd xhci-hcd.0.auto: hcc params 0x0238f625 hci version 0x100 quirks 0x00010010
[    2.408717] xhci-hcd xhci-hcd.0.auto: irq 220, io mem 0xfe200000
[    2.414936] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[    2.421555] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    2.428815] usb usb1: Product: xHCI Host Controller
[    2.433743] usb usb1: Manufacturer: Linux 4.6.0-xilinx xhci-hcd
[    2.439707] usb usb1: SerialNumber: xhci-hcd.0.auto
[    2.444940] hub 1-0:1.0: USB hub found
[    2.448559] hub 1-0:1.0: 1 port detected
[    2.452549] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[    2.457971] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2
[    2.465711] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    2.473878] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003
[    2.480630] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    2.487889] usb usb2: Product: xHCI Host Controller
[    2.492823] usb usb2: Manufacturer: Linux 4.6.0-xilinx xhci-hcd
[    2.498787] usb usb2: SerialNumber: xhci-hcd.0.auto
[    2.503975] hub 2-0:1.0: USB hub found
[    2.507586] hub 2-0:1.0: 1 port detected
[    2.511689] usbcore: registered new interface driver uas
[    2.516936] usbcore: registered new interface driver usb-storage
[    2.523312] mousedev: PS/2 mouse device common for all mice
[    2.529083] rtc_zynqmp ffa60000.rtc: rtc core: registered ffa60000.rtc as rtc0
[    2.535862] i2c /dev entries driver
[    2.541605] pca953x 0-0020: interrupt support not compiled in
[    2.546955] pca953x 0-0021: interrupt support not compiled in
[    2.551014] cdns-i2c ff020000.i2c: 400 kHz mmio ff020000 irq 201
[    2.557191] ata2: SATA link down (SStatus 0 SControl 330)
[    2.562484] ata1: SATA link down (SStatus 0 SControl 330)
[    2.568159] cdns-i2c ff030000.i2c: 400 kHz mmio ff030000 irq 202
[    2.574911] i2c i2c-0: Added multiplexed i2c bus 2
[    2.579665] i2c i2c-0: Added multiplexed i2c bus 3
[    2.584655] i2c i2c-0: Added multiplexed i2c bus 4
[    2.589068] i2c i2c-0: Added multiplexed i2c bus 5
[    2.593352] pca954x 0-0075: registered 4 multiplexed busses for I2C mux pca9544
[    2.601179] at24 6-0054: 1024 byte 24c08 EEPROM, writable, 1 bytes/write
[    2.607451] i2c i2c-6: Failed to register i2c client 24c08 at 0x55 (-16)
[    2.614197] i2c i2c-6: of_i2c: Failure registering /amba/i2c@ff030000/i2cswitch@74/i2c@0/eeprom@55
[    2.623186] i2c i2c-6: Failed to register i2c client 24c08 at 0x56 (-16)
[    2.629927] i2c i2c-6: of_i2c: Failure registering /amba/i2c@ff030000/i2cswitch@74/i2c@0/eeprom@56
[    2.638934] i2c i2c-6: Failed to register i2c client 24c08 at 0x57 (-16)
[    2.645677] i2c i2c-6: of_i2c: Failure registering /amba/i2c@ff030000/i2cswitch@74/i2c@0/eeprom@57
[    2.654674] i2c i2c-1: Added multiplexed i2c bus 6
[    2.659968] i2c i2c-1: Added multiplexed i2c bus 7
[    2.664636] i2c i2c-1: Added multiplexed i2c bus 8
[    2.669482] i2c i2c-1: Added multiplexed i2c bus 9
[    2.674317] i2c i2c-1: Added multiplexed i2c bus 10
[    2.679198] i2c i2c-1: Added multiplexed i2c bus 11
[    2.684097] i2c i2c-1: Added multiplexed i2c bus 12
[    2.689037] i2c i2c-1: Added multiplexed i2c bus 13
[    2.693771] pca954x 1-0074: registered 8 multiplexed busses for I2C switch pca9548
[    2.701596] i2c i2c-1: Added multiplexed i2c bus 14
[    2.706487] i2c i2c-1: Added multiplexed i2c bus 15
[    2.711445] i2c i2c-1: Added multiplexed i2c bus 16
[    2.716584] i2c i2c-1: Added multiplexed i2c bus 17
[    2.721445] i2c i2c-1: Added multiplexed i2c bus 18
[    2.726229] i2c i2c-1: Added multiplexed i2c bus 19
[    2.731187] i2c i2c-1: Added multiplexed i2c bus 20
[    2.736091] i2c i2c-1: Added multiplexed i2c bus 21
[    2.740828] pca954x 1-0075: registered 8 multiplexed busses for I2C switch pca9548
[    2.750093] ina2xx 2-0040: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.755179] ina2xx 2-0041: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.761582] ina2xx 2-0042: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.768009] ina2xx 2-0043: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.774384] ina2xx 2-0044: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.780774] ina2xx 2-0045: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.787182] ina2xx 2-0046: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.793590] ina2xx 2-0047: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.799996] ina2xx 2-004a: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.806384] ina2xx 2-004b: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.812869] ina2xx 3-0040: power monitor ina226 (Rshunt = 2000 uOhm)
[    2.819187] ina2xx 3-0041: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.825588] ina2xx 3-0042: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.831994] ina2xx 3-0043: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.838399] ina2xx 3-0044: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.844799] ina2xx 3-0045: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.851191] ina2xx 3-0046: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.857596] ina2xx 3-0047: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.892592] EDAC MC: ECC not enabled
[    2.892881] EDAC DEVICE0: Giving out device to module zynqmp-ocm-edac controller zynqmp_ocm: DEV ff960000.memory-controller (INTERRUPT)
[    2.902927] sdhci: Secure Digital Host Controller Interface driver
[    2.909051] sdhci: Copyright(c) Pierre Ossman
[    2.913467] sdhci-pltfm: SDHCI platform and OF driver helper
[    2.919334] sdhci-arasan ff160000.sdhci: "xlnx,mio_bank " property is missing.
[    2.926451] sdhci-arasan: probe of ff160000.sdhci failed with error -22
[    2.971794] mmc0: SDHCI controller on ff170000.sdhci [ff170000.sdhci] using ADMA 64-bit
[    2.974397] ledtrig-cpu: registered to indicate activity on CPUs
[    2.980436] usbcore: registered new interface driver usbhid
[    2.985915] usbhid: USB HID core driver
[    3.006852] si570 8-005d: registered, current frequency 300000000 Hz
[    3.031440] si570 9-005d: registered, current frequency 148500000 Hz
[    3.032480] fpga_manager fpga0: Xilinx ZynqMp FPGA Manager registered
[    3.039447] pktgen: Packet Generator for packet performance testing. Version: 2.75
[    3.046764] Initializing XFRM netlink socket
[    3.050727] NET: Registered protocol family 10
[    3.057680] sit: IPv6 over IPv4 tunneling driver
[    3.060216] NET: Registered protocol family 17
[    3.064351] NET: Registered protocol family 15
[    3.068838] can: controller area network core (rev 20120528 abi 9)
[    3.075093] NET: Registered protocol family 29
[    3.079572] can: raw protocol (rev 20120528)
[    3.083890] can: broadcast manager protocol (rev 20120528 t)
[    3.089600] can: netlink gateway (rev 20130117) max_hops=1
[    3.095299] 9pnet: Installing 9P2000 support
[    3.099481] Key type dns_resolver registered
[    3.106448] registered taskstats version 1
[    3.108418] Btrfs loaded
[    3.114808] rtc_zynqmp ffa60000.rtc: setting system clock to 1986-12-22 00:43:37 UTC (535596217)
[    3.119494] ALSA device list:
[    3.122394]   No soundcards found.
[    3.129402] Freeing unused kernel memory: 460K (ffffff8008b3f000 - ffffff8008bb2000)
INIT: version 2.88 booting
[    4.087687] random: dd urandom read with 7 bits of entropy available
Fri Dec 16 01:23:11 UTC 2016
Starting internet superserver: inetd.
INIT: Entering runlevel: 5
Configuring network interfaces... [    4.186752] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
udhcpc (v1.24.1) started
Sending discover...
[    6.308108] macb ff0e0000.ethernet eth0: link up (1000/Full)
[    6.308242] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Sending discover...
Sending select for 10.0.0.191...
Lease of 10.0.0.191 obtained, lease time 604800
/etc/udhcpc.d/50default: Adding DNS 75.75.75.75
/etc/udhcpc.d/50default: Adding DNS 75.75.76.76
done.
Starting OpenBSD Secure Shell server: sshd
  generating ssh RSA key...
  generating ssh ECDSA key...
  generating ssh DSA key...
  generating ssh ED25519 key...
done.
Starting syslogd/klogd: done
Starting /usr/sbin/xenstored....
Setting domain 0 name, domid and JSON config...
libxl: error: libxl.c:689:libxl_domain_info: getting domain info list: Permission denied
libxl: error: libxl_dom.c:1127:libxl__userdata_path: unable to find domain info for domain 0: Permission denied
cannot store stub json config for domain 0
Starting xenconsoled...
Starting QEMU as disk backend for dom0
/etc/rc5.d/S80xencommons: line 102: /usr/bin/qemu-system-i386: No such file or directory
Starting domain watchdog daemon: xenwatchdogd startup


PetaLinux 2016.4 plnx_aarch64 /dev/hvc0

plnx_aarch64 login: root
root@plnx_aarch64:~# [   33.180039] random: nonblocking pool is initialized
(XEN) 
(XEN) TBM Jan 25 2017 12:45:38 running on xen-guest_irq_latency
(XEN) MIDR=410fd034
(XEN) D$ L1 32KB linesize=64b ways=4 sets=128
(XEN) I$ L1 32KB linesize=64b ways=2 sets=256
(XEN) D$ L2 1024KB linesize=64b ways=16 sets=1024
(XEN) I$ L2 1024KB linesize=64b ways=16 sets=1024
(XEN) plat_init_a64
(XEN) EL1 I and D caches enabled
(XEN) RDO: enable MMU
(XEN) RDO: Still here? EL1 S1 MMU is on
(XEN) CNTFRQ=99998999 Hz k=10
(XEN) latency (ns): max=8470 warm_max=0 min=8470 avg=8470
(XEN) latency (ns): max=8470 warm_max=0 min=6700 avg=8249
(XEN) latency (ns): max=8470 warm_max=6580 min=6580 avg=8041
(XEN) latency (ns): max=8470 warm_max=6600 min=6580 avg=7861
(XEN) latency (ns): max=8470 warm_max=6600 min=6530 avg=7695
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=7525
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=7392
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=7281
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=7178
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=7102
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=7020
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6948
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6886
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6837
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6794
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6769
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6725
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6696
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6663
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6644
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6628
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6616
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6591
(XEN) latency (ns): max=8470 warm_max=6600 min=6330 avg=6565
(XEN) latency (ns): max=8470 warm_max=6660 min=6330 avg=6576
(XEN) latency (ns): max=8470 warm_max=6660 min=6330 avg=6558
(XEN) latency (ns): max=8470 warm_max=6660 min=6180 avg=6511
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6555
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6535
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6522
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6507
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6497
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6494
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6484
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6478
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6464
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6461
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6462
(XEN) latency (ns): max=8470 warm_max=6870 min=6180 avg=6457
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=56019474589
(XEN) Online Cpus: 0-3
(XEN) Cpupool 0:
(XEN) Cpus: 0-3
(XEN) Scheduler: SMP Credit Scheduler (credit)
(XEN) info:
(XEN)   ncpus              = 4
(XEN)   master             = 0
(XEN)   credit             = 1200
(XEN)   credit balance     = 0
(XEN)   weight             = 256
(XEN)   runq_sort          = 260
(XEN)   default-weight     = 256
(XEN)   tslice             = 30ms
(XEN)   ratelimit          = 1000us
(XEN)   credits per msec   = 10
(XEN)   ticks per tslice   = 3
(XEN)   migration delay    = 0us
(XEN) idlers: 00000000,00000000,00000000,0000000e
(XEN) active vcpus:
(XEN)     1: [0.0] pri=0 flags=0 cpu=0 credit=96 [w=256,cap=0]
(XEN) CPUs info:
(XEN) CPU[00]  sort=260, sibling=00000000,00000000,00000000,00000001, core=00000000,00000000,00000000,00000001
(XEN)   run: [0.0] pri=0 flags=0 cpu=0 credit=96 [w=256,cap=0]
(XEN)     1: [32767.0] pri=-64 flags=0 cpu=0
(XEN) CPU[01]  sort=260, sibling=00000000,00000000,00000000,00000002, core=00000000,00000000,00000000,00000002
(XEN)   run: [32767.1] pri=-64 flags=0 cpu=1
(XEN) CPU[02]  sort=260, sibling=00000000,00000000,00000000,00000004, core=00000000,00000000,00000000,00000004
(XEN)   run: [32767.2] pri=-64 flags=0 cpu=2
(XEN) CPU[03]  sort=260, sibling=00000000,00000000,00000000,00000008, core=00000000,00000000,00000000,00000008
(XEN)   run: [32767.3] pri=-64 flags=0 cpu=3
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6629
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6596
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6579
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6570
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6562
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6555
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6534
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6505
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6480
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6473
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6459
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6448
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6447
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6438
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6427
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6432
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6423
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6425
(XEN) latency (ns): max=8470 warm_max=7840 min=6180 avg=6414


[-- Attachment #3: Type: TEXT/PLAIN, Size: 36288 bytes --]

(XEN) Xen version 4.9-unstable (sstabellini@) (aarch64-linux-gnu-gcc (Linaro GCC 2014.05) 4.9.1 20140422 (prerelease)) debug=y  Thu Feb 16 14:55:12 PST 2017
(XEN) Latest ChangeSet: Thu Feb 16 14:39:16 2017 -0800 git:4e0ef4d-dirty
(XEN) Processor: 410fd034: "ARM Limited", variant: 0x0, part 0xd03, rev 0x4
(XEN) 64-bit Execution:
(XEN)   Processor Features: 0000000000002222 0000000000000000
(XEN)     Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32
(XEN)     Extensions: FloatingPoint AdvancedSIMD
(XEN)   Debug Features: 0000000010305106 0000000000000000
(XEN)   Auxiliary Features: 0000000000000000 0000000000000000
(XEN)   Memory Model Features: 0000000000001122 0000000000000000
(XEN)   ISA Features:  0000000000011120 0000000000000000
(XEN) 32-bit Execution:
(XEN)   Processor Features: 00000131:00011011
(XEN)     Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle
(XEN)     Extensions: GenericTimer Security
(XEN)   Debug Features: 03010066
(XEN)   Auxiliary Features: 00000000
(XEN)   Memory Model Features: 10201105 40000000 01260000 02102211
(XEN)  ISA Features: 02101110 13112111 21232042 01112131 00011142 00011121
(XEN) Using PSCI-1.0 for SMP bringup
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 99998 KHz
(XEN) GICv2 initialization:
(XEN)         gic_dist_addr=00000000f9010000
(XEN)         gic_cpu_addr=00000000f9020000
(XEN)         gic_hyp_addr=00000000f9040000
(XEN)         gic_vcpu_addr=00000000f9060000
(XEN)         gic_maintenance_irq=25
(XEN) GICv2: Adjusting CPU interface base to 0xf902f000
(XEN) GICv2: 192 lines, 4 cpus, secure (IID 0200143b).
(XEN) GICv2: WARNING: CPU0: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU0: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU0: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered.
(XEN) Using scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) Initializing Credit2 scheduler
(XEN)  load_precision_shift: 18
(XEN)  load_window_shift: 30
(XEN)  underload_balance_tolerance: 0
(XEN)  overload_balance_tolerance: -3
(XEN)  runqueues arrangement: socket
(XEN) load tracking window length 1073741824 ns
(XEN) Adding cpu 0 to runqueue 0
(XEN)  First cpu on runqueue, activating
(XEN) Allocated console ring of 32 KiB.
(XEN) Bringing up CPU1
(XEN) GICv2: WARNING: CPU1: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU1: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU1: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered.
(XEN) Adding cpu 1 to runqueue 0
(XEN) CPU 1 booted.
(XEN) Bringing up CPU2
(XEN) GICv2: WARNING: CPU2: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU2: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU2: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered.
(XEN) Adding cpu 2 to runqueue 0
(XEN) CPU 2 booted.
(XEN) Bringing up CPU3
(XEN) GICv2: WARNING: CPU3: Failed to configure IRQ26 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU3: Failed to configure IRQ27 as Edge-triggered. H/w forces to Level-triggered.
(XEN) GICv2: WARNING: CPU3: Failed to configure IRQ30 as Edge-triggered. H/w forces to Level-triggered.
(XEN) Adding cpu 3 to runqueue 0
(XEN) CPU 3 booted.
(XEN) Brought up 4 CPUs
(XEN) P2M: 40-bit IPA with 40-bit PA and 8-bit VMID
(XEN) P2M: 3 levels with order-1 root, VTCR 0x80023558
(XEN) smmu: /amba/smmu@fd800000: probing hardware configuration...
(XEN) smmu: /amba/smmu@fd800000: SMMUv2 with:
(XEN) smmu: /amba/smmu@fd800000:        stage 2 translation
(XEN) smmu: /amba/smmu@fd800000:        stream matching with 48 register groups, mask 0x7fff
(XEN) smmu: /amba/smmu@fd800000:        16 context banks (0 stage-2 only)
(XEN) smmu: /amba/smmu@fd800000:        Stage-2: 40-bit IPA -> 48-bit PA
(XEN) smmu: /amba/smmu@fd800000: registered 26 master devices
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) build-id: 6f4fcc598664b1af0ae7b2e2f64b6ed3f140ef2f
(XEN) alternatives: Patching with alt table 00000000400f3c48 -> 00000000400f3ffc
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading kernel from boot module @ 0000000000080000
(XEN) Loading ramdisk from boot module @ 00000000060af000
(XEN) Allocating 1:1 mappings totalling 1024MB for dom0:
(XEN) BANK[0] 0x00000040000000-0x00000080000000 (1024MB)
(XEN) Grant table range: 0x0000087fe00000-0x0000087fe61000
(XEN) smmu: /amba/smmu@fd800000: d0: p2maddr 0x000000087ff62000
(XEN) Loading zImage from 0000000000080000 to 0000000040080000-0000000043180000
(XEN) Loading dom0 initrd from 00000000060af000 to 0x0000000048200000-0x000000004a150400
(XEN) Allocating PPI 16 for event channel interrupt
(XEN) Loading dom0 DTB to 0x0000000048000000-0x0000000048006c4a
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 296kB init memory.
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.6.0-xilinx (xbrbbot@xcosswbld01) (gcc version 5.2.1 20151005 (Linaro GCC 5.2-2015.11-2) ) #1 SMP Thu Dec 15 18:01:07 MST 2016
[    0.000000] Boot CPU: AArch64 Processor [410fd034]
[    0.000000] earlycon: xen0 at I/O port 0x0 (options '')
[    0.000000] bootconsole [xen0] enabled
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: UEFI not found.
[    0.000000] cma: Reserved 128 MiB at 0x0000000078000000
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv0.2 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: Trusted OS migration not required
[    0.000000] Xen 4.9 support found
[    0.000000] percpu: Embedded 20 pages/cpu @ffffffc037f73000 s42240 r8192 d31488 u81920
[    0.000000] Detected VIPT I-cache on CPU0
[    0.000000] CPU features: enabling workaround for ARM erratum 845719
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 258560
[    0.000000] Kernel command line: console=hvc0 earlycon=xen earlyprintk=xen maxcpus=1
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.000000] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.000000] software IO TLB [mem 0x72e00000-0x76e00000] (64MB) mapped at [ffffffc032e00000-ffffffc036dfffff]
[    0.000000] Memory: 790568K/1048576K available (8136K kernel code, 561K rwdata, 2860K rodata, 460K init, 374K bss, 126936K reserved, 131072K cma-reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     modules : 0xffffff8000000000 - 0xffffff8008000000   (   128 MB)
[    0.000000]     vmalloc : 0xffffff8008000000 - 0xffffffbdffff0000   (   247 GB)
[    0.000000]       .text : 0xffffff8008080000 - 0xffffff8008870000   (  8128 KB)
[    0.000000]     .rodata : 0xffffff8008870000 - 0xffffff8008b3f000   (  2876 KB)
[    0.000000]       .init : 0xffffff8008b3f000 - 0xffffff8008bb2000   (   460 KB)
[    0.000000]       .data : 0xffffff8008bb2000 - 0xffffff8008c3e400   (   561 KB)
[    0.000000]     vmemmap : 0xffffffbe00000000 - 0xffffffbfc0000000   (     7 GB maximum)
[    0.000000]               0xffffffbe00000000 - 0xffffffbe00e00000   (    14 MB actual)
[    0.000000]     fixed   : 0xffffffbffe7fd000 - 0xffffffbffec00000   (  4108 KB)
[    0.000000]     PCI I/O : 0xffffffbffee00000 - 0xffffffbfffe00000   (    16 MB)
[    0.000000]     memory  : 0xffffffc000000000 - 0xffffffc040000000   (  1024 MB)
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  Build-time adjustment of leaf fanout to 64.
[    0.000000]  RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=4.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=4
[    0.000000] NR_IRQS:64 nr_irqs:64 0
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER4
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER8
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER12
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER16
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER20
(XEN) d0v0: vGICD: unhandled word write 0xffffffff to ICACTIVER0
[    0.000000] Architected cp15 timer(s) running at 99.99MHz (virt).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171015c6fb, max_idle_ns: 440795203665 ns
[    0.000004] sched_clock: 56 bits at 99MHz, resolution 10ns, wraps every 4398046511101ns
[    0.008283] Console: colour dummy device 80x25
[    0.012378] console [hvc0] enabled
[    0.012378] console [hvc0] enabled
[    0.019210] bootconsole [xen0] disabled
[    0.019210] bootconsole [xen0] disabled
[    0.026913] Calibrating delay loop (skipped), value calculated using timer frequency.. 199.99 BogoMIPS (lpj=399995)
[    0.037368] pid_max: default: 32768 minimum: 301
[    0.042142] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[    0.048789] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[    0.056540] ASID allocator initialised with 65536 entries
[    0.061531] xen:grant_table: Grant tables using version 1 layout
[    0.067507] Grant table initialized
[    0.071038] xen:events: Using FIFO-based ABI
[    0.075349] Xen: initializing cpu0
[    0.079398] zynqmp_plat_init: pm node not found
[    0.083458] EFI services will not be available.
[    0.088063] Brought up 1 CPUs
[    0.090993] SMP: Total of 1 processors activated.
[    0.095764] CPU: All CPU(s) started at EL1
[    0.099919] alternatives: patching kernel code
[    0.104879] devtmpfs: initialized
[    0.111828] DMI not present or invalid.
[    0.111948] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.124339] xor: measuring software checksum speed
[    0.166394]    8regs     :  2177.000 MB/sec
[    0.206457]    8regs_prefetch:  1934.000 MB/sec
[    0.246520]    32regs    :  2412.000 MB/sec
[    0.286585]    32regs_prefetch:  2178.000 MB/sec
[    0.286626] xor: using function: 32regs (2412.000 MB/sec)
[    0.292168] NET: Registered protocol family 16
[    0.296515] vdso: 2 pages (1 code @ ffffff8008876000, 1 data @ ffffff8008bb8000)
[    0.303066] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[    0.310705] DMA: preallocated 256 KiB pool for atomic allocations
[    0.316082] xen:swiotlb_xen: Warning: only able to allocate 4 MB for software IO TLB
[    0.324493] software IO TLB [mem 0x72000000-0x72400000] (4MB) mapped at [ffffffc032000000-ffffffc0323fffff]
[    0.334331] ARM CCI_400_r1 PMU driver probed
[    0.337730] /amba/dma@fd500000: could not find phandle
[    0.343186] /amba/dma@fd510000: could not find phandle
[    0.348369] /amba/dma@fd520000: could not find phandle
[    0.353556] /amba/dma@fd530000: could not find phandle
[    0.358747] /amba/dma@fd540000: could not find phandle
[    0.363941] /amba/dma@fd550000: could not find phandle
[    0.369132] /amba/dma@fd560000: could not find phandle
[    0.374317] /amba/dma@fd570000: could not find phandle
[    0.379625] /amba/dma@ffa80000: could not find phandle
[    0.384697] /amba/dma@ffa90000: could not find phandle
[    0.389889] /amba/dma@ffaa0000: could not find phandle
[    0.395077] /amba/dma@ffab0000: could not find phandle
[    0.400272] /amba/dma@ffac0000: could not find phandle
[    0.405459] /amba/dma@ffad0000: could not find phandle
[    0.410651] /amba/dma@ffae0000: could not find phandle
[    0.415840] /amba/dma@ffaf0000: could not find phandle
[    0.421124] /amba/ethernet@ff0e0000: could not find phandle
[    0.428927] /amba/spi@ff0f0000: could not find phandle
[    0.432103] /amba/sdhci@ff160000: could not find phandle
[    0.437227] /amba/sdhci@ff170000: could not find phandle
[    0.442640] /amba/usb@fe200000: could not find phandle
[    0.467039] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[    0.536076] raid6: int64x1  gen()   409 MB/s
[    0.604165] raid6: int64x1  xor()   432 MB/s
[    0.672300] raid6: int64x2  gen()   617 MB/s
[    0.740425] raid6: int64x2  xor()   555 MB/s
[    0.808549] raid6: int64x4  gen()   907 MB/s
[    0.876699] raid6: int64x4  xor()   685 MB/s
[    0.944824] raid6: int64x8  gen()  1056 MB/s
[    1.012954] raid6: int64x8  xor()   683 MB/s
[    1.081060] raid6: neonx1   gen()   672 MB/s
[    1.149185] raid6: neonx1   xor()   702 MB/s
[    1.217308] raid6: neonx2   gen()  1044 MB/s
[    1.285429] raid6: neonx2   xor()   925 MB/s
[    1.353553] raid6: neonx4   gen()  1385 MB/s
[    1.421680] raid6: neonx4   xor()  1079 MB/s
[    1.489788] raid6: neonx8   gen()  1556 MB/s
[    1.557921] raid6: neonx8   xor()  1131 MB/s
[    1.557959] raid6: using algorithm neonx8 gen() 1556 MB/s
[    1.562095] raid6: .... xor() 1131 MB/s, rmw enabled
[    1.567115] raid6: using intx1 recovery algorithm
[    1.572102] ACPI: Interpreter disabled.
[    1.575793] xen:balloon: Initialising balloon driver
[    1.581095] xen_balloon: Initialising balloon driver
[    1.586511] SCSI subsystem initialized
[    1.589786] usbcore: registered new interface driver usbfs
[    1.595180] usbcore: registered new interface driver hub
[    1.600536] usbcore: registered new device driver usb
[    1.605689] media: Linux media interface: v0.10
[    1.610221] Linux video capture interface: v2.00
[    1.614892] pps_core: LinuxPPS API ver. 1 registered
[    1.619881] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    1.629066] PTP clock support registered
[    1.633072] EDAC MC: Ver: 3.0.0
[    1.637484] dmi: Firmware registration failed.
[    1.640780] FPGA manager framework
[    1.644238] Advanced Linux Sound Architecture Driver Initialized.
[    1.651754] clocksource: Switched to clocksource arch_sys_counter
[    1.656580] VFS: Disk quotas dquot_6.6.0
[    1.660496] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    1.667589] pnp: PnP ACPI: disabled
[    1.677675] NET: Registered protocol family 2
[    1.678071] TCP established hash table entries: 8192 (order: 4, 65536 bytes)
[    1.683679] TCP bind hash table entries: 8192 (order: 5, 131072 bytes)
[    1.690302] TCP: Hash tables configured (established 8192 bind 8192)
[    1.696611] UDP hash table entries: 512 (order: 2, 16384 bytes)
[    1.702559] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
[    1.709133] NET: Registered protocol family 1
[    1.714098] RPC: Registered named UNIX socket transport module.
[    1.719317] RPC: Registered udp transport module.
[    1.724068] RPC: Registered tcp transport module.
[    1.728824] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    1.735457] Trying to unpack rootfs image as initramfs...
[    1.829214] Freeing initrd memory: 32064K (ffffffc008200000 - ffffffc00a150000)
[    1.831842] futex hash table entries: 1024 (order: 5, 131072 bytes)
[    1.837389] audit: initializing netlink subsys (disabled)
[    1.842787] audit: type=2000 audit(1.716:1): initialized
[    1.849222] workingset: timestamp_bits=60 max_order=18 bucket_order=0
[    1.855367] NFS: Registering the id_resolver key type
[    1.859755] Key type id_resolver registered
[    1.863948] Key type id_legacy registered
[    1.868030] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[    1.874796] jffs2: version 2.2. (NAND) (SUMMARY)  © 2001-2006 Red Hat, Inc.
[    1.890491] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 247)
[    1.892369] io scheduler noop registered
[    1.896340] io scheduler deadline registered
[    1.900675] io scheduler cfq registered (default)
[    1.905914] nwl-pcie fd0e0000.pcie: Link is DOWN
[    1.910115] PCI host bridge /amba/pcie@fd0e0000 ranges:
[    1.915351]   No bus range found for /amba/pcie@fd0e0000, using [bus 00-ff]
[    1.922366]   MEM 0xe0000000..0xefffffff -> 0xe0000000
[    1.927553]   MEM 0x600000000..0x7ffffffff -> 0x600000000
[    1.933080] nwl-pcie fd0e0000.pcie: PCI host bridge to bus 0000:00
[    1.939227] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.944760] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xefffffff]
[    1.951683] pci_bus 0000:00: root bus resource [mem 0x600000000-0x7ffffffff pref]
(XEN) physdev.c:17:d0v0 PHYSDEVOP cmd=25: not implemented
(XEN) physdev.c:17:d0v0 PHYSDEVOP cmd=15: not implemented
[    1.969420] pci 0000:00:00.0: Failed to add - passthrough or MSI/MSI-X might fail!
[    1.977127] pci 0000:00:00.0: PCI bridge to [bus 01-0c]
[    1.983420] xilinx-dpdma fd4c0000.dma: Xilinx DPDMA engine is probed
[    1.988985] xilinx-zynqmp-dma fd500000.dma: ZynqMP DMA driver Probe success
[    1.995839] xilinx-zynqmp-dma fd510000.dma: ZynqMP DMA driver Probe success
[    2.002835] xilinx-zynqmp-dma fd520000.dma: ZynqMP DMA driver Probe success
[    2.009841] xilinx-zynqmp-dma fd530000.dma: ZynqMP DMA driver Probe success
[    2.016840] xilinx-zynqmp-dma fd540000.dma: ZynqMP DMA driver Probe success
[    2.023846] xilinx-zynqmp-dma fd550000.dma: ZynqMP DMA driver Probe success
[    2.030856] xilinx-zynqmp-dma fd560000.dma: ZynqMP DMA driver Probe success
[    2.037865] xilinx-zynqmp-dma fd570000.dma: ZynqMP DMA driver Probe success
[    2.044871] xilinx-zynqmp-dma ffa80000.dma: ZynqMP DMA driver Probe success
[    2.051878] xilinx-zynqmp-dma ffa90000.dma: ZynqMP DMA driver Probe success
[    2.058878] xilinx-zynqmp-dma ffaa0000.dma: ZynqMP DMA driver Probe success
[    2.065893] xilinx-zynqmp-dma ffab0000.dma: ZynqMP DMA driver Probe success
[    2.072894] xilinx-zynqmp-dma ffac0000.dma: ZynqMP DMA driver Probe success
[    2.079901] xilinx-zynqmp-dma ffad0000.dma: ZynqMP DMA driver Probe success
[    2.086912] xilinx-zynqmp-dma ffae0000.dma: ZynqMP DMA driver Probe success
[    2.093913] xilinx-zynqmp-dma ffaf0000.dma: ZynqMP DMA driver Probe success
[    2.102033] xen:xen_evtchn: Event-channel device installed
[    2.153189] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    2.155328] ff010000.serial: ttyPS0 at MMIO 0xff010000 (irq = 215, base_baud = 6250000) is a xuartps
[    2.164053] [drm] Initialized drm 1.1.0 20060810
[    2.168098] Unable to detect cache hierarchy from DT for CPU 0
[    2.178894] brd: module loaded
[    2.182529] loop: module loaded
[    2.182577] Invalid max_queues (4), will use default max: 1.
[    2.186597] ahci-ceva fd0c0000.ahci: AHCI 0001.0301 32 slots 2 ports 6 Gbps 0x3 impl platform mode
[    2.194860] ahci-ceva fd0c0000.ahci: flags: 64bit ncq sntf pm clo only pmp fbs pio slum part ccc sds apst 
[    2.205933] scsi host0: ahci-ceva
[    2.208250] scsi host1: ahci-ceva
[    2.211373] ata1: SATA max UDMA/133 mmio [mem 0xfd0c0000-0xfd0c1fff] port 0x100 irq 212
[    2.219333] ata2: SATA max UDMA/133 mmio [mem 0xfd0c0000-0xfd0c1fff] port 0x180 irq 212
[    2.227508] mtdoops: mtd device (mtddev=name/number) must be supplied
[    2.234264] zynqmp-qspi ff0f0000.spi: rx bus width not found
[    2.240356] m25p80 spi0.0: found n25q512a, expected m25p80
[    2.245312] m25p80 spi0.0: n25q512a (131072 Kbytes)
[    2.250055] 4 ofpart partitions found on MTD device spi0.0
[    2.255569] Creating 4 MTD partitions on "spi0.0":
[    2.260416] 0x000000000000-0x000001e00000 : "boot"
[    2.265778] 0x000001e00000-0x000001e40000 : "bootenv"
[    2.270946] 0x000001e40000-0x000004240000 : "kernel"
[    2.275954] 0x000004240000-0x000008000000 : "spare"
[    2.282358] libphy: Fixed MDIO Bus: probed
[    2.284917] tun: Universal TUN/TAP device driver, 1.6
[    2.289566] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[    2.299396] CAN device driver interface
[    2.300542] gpiod_set_value: invalid GPIO
[    2.304151] libphy: MACB_mii_bus: probed
[    2.311247] macb ff0e0000.ethernet eth0: Cadence GEM rev 0x50070106 at 0xff0e0000 irq 25 (00:0a:35:00:22:01)
[    2.317621] TI DP83867 ff0e0000.etherne:0c: attached PHY driver [TI DP83867] (mii_bus:phy_addr=ff0e0000.etherne:0c, irq=-1)
[    2.329274] xen_netfront: Initialising Xen virtual ethernet driver
[    2.335065] usbcore: registered new interface driver asix
[    2.340460] usbcore: registered new interface driver ax88179_178a
[    2.346595] usbcore: registered new interface driver cdc_ether
[    2.352474] usbcore: registered new interface driver net1080
[    2.358185] usbcore: registered new interface driver cdc_subset
[    2.364152] usbcore: registered new interface driver zaurus
[    2.369795] usbcore: registered new interface driver cdc_ncm
[    2.377082] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    2.382051] ehci-pci: EHCI PCI platform driver
[    2.386802] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[    2.392085] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
[    2.400713] xhci-hcd xhci-hcd.0.auto: hcc params 0x0238f625 hci version 0x100 quirks 0x00010010
[    2.408562] xhci-hcd xhci-hcd.0.auto: irq 220, io mem 0xfe200000
[    2.414782] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[    2.421402] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    2.428660] usb usb1: Product: xHCI Host Controller
[    2.433588] usb usb1: Manufacturer: Linux 4.6.0-xilinx xhci-hcd
[    2.439552] usb usb1: SerialNumber: xhci-hcd.0.auto
[    2.444783] hub 1-0:1.0: USB hub found
[    2.448405] hub 1-0:1.0: 1 port detected
[    2.452393] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
[    2.457823] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2
[    2.465551] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    2.473724] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003
[    2.480476] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    2.487734] usb usb2: Product: xHCI Host Controller
[    2.492663] usb usb2: Manufacturer: Linux 4.6.0-xilinx xhci-hcd
[    2.498633] usb usb2: SerialNumber: xhci-hcd.0.auto
[    2.503830] hub 2-0:1.0: USB hub found
[    2.507440] hub 2-0:1.0: 1 port detected
[    2.511533] usbcore: registered new interface driver uas
[    2.516782] usbcore: registered new interface driver usb-storage
[    2.523153] mousedev: PS/2 mouse device common for all mice
[    2.528935] rtc_zynqmp ffa60000.rtc: rtc core: registered ffa60000.rtc as rtc0
[    2.535707] i2c /dev entries driver
[    2.541444] pca953x 0-0020: interrupt support not compiled in
[    2.546795] pca953x 0-0021: interrupt support not compiled in
[    2.550859] cdns-i2c ff020000.i2c: 400 kHz mmio ff020000 irq 201
[    2.557035] ata2: SATA link down (SStatus 0 SControl 330)
[    2.562335] ata1: SATA link down (SStatus 0 SControl 330)
[    2.568009] cdns-i2c ff030000.i2c: 400 kHz mmio ff030000 irq 202
[    2.574769] i2c i2c-0: Added multiplexed i2c bus 2
[    2.579511] i2c i2c-0: Added multiplexed i2c bus 3
[    2.584506] i2c i2c-0: Added multiplexed i2c bus 4
[    2.588941] i2c i2c-0: Added multiplexed i2c bus 5
[    2.593191] pca954x 0-0075: registered 4 multiplexed busses for I2C mux pca9544
[    2.600992] at24 6-0054: 1024 byte 24c08 EEPROM, writable, 1 bytes/write
[    2.607297] i2c i2c-6: Failed to register i2c client 24c08 at 0x55 (-16)
[    2.614037] i2c i2c-6: of_i2c: Failure registering /amba/i2c@ff030000/i2cswitch@74/i2c@0/eeprom@55
[    2.623032] i2c i2c-6: Failed to register i2c client 24c08 at 0x56 (-16)
[    2.629773] i2c i2c-6: of_i2c: Failure registering /amba/i2c@ff030000/i2cswitch@74/i2c@0/eeprom@56
[    2.638775] i2c i2c-6: Failed to register i2c client 24c08 at 0x57 (-16)
[    2.645517] i2c i2c-6: of_i2c: Failure registering /amba/i2c@ff030000/i2cswitch@74/i2c@0/eeprom@57
[    2.654520] i2c i2c-1: Added multiplexed i2c bus 6
[    2.659789] i2c i2c-1: Added multiplexed i2c bus 7
[    2.664473] i2c i2c-1: Added multiplexed i2c bus 8
[    2.669330] i2c i2c-1: Added multiplexed i2c bus 9
[    2.674164] i2c i2c-1: Added multiplexed i2c bus 10
[    2.679045] i2c i2c-1: Added multiplexed i2c bus 11
[    2.683945] i2c i2c-1: Added multiplexed i2c bus 12
[    2.688875] i2c i2c-1: Added multiplexed i2c bus 13
[    2.693617] pca954x 1-0074: registered 8 multiplexed busses for I2C switch pca9548
[    2.701443] i2c i2c-1: Added multiplexed i2c bus 14
[    2.706331] i2c i2c-1: Added multiplexed i2c bus 15
[    2.711290] i2c i2c-1: Added multiplexed i2c bus 16
[    2.716428] i2c i2c-1: Added multiplexed i2c bus 17
[    2.721284] i2c i2c-1: Added multiplexed i2c bus 18
[    2.726075] i2c i2c-1: Added multiplexed i2c bus 19
[    2.731025] i2c i2c-1: Added multiplexed i2c bus 20
[    2.735935] i2c i2c-1: Added multiplexed i2c bus 21
[    2.740674] pca954x 1-0075: registered 8 multiplexed busses for I2C switch pca9548
[    2.749935] ina2xx 2-0040: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.755025] ina2xx 2-0041: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.761436] ina2xx 2-0042: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.767820] ina2xx 2-0043: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.774230] ina2xx 2-0044: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.780623] ina2xx 2-0045: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.787028] ina2xx 2-0046: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.793432] ina2xx 2-0047: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.799835] ina2xx 2-004a: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.806230] ina2xx 2-004b: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.812724] ina2xx 3-0040: power monitor ina226 (Rshunt = 2000 uOhm)
[    2.819033] ina2xx 3-0041: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.825438] ina2xx 3-0042: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.831835] ina2xx 3-0043: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.838245] ina2xx 3-0044: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.844638] ina2xx 3-0045: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.851038] ina2xx 3-0046: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.857443] ina2xx 3-0047: power monitor ina226 (Rshunt = 5000 uOhm)
[    2.893958] EDAC MC: ECC not enabled
[    2.894244] EDAC DEVICE0: Giving out device to module zynqmp-ocm-edac controller zynqmp_ocm: DEV ff960000.memory-controller (INTERRUPT)
[    2.904297] sdhci: Secure Digital Host Controller Interface driver
[    2.910419] sdhci: Copyright(c) Pierre Ossman
[    2.914835] sdhci-pltfm: SDHCI platform and OF driver helper
[    2.920704] sdhci-arasan ff160000.sdhci: "xlnx,mio_bank " property is missing.
[    2.927819] sdhci-arasan: probe of ff160000.sdhci failed with error -22
[    2.971743] mmc0: SDHCI controller on ff170000.sdhci [ff170000.sdhci] using ADMA 64-bit
[    2.974342] ledtrig-cpu: registered to indicate activity on CPUs
[    2.980386] usbcore: registered new interface driver usbhid
[    2.985864] usbhid: USB HID core driver
[    3.006812] si570 8-005d: registered, current frequency 300000000 Hz
[    3.031405] si570 9-005d: registered, current frequency 148500000 Hz
[    3.032429] fpga_manager fpga0: Xilinx ZynqMp FPGA Manager registered
[    3.039397] pktgen: Packet Generator for packet performance testing. Version: 2.75
[    3.046731] Initializing XFRM netlink socket
[    3.050686] NET: Registered protocol family 10
[    3.057639] sit: IPv6 over IPv4 tunneling driver
[    3.060176] NET: Registered protocol family 17
[    3.064309] NET: Registered protocol family 15
[    3.068796] can: controller area network core (rev 20120528 abi 9)
[    3.075050] NET: Registered protocol family 29
[    3.079530] can: raw protocol (rev 20120528)
[    3.083847] can: broadcast manager protocol (rev 20120528 t)
[    3.089559] can: netlink gateway (rev 20130117) max_hops=1
[    3.095259] 9pnet: Installing 9P2000 support
[    3.099434] Key type dns_resolver registered
[    3.106404] registered taskstats version 1
[    3.108383] Btrfs loaded
[    3.114762] rtc_zynqmp ffa60000.rtc: setting system clock to 1986-12-22 00:56:59 UTC (535597019)
[    3.119456] ALSA device list:
[    3.122346]   No soundcards found.
[    3.129359] Freeing unused kernel memory: 460K (ffffff8008b3f000 - ffffff8008bb2000)
INIT: version 2.88 booting
[    4.093690] random: dd urandom read with 7 bits of entropy available
Fri Dec 16 01:23:11 UTC 2016
Starting internet superserver: inetd.
INIT: Entering runlevel: 5
Configuring network interfaces... [    4.193262] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
udhcpc (v1.24.1) started
Sending discover...
[    6.308058] macb ff0e0000.ethernet eth0: link up (1000/Full)
[    6.308191] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Sending discover...
Sending select for 10.0.0.191...
Lease of 10.0.0.191 obtained, lease time 604800
/etc/udhcpc.d/50default: Adding DNS 75.75.75.75
/etc/udhcpc.d/50default: Adding DNS 75.75.76.76
done.
Starting OpenBSD Secure Shell server: sshd
  generating ssh RSA key...
  generating ssh ECDSA key...
  generating ssh DSA key...
  generating ssh ED25519 key...
done.
Starting syslogd/klogd: done
Starting /usr/sbin/xenstored....
Setting domain 0 name, domid and JSON config...
libxl: error: libxl.c:689:libxl_domain_info: getting domain info list: Permission denied
libxl: error: libxl_dom.c:1127:libxl__userdata_path: unable to find domain info for domain 0: Permission denied
cannot store stub json config for domain 0
Starting xenconsoled...
Starting QEMU as disk backend for dom0
/etc/rc5.d/S80xencommons: line 102: /usr/bin/qemu-system-i386: No such file or directory
Starting domain watchdog daemon: xenwatchdogd startup


PetaLinux 2016.4 plnx_aarch64 /dev/hvc0

plnx_aarch64 login: [   50.558969] random: nonblocking pool is initialized
(XEN) 
(XEN) TBM Jan 25 2017 12:45:38 running on xen-guest_irq_latency
(XEN) MIDR=410fd034
(XEN) D$ L1 32KB linesize=64b ways=4 sets=128
(XEN) I$ L1 32KB linesize=64b ways=2 sets=256
(XEN) D$ L2 1024KB linesize=64b ways=16 sets=1024
(XEN) I$ L2 1024KB linesize=64b ways=16 sets=1024
(XEN) plat_init_a64
(XEN) EL1 I and D caches enabled
(XEN) RDO: enable MMU
(XEN) RDO: Still here? EL1 S1 MMU is on
(XEN) CNTFRQ=99998999 Hz k=10
(XEN) latency (ns): max=10180 warm_max=0 min=10180 avg=10180
(XEN) latency (ns): max=10180 warm_max=0 min=8470 avg=9967
(XEN) latency (ns): max=10180 warm_max=8150 min=8150 avg=9740
(XEN) latency (ns): max=10180 warm_max=8150 min=8050 avg=9529
(XEN) latency (ns): max=10180 warm_max=8150 min=8050 avg=9357
(XEN) latency (ns): max=10180 warm_max=8150 min=8030 avg=9192
(XEN) latency (ns): max=10180 warm_max=8150 min=8030 avg=9058
(XEN) latency (ns): max=10180 warm_max=8150 min=8020 avg=8929
(XEN) latency (ns): max=10180 warm_max=8150 min=8020 avg=8832
(XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8708
(XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8617
(XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8552
(XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8491
(XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8440
(XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8395
(XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8346
(XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8304
(XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8270
(XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8252
(XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8215
(XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8196
(XEN) latency (ns): max=10180 warm_max=8150 min=7840 avg=8176
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8180
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8173
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8159
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8157
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8144
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8119
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8127
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8113
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8098
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8077
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8068
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8056
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8051
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8054
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8068
(XEN) latency (ns): max=10180 warm_max=8210 min=7840 avg=8061
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=83505753225
(XEN) Online Cpus: 0-3
(XEN) Cpupool 0:
(XEN) Cpus: 0-3
(XEN) Scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) Active queues: 1
(XEN)   default-weight     = 256
(XEN) Runqueue 0:
(XEN)   ncpus              = 4
(XEN)   cpus               = 0-3
(XEN)   max_weight         = 256
(XEN)   instload           = 1
(XEN)   aveload            = 3208 (~1%)
(XEN) l(XEN)    idlers: 00000000,00000000,00000000,0000000a
a(XEN)  tickled: 00000000,00000000,00000000,00000000
t(XEN)  fully idle cores: 00000000,00000000,00000000,0000000a
e(XEN) Domain info:
n(XEN)  Domain: 0 w 256 v 4
c(XEN)    1: [0.0] flags=2 cpu=0 credit=10500000 [w=256] load=3170 (~1%)
(XEN)     2: y[0.1] flags=0 cpu=1  credit=10500000 [w=256]( load=131072 (~50%)
(XEN)     3: n[0.2] flags=0 cpu=2s credit=10500000 [w=256]) load=131072 (~50%):
(XEN)     4:  [0.3] flags=0 cpu=3m credit=10500000 [w=256]a load=131072 (~50%)x
=(XEN)  Domain: 1 w 256 v 1
1(XEN)    5: 1[1.0] flags=2 cpu=2 credit=9713074 [w=256] load=56 (~0%)
(XEN) Runqueue info:
6(XEN) runqueue 0:
9(XEN) CPUs info:
0(XEN) CPU[00]   runq=0, sibling=00000000,00000000,00000000,00000001, wcore=00000000,00000000,00000000,00000001
a(XEN)  run: [0.0] flags=2 cpu=0 credit=10500000 [w=256] load=3170 (~1%)
(XEN) CPU[01] r runq=0, sibling=00000000,00000000,00000000,00000002, mcore=00000000,00000000,00000000,00000002
(XEN) CPU[02] _ runq=0, sibling=00000000,00000000,00000000,00000004, mcore=00000000,00000000,00000000,00000004
(XEN)   run: [1.0] flags=2 cpu=2 credit=2915518 [w=256] load=56 (~0%)
(XEN) CPU[03] a runq=0, sibling=00000000,00000000,00000000,00000008, xcore=00000000,00000000,00000000,00000008
=11690 min=7840 avg=8514
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8494
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8444
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8401
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8346
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8664
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8565
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8497
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8432
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8388
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8349
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8305
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8280
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8237
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8203
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8197
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8187
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8154
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8150
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8144
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8136
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8111
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8109
(XEN) latency (ns): max=11690 warm_max=11690 min=7840 avg=8116


[-- Attachment #4: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-16 23:07         ` Stefano Stabellini
@ 2017-02-17 11:02           ` Dario Faggioli
  2017-02-17 19:34             ` Julien Grall
  0 siblings, 1 reply; 19+ messages in thread
From: Dario Faggioli @ 2017-02-17 11:02 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: george.dunlap, edgar.iglesias, julien.grall, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2347 bytes --]

Just very quickly...

On Thu, 2017-02-16 at 15:07 -0800, Stefano Stabellini wrote:
> (XEN) Active queues: 1
> (XEN)   default-weight     = 256
> (XEN) Runqueue 0:
> (XEN)   ncpus              = 4
> (XEN)   cpus               = 0-3
> (XEN)   max_weight         = 256
> (XEN)   instload           = 1
> (XEN)   aveload            = 3208 (~1%)
> (XEN) l(XEN)    idlers: 00000000,00000000,00000000,0000000a
> a(XEN)  tickled: 00000000,00000000,00000000,00000000
> t(XEN)  fully idle cores: 00000000,00000000,00000000,0000000a
> e(XEN) Domain info:
> n(XEN)  Domain: 0 w 256 v 4
> c(XEN)    1: [0.0] flags=2 cpu=0 credit=10500000 [w=256] load=3170 (~1%)
> (XEN)     2: y[0.1] flags=0 cpu=1  credit=10500000 [w=256]( load=131072 (~50%)
> (XEN)     3: n[0.2] flags=0 cpu=2s credit=10500000 [w=256]) load=131072 (~50%):
> (XEN)     4:  [0.3] flags=0 cpu=3m credit=10500000 [w=256]a load=131072 (~50%)x
>
Status of vcpus 2, 3 and 4 is a bit weird. I'll think about it.

> =(XEN)  Domain: 1 w 256 v 1
> 1(XEN)    5: 1[1.0] flags=2 cpu=2 credit=9713074 [w=256] load=56 (~0%)
> (XEN) Runqueue info:
> 6(XEN) runqueue 0:
> 9(XEN) CPUs info:
> 0(XEN) CPU[00]   runq=0, sibling=00000000,00000000,00000000,00000001, wcore=00000000,00000000,00000000,00000001
>
This tells me that nr_cpu_ids is very big (the masks are printed 128
bits wide, i.e., the ARM NR_CPUS default), which means cpumask_t-s are
huge.

What does `xl info' say? On my (x86) test box, it's like this:

 ...
 nr_cpus                : 16
 max_cpu_id             : 63
 ...

(and I have NR_CPUS=256, i.e., the x86 default).

Bigger cpumasks also mean slower cpumask operations, and this matters
quite a bit in Credit2, because we use cpumasks a lot (it matters in
Credit1 too, where we use them a little less, but still quite a bit).

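Just to make the point concrete, here is a toy, standalone sketch of
why every mask operation gets more expensive as NR_CPUS grows. This is
not Xen's actual cpumask implementation, and all the names are made up:

    /* Toy model of a fixed-size cpumask: every and/or/copy/scan walks
     * all the words of the bitmap, no matter how many CPUs exist. */
    #define NR_CPUS        128    /* the ARM default mentioned above */
    #define BITS_PER_LONG  64
    #define NR_WORDS       ((NR_CPUS + BITS_PER_LONG - 1) / BITS_PER_LONG)

    typedef struct { unsigned long bits[NR_WORDS]; } toy_cpumask_t;

    static void toy_cpumask_and(toy_cpumask_t *dst,
                                const toy_cpumask_t *a,
                                const toy_cpumask_t *b)
    {
        /* Cost is NR_CPUS / BITS_PER_LONG words, regardless of how
         * many CPUs are actually present or online. */
        for ( unsigned int w = 0; w < NR_WORDS; w++ )
            dst->bits[w] = a->bits[w] & b->bits[w];
    }

With NR_CPUS=128 that is twice the work of NR_CPUS=64 on every single
mask operation, and the wakeup path does a lot of them.
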
Isn't there a way, on ARM, to figure out at runtime that you're not
going to have 128 cpus in the platform?

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-10  0:54 Xen on ARM IRQ latency and scheduler overhead Stefano Stabellini
  2017-02-10  8:40 ` Dario Faggioli
@ 2017-02-17 18:40 ` Dario Faggioli
  2017-02-17 19:44   ` Julien Grall
  2017-02-18  0:41   ` Stefano Stabellini
  1 sibling, 2 replies; 19+ messages in thread
From: Dario Faggioli @ 2017-02-17 18:40 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: george.dunlap, edgar.iglesias, julien.grall


[-- Attachment #1.1: Type: text/plain, Size: 4169 bytes --]

On Thu, 2017-02-09 at 16:54 -0800, Stefano Stabellini wrote:
> These are the results, in nanosec:
> 
>                         AVG     MIN     MAX     WARM MAX
> 
> NODEBUG no WFI          1890    1800    3170    2070
> NODEBUG WFI             4850    4810    7030    4980
> NODEBUG no WFI credit2  2217    2090    3420    2650
> NODEBUG WFI credit2     8080    7890    10320   8300
> 
> DEBUG no WFI            2252    2080    3320    2650
> DEBUG WFI               6500    6140    8520    8130
> DEBUG WFI, credit2      8050    7870    10680   8450
> 
> As you can see, depending on whether the guest issues a WFI or not
> while
> waiting for interrupts, the results change significantly.
> Interestingly,
> credit2 does worse than credit1 in this area.
> 
I did some measuring myself, on x86, with different tools. So,
cyclictest is basically something very similar to Stefano's app.

I've run it both within Dom0 and inside a guest. I also ran a Xen
build (in this case, only inside the guest).

> We are down to 2000-3000ns. Then, I started investigating the
> scheduler.
> I measured how long it takes to run "vcpu_unblock": 1050ns, which is
> significant. I don't know what is causing the remaining 1000-2000ns,
> but
> I bet on another scheduler function. Do you have any suggestions on
> which one?
> 
So, vcpu_unblock() calls vcpu_wake(), which then invokes the
scheduler's wakeup related functions.

If you time vcpu_unblock(), from beginning to end of the function, you
actually capture quite a few things. E.g., the scheduler lock is taken
inside vcpu_wake(), so you're basically including time spent waiting on
the lock in the estimation.

That is probably ok (as in, lock contention definitely is something
relevant to latency), but it is expected for things to be rather
different between Credit1 and Credit2.
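
To be clear, this is roughly the kind of instrumentation I mean (an
illustrative sketch, not the exact patch; on x86 I read the TSC instead
of NOW()): time only the per-scheduler hook, inside vcpu_wake(), after
the lock has already been taken, so the time spent waiting on the lock
is excluded:

    /* Inside vcpu_wake(), with the scheduler lock already held.
     * Illustrative sketch only. */
    s_time_t t0 = NOW();

    SCHED_OP(VCPU2OP(v), wake, v);

    /* NOW() is in nanoseconds; __LINE__ is just a tag in the output. */
    printk("DEBUG vcpu_wake %d ns=%ld\n", __LINE__, (long)(NOW() - t0));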

I've, OTOH, tried to time SCHED_OP(wake) and SCHED_OP(do_schedule),
and here's the result. Numbers are in cycles (I've used RDTSC) and, for
making sure to obtain consistent and comparable numbers, I've set the
frequency scaling governor to performance.

Dom0, [performance]							
	        cyclictest 1us	cyclictest 1ms	cyclictest 100ms			
(cycles)	Credit1	Credit2	Credit1	Credit2	Credit1	Credit2		
wakeup-avg	2429	2035	1980	1633	2535	1979		
wakeup-max	14577	113682	15153	203136	12285	115164		
sched-avg	1716	1860	2527	1651	2286	1670		
sched-max	16059	15000	12297	101760	15831	13122		
								
VM, [performance]							
	        cyclictest 1us	cyclictest 1ms	cyclictest 100ms make -j xen	
(cycles)	Credit1	Credit2	Credit1	Credit2	Credit1	Credit2	 Credit1 Credit2
wakeup-avg	2213	2128	1944	2342	2374	2213	 2429	 1618
wakeup-max	9990	10104	11262	9927	10290	10218	 14430	 15108
sched-avg	2437	2472	1620	1594	2498	1759	 2449	 1809
sched-max	14100	14634	10071	9984	10878	8748	 16476	 14220

Actually, TSC on this box should be stable and invariant, so I guess I
can try with the default governor. Will do that on Monday. Does ARM
have frequency scaling (I remember seeing something on xen-devel, but I am
not sure whether it landed upstream)?

But anyway. You're seeing big differences between Credit1 and Credit2,
while I, at least as far as the actual schedulers' code is concerned,
don't.

Credit2 shows higher wakeup-max values, but only in cases where the
workload runs in dom0. But it also shows better (lower) averages, for
both of the workloads considered, and in both the dom0 and VM cases.

I therefore wonder what is actually responsible for the huge
differences between the two schedulers that you are seeing... it could be
lock contention, but with only 4 pCPUs and 2 active vCPUs, I honestly
doubt it...

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-17 11:02           ` Dario Faggioli
@ 2017-02-17 19:34             ` Julien Grall
  2017-02-17 23:14               ` Stefano Stabellini
  0 siblings, 1 reply; 19+ messages in thread
From: Julien Grall @ 2017-02-17 19:34 UTC (permalink / raw)
  To: Dario Faggioli, Stefano Stabellini
  Cc: george.dunlap, edgar.iglesias, nd, xen-devel

Hi,

On 02/17/2017 11:02 AM, Dario Faggioli wrote:
> Just very quickly...
>
> On Thu, 2017-02-16 at 15:07 -0800, Stefano Stabellini wrote:
>> (XEN) Active queues: 1
>> (XEN)   default-weight     = 256
>> (XEN) Runqueue 0:
>> (XEN)   ncpus              = 4
>> (XEN)   cpus               = 0-3
>> (XEN)   max_weight         = 256
>> (XEN)   instload           = 1
>> (XEN)   aveload            = 3208 (~1%)
>> (XEN) l(XEN)    idlers: 00000000,00000000,00000000,0000000a
>> a(XEN)  tickled: 00000000,00000000,00000000,00000000
>> t(XEN)  fully idle cores: 00000000,00000000,00000000,0000000a
>> e(XEN) Domain info:
>> n(XEN)  Domain: 0 w 256 v 4
>> c(XEN)    1: [0.0] flags=2 cpu=0 credit=10500000 [w=256] load=3170 (~1%)
>> (XEN)     2: y[0.1] flags=0 cpu=1  credit=10500000 [w=256]( load=131072 (~50%)
>> (XEN)     3: n[0.2] flags=0 cpu=2s credit=10500000 [w=256]) load=131072 (~50%):
>> (XEN)     4:  [0.3] flags=0 cpu=3m credit=10500000 [w=256]a load=131072 (~50%)x
>>
> Status of vcpus 2, 3 and 4 is a bit weird. I'll think about it.
>
>> =(XEN)  Domain: 1 w 256 v 1
>> 1(XEN)    5: 1[1.0] flags=2 cpu=2 credit=9713074 [w=256] load=56 (~0%)
>> (XEN) Runqueue info:
>> 6(XEN) runqueue 0:
>> 9(XEN) CPUs info:
>> 0(XEN) CPU[00]   runq=0, sibling=00000000,00000000,00000000,00000001, wcore=00000000,00000000,00000000,00000001
>>
> This tells me that nr_cpu_ids is very big (I think it tells it is 128,
> i.e., ARM default), which means cpumask_t-s are huge.
>
> What does `xl info' says. On my (x86) test box, it's like this:
>
>  ...
>  nr_cpus                : 16
>  max_cpu_id             : 63
>  ...
>
> (and I have NR_CPUS=256, i.e., x86 the default).
>
> Cpumasks being bigger also means cpumask operation being slower, and
> this matters quite a bit in Credit2, because we use cpumasks a lot (but
> also in Credit1, because we use cpumasks a little less than in Credit2,
> but still quite a bit).
>
> Isn't there a way, on ARM, to figure out online that you're not going
> to have 128 cpus in the platform?

It is just that we never set nr_cpu_ids on ARM :/. There was a patch on
the ML a while ago [1] but it never got applied.
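
(The idea is basically just to shrink nr_cpu_ids once the possible CPUs
have been enumerated. A rough sketch of it below -- not necessarily
verbatim what the patch at [1] does:)

/* xen/arch/arm/smpboot.c -- sketch of the idea only. */
void __init smp_init_cpus(void)
{
    /* ... existing DT enumeration filling cpu_possible_map ... */

    /* Shrink nr_cpu_ids from the NR_CPUS build-time default (128 on
     * ARM) to what the platform actually has, so cpumask operations
     * stop paying for CPUs that can never exist. */
    nr_cpu_ids = cpumask_last(&cpu_possible_map) + 1;
}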

Stefano, I think the patch is still valid. Could you apply it?

It would probably be worth doing the benchmark again with this patch
applied.

Cheers,

[1] https://patchwork.kernel.org/patch/8177261/

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-17 18:40 ` Dario Faggioli
@ 2017-02-17 19:44   ` Julien Grall
  2017-02-17 22:55     ` Stefano Stabellini
  2017-02-18  0:59     ` Dario Faggioli
  2017-02-18  0:41   ` Stefano Stabellini
  1 sibling, 2 replies; 19+ messages in thread
From: Julien Grall @ 2017-02-17 19:44 UTC (permalink / raw)
  To: Dario Faggioli, Stefano Stabellini, xen-devel
  Cc: george.dunlap, edgar.iglesias, nd

Hi Dario,

On 02/17/2017 06:40 PM, Dario Faggioli wrote:
> On Thu, 2017-02-09 at 16:54 -0800, Stefano Stabellini wrote:
> Actually, TSC on this box should be stable and invariant, so I guess I
> can try with the default governor. Will do that on Monday. Does ARM
> have frequency scaling (I did remember something on xen-devel, but I am
> not sure whether it landed upstream)?

I guess you mean the series sent by globallogic ([1])? If so, it was 
never upstreamed.

However, frequency scaling may depend on the processor used, and it is
sometimes implemented using big.LITTLE (e.g. the task has to switch to a
different core).

I am not sure why it would matter in this case.

[1] 
https://lists.xenproject.org/archives/html/xen-devel/2014-10/msg00801.html

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-17 19:44   ` Julien Grall
@ 2017-02-17 22:55     ` Stefano Stabellini
  2017-02-18  0:59     ` Dario Faggioli
  1 sibling, 0 replies; 19+ messages in thread
From: Stefano Stabellini @ 2017-02-17 22:55 UTC (permalink / raw)
  To: Julien Grall
  Cc: edgar.iglesias, Stefano Stabellini, george.dunlap,
	Dario Faggioli, xen-devel, nd

On Fri, 17 Feb 2017, Julien Grall wrote:
> Hi Dario,
> 
> On 02/17/2017 06:40 PM, Dario Faggioli wrote:
> > On Thu, 2017-02-09 at 16:54 -0800, Stefano Stabellini wrote:
> > Actually, TSC on this box should be stable and invariant, so I guess I
> > can try with the default governor. Will do that on Monday. Does ARM
> > have frequency scaling (I did remember something on xen-devel, but I am
> > not sure whether it landed upstream)?
> 
> I guess you mean the series sent by globallogic ([1])? If so, it was never
> upstreamed.
> 
> However the frequency scaling may depend on the processor used and sometimes
> implemented using big.LITTLE (e.g the task has to switch).
> 
> I am not sure why it would matter in this case.
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2014-10/msg00801.html

Not in this case, I have four A53s on this platform.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-17 19:34             ` Julien Grall
@ 2017-02-17 23:14               ` Stefano Stabellini
  2017-02-18  0:02                 ` Stefano Stabellini
  0 siblings, 1 reply; 19+ messages in thread
From: Stefano Stabellini @ 2017-02-17 23:14 UTC (permalink / raw)
  To: Julien Grall
  Cc: edgar.iglesias, Stefano Stabellini, george.dunlap,
	Dario Faggioli, xen-devel, nd

On Fri, 17 Feb 2017, Julien Grall wrote:
> Hi,
> 
> On 02/17/2017 11:02 AM, Dario Faggioli wrote:
> > Just very quickly...
> > 
> > On Thu, 2017-02-16 at 15:07 -0800, Stefano Stabellini wrote:
> > > (XEN) Active queues: 1
> > > (XEN)   default-weight     = 256
> > > (XEN) Runqueue 0:
> > > (XEN)   ncpus              = 4
> > > (XEN)   cpus               = 0-3
> > > (XEN)   max_weight         = 256
> > > (XEN)   instload           = 1
> > > (XEN)   aveload            = 3208 (~1%)
> > > (XEN) l(XEN)    idlers: 00000000,00000000,00000000,0000000a
> > > a(XEN)  tickled: 00000000,00000000,00000000,00000000
> > > t(XEN)  fully idle cores: 00000000,00000000,00000000,0000000a
> > > e(XEN) Domain info:
> > > n(XEN)  Domain: 0 w 256 v 4
> > > c(XEN)    1: [0.0] flags=2 cpu=0 credit=10500000 [w=256] load=3170 (~1%)
> > > (XEN)     2: y[0.1] flags=0 cpu=1  credit=10500000 [w=256]( load=131072
> > > (~50%)
> > > (XEN)     3: n[0.2] flags=0 cpu=2s credit=10500000 [w=256]) load=131072
> > > (~50%):
> > > (XEN)     4:  [0.3] flags=0 cpu=3m credit=10500000 [w=256]a load=131072
> > > (~50%)x
> > > 
> > Status of vcpus 2, 3 and 4 is a bit weird. I'll think about it.
> > 
> > > =(XEN)  Domain: 1 w 256 v 1
> > > 1(XEN)    5: 1[1.0] flags=2 cpu=2 credit=9713074 [w=256] load=56 (~0%)
> > > (XEN) Runqueue info:
> > > 6(XEN) runqueue 0:
> > > 9(XEN) CPUs info:
> > > 0(XEN) CPU[00]   runq=0, sibling=00000000,00000000,00000000,00000001,
> > > wcore=00000000,00000000,00000000,00000001
> > > 
> > This tells me that nr_cpu_ids is very big (I think it tells it is 128,
> > i.e., ARM default), which means cpumask_t-s are huge.
> > 
> > What does `xl info' says. On my (x86) test box, it's like this:
> > 
> >  ...
> >  nr_cpus                : 16
> >  max_cpu_id             : 63
> >  ...
> > 
> > (and I have NR_CPUS=256, i.e., x86 the default).

Indeed, I have max_cpu_id = 127 here.


> > Cpumasks being bigger also means cpumask operation being slower, and
> > this matters quite a bit in Credit2, because we use cpumasks a lot (but
> > also in Credit1, because we use cpumasks a little less than in Credit2,
> > but still quite a bit).
> > 
> > Isn't there a way, on ARM, to figure out online that you're not going
> > to have 128 cpus in the platform?
> 
> It is just we never set nr_cpu_ids on ARM :/. There was a patch on the ML a
> while ago [1] but never got applied.
> 
> Stefano, I think the patch is still valid. Could you apply it?
> [1] https://patchwork.kernel.org/patch/8177261/

I pushed the patch


> It would probably be worth to do the benchmark again with this patch applied.

Unfortunately the numbers haven't changed much:

                                        AVG     MIN     MAX     WARM MAX
NODEBUG vwfi=sleep credit2 fix-cpumasks	8020	7670	10320	8390
NODEBUG vwfi=sleep credit1 fix-cpumasks	6400	6330	9650	6720

In addition to the mysterious difference between credit1 and credit2, we
also have the difference between vwfi=idle and vwfi=sleep to deal with:

NODEBUG vwfi=idle credit2 fix cpumasks	4000	2370	4500	3350
NODEBUG vwfi=idle credit1 fix cpumasks	3220	2180	4500	4320
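
(For context, vwfi= here selects what Xen does on the guest's trapped
WFI: roughly, vwfi=sleep blocks the vcpu, while vwfi=idle keeps it
runnable and just yields. A rough sketch of the idea, where the option
plumbing and the function name are made up for illustration:)

/* Sketch only: "vwfi_sleep" stands in for the vwfi= command line choice. */
static bool vwfi_sleep = true;            /* vwfi=sleep vs vwfi=idle */

static void handle_guest_wfi(struct cpu_user_regs *regs, const union hsr hsr)
{
    if ( vwfi_sleep )
        vcpu_block();    /* deschedule the vcpu until an event arrives */
    else
        vcpu_yield();    /* stay runnable, just give up the pcpu for now */

    advance_pc(regs, hsr);   /* step over the trapped WFI */
}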

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-17 23:14               ` Stefano Stabellini
@ 2017-02-18  0:02                 ` Stefano Stabellini
  2017-02-18  0:47                   ` Dario Faggioli
  0 siblings, 1 reply; 19+ messages in thread
From: Stefano Stabellini @ 2017-02-18  0:02 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: edgar.iglesias, george.dunlap, Dario Faggioli, xen-devel,
	Julien Grall, nd

On Fri, 17 Feb 2017, Stefano Stabellini wrote:
> On Fri, 17 Feb 2017, Julien Grall wrote:
> > Hi,
> > 
> > On 02/17/2017 11:02 AM, Dario Faggioli wrote:
> > > Just very quickly...
> > > 
> > > On Thu, 2017-02-16 at 15:07 -0800, Stefano Stabellini wrote:
> > > > (XEN) Active queues: 1
> > > > (XEN)   default-weight     = 256
> > > > (XEN) Runqueue 0:
> > > > (XEN)   ncpus              = 4
> > > > (XEN)   cpus               = 0-3
> > > > (XEN)   max_weight         = 256
> > > > (XEN)   instload           = 1
> > > > (XEN)   aveload            = 3208 (~1%)
> > > > (XEN) l(XEN)    idlers: 00000000,00000000,00000000,0000000a
> > > > a(XEN)  tickled: 00000000,00000000,00000000,00000000
> > > > t(XEN)  fully idle cores: 00000000,00000000,00000000,0000000a
> > > > e(XEN) Domain info:
> > > > n(XEN)  Domain: 0 w 256 v 4
> > > > c(XEN)    1: [0.0] flags=2 cpu=0 credit=10500000 [w=256] load=3170 (~1%)
> > > > (XEN)     2: y[0.1] flags=0 cpu=1  credit=10500000 [w=256]( load=131072
> > > > (~50%)
> > > > (XEN)     3: n[0.2] flags=0 cpu=2s credit=10500000 [w=256]) load=131072
> > > > (~50%):
> > > > (XEN)     4:  [0.3] flags=0 cpu=3m credit=10500000 [w=256]a load=131072
> > > > (~50%)x
> > > > 
> > > Status of vcpus 2, 3 and 4 is a bit weird. I'll think about it.
> > > 
> > > > =(XEN)  Domain: 1 w 256 v 1
> > > > 1(XEN)    5: 1[1.0] flags=2 cpu=2 credit=9713074 [w=256] load=56 (~0%)
> > > > (XEN) Runqueue info:
> > > > 6(XEN) runqueue 0:
> > > > 9(XEN) CPUs info:
> > > > 0(XEN) CPU[00]   runq=0, sibling=00000000,00000000,00000000,00000001,
> > > > wcore=00000000,00000000,00000000,00000001
> > > > 
> > > This tells me that nr_cpu_ids is very big (I think it tells it is 128,
> > > i.e., ARM default), which means cpumask_t-s are huge.
> > > 
> > > What does `xl info' says. On my (x86) test box, it's like this:
> > > 
> > >  ...
> > >  nr_cpus                : 16
> > >  max_cpu_id             : 63
> > >  ...
> > > 
> > > (and I have NR_CPUS=256, i.e., x86 the default).
> 
> Indeed I have 127
> 
> 
> > > Cpumasks being bigger also means cpumask operation being slower, and
> > > this matters quite a bit in Credit2, because we use cpumasks a lot (but
> > > also in Credit1, because we use cpumasks a little less than in Credit2,
> > > but still quite a bit).
> > > 
> > > Isn't there a way, on ARM, to figure out online that you're not going
> > > to have 128 cpus in the platform?
> > 
> > It is just we never set nr_cpu_ids on ARM :/. There was a patch on the ML a
> > while ago [1] but never got applied.
> > 
> > Stefano, I think the patch is still valid. Could you apply it?
> > [1] https://patchwork.kernel.org/patch/8177261/
> 
> I pushed the patch
> 
> 
> > It would probably be worth to do the benchmark again with this patch applied.
> 
> Unfortunately the numbers haven't changed much:
> 
>                                         AVG     MIN     MAX     WARM MAX
> NODEBUG vwfi=sleep credit2 fix-cpumasks	8020	7670	10320	8390
> NODEBUG vwfi=sleep credit1 fix-cpumasks	6400	6330	9650	6720
> 
> In addition to the mysterious difference between credit1 and credit2, we
> also have the difference between vwfi=idle and vwfi=sleep to deal with:
> 
> NODEBUG vwfi=idle credit2 fix cpumasks	4000	2370	4500	3350
> NODEBUG vwfi=idle credit1 fix cpumasks	3220	2180	4500	4320

Actually those are still the old numbers, sorry! I didn't update the xen
binary properly. These are the new numbers:

        	                    AVG	    MIN	    MAX	    WARM MAX
vwfi=sleep credit2 fix cpumasks	5910	5800	8520	6180
vwfi=sleep credit1 fix cpumasks	4900	4810	6910	4980
vwfi=idle  credit2 fix cpumasks	2800	1560	4550	4200
vwfi=idle  credit1 fix cpumasks	2800	1610	3420	1770

The difference between credit2 and credit1 is smaller now. In fact it's
zero when vwfi=idle. However, with vwfi=sleep, the larger MAX value is
a bit worrying.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-17 18:40 ` Dario Faggioli
  2017-02-17 19:44   ` Julien Grall
@ 2017-02-18  0:41   ` Stefano Stabellini
  2017-02-20 11:04     ` George Dunlap
  1 sibling, 1 reply; 19+ messages in thread
From: Stefano Stabellini @ 2017-02-18  0:41 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: george.dunlap, edgar.iglesias, julien.grall, Stefano Stabellini,
	xen-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3917 bytes --]

On Fri, 17 Feb 2017, Dario Faggioli wrote:
> On Thu, 2017-02-09 at 16:54 -0800, Stefano Stabellini wrote:
> > These are the results, in nanosec:
> > 
> >                         AVG     MIN     MAX     WARM MAX
> > 
> > NODEBUG no WFI          1890    1800    3170    2070
> > NODEBUG WFI             4850    4810    7030    4980
> > NODEBUG no WFI credit2  2217    2090    3420    2650
> > NODEBUG WFI credit2     8080    7890    10320   8300
> > 
> > DEBUG no WFI            2252    2080    3320    2650
> > DEBUG WFI               6500    6140    8520    8130
> > DEBUG WFI, credit2      8050    7870    10680   8450
> > 
> > As you can see, depending on whether the guest issues a WFI or not
> > while
> > waiting for interrupts, the results change significantly.
> > Interestingly,
> > credit2 does worse than credit1 in this area.
> > 
> I did some measuring myself, on x86, with different tools. So,
> cyclictest is basically something very very similar to the app
> Stefano's app.
> 
> I've run it both within Dom0, and inside a guest. I also run a Xen
> build (in this case, only inside of the guest).
> 
> > We are down to 2000-3000ns. Then, I started investigating the
> > scheduler.
> > I measured how long it takes to run "vcpu_unblock": 1050ns, which is
> > significant. I don't know what is causing the remaining 1000-2000ns,
> > but
> > I bet on another scheduler function. Do you have any suggestions on
> > which one?
> > 
> So, vcpu_unblock() calls vcpu_wake(), which then invokes the
> scheduler's wakeup related functions.
> 
> If you time vcpu_unblock(), from beginning to end of the function, you
> actually capture quite a few things. E.g., the scheduler lock is taken
> inside vcpu_wake(), so you're basically including time spent waited on
> the lock in the estimation.
> 
> That is probably ok (as in, lock contention definitely is something
> relevant to latency), but it is expected for things to be rather
> different between Credit1 and Credit2.
> 
> I've, OTOH, tried to time, SCHED_OP(wake) and SCHED_OP(do_schedule),
> and here's the result. Numbers are in cycles (I've used RDTSC) and, for
> making sure to obtain consistent and comparable numbers, I've set the
> frequency scaling governor to performance.
> 
> Dom0, [performance]							
> 	        cyclictest 1us	cyclictest 1ms	cyclictest 100ms			
> (cycles)	Credit1	Credit2	Credit1	Credit2	Credit1	Credit2		
> wakeup-avg	2429	2035	1980	1633	2535	1979		
> wakeup-max	14577	113682	15153	203136	12285	115164		

I am not that familiar with the x86 side of things, but the 113682 and
203136 look worrisome, especially considering that credit1 doesn't have
them.


> sched-avg	1716	1860	2527	1651	2286	1670		
> sched-max	16059	15000	12297	101760	15831	13122		
> 								
> VM, [performance]							
> 	        cyclictest 1us	cyclictest 1ms	cyclictest 100ms make -j xen	
> (cycles)	Credit1	Credit2	Credit1	Credit2	Credit1	Credit2	 Credit1 Credit2
> wakeup-avg	2213	2128	1944	2342	2374	2213	 2429	 1618
> wakeup-max	9990	10104	11262	9927	10290	10218	 14430	 15108
> sched-avg	2437	2472	1620	1594	2498	1759	 2449	 1809
> sched-max	14100	14634	10071	9984	10878	8748	 16476	 14220
> 

These are the corresponding numbers I have in ns:

                                AVG     MAX     WARM MAX
credit2 sched_op do_schedule    638     2410    2290
credit2 sched_op wake           603     2920    670
credit1 sched_op do_schedule    508     980     980
credit1 sched_op wake           792     2080    930

I would also like to see the nop scheduler as a comparison.

It looks like credit2 has higher max values. I am attaching the raw
numbers because I think they are interesting (also in ns): credit2 has
a higher initial variability. FYI the scenario is still the same: domU
vcpu pinned to a pcpu, dom0 running elsewhere.

[-- Attachment #2: Type: TEXT/PLAIN, Size: 8218 bytes --]

(XEN) DEBUG vcpu_wake 450 ns=1640
(XEN) DEBUG schedule 1401 ns=480
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=980
(XEN) DEBUG vcpu_wake 450 ns=2080
(XEN) DEBUG schedule 1401 ns=330
(XEN) DEBUG schedule 1401 ns=800
(XEN) DEBUG vcpu_wake 450 ns=930
(XEN) DEBUG schedule 1401 ns=280
(XEN) DEBUG schedule 1401 ns=770
(XEN) DEBUG vcpu_wake 450 ns=800
(XEN) DEBUG schedule 1401 ns=290
(XEN) DEBUG schedule 1401 ns=800
(XEN) DEBUG vcpu_wake 450 ns=770
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=780
(XEN) DEBUG vcpu_wake 450 ns=800
(XEN) DEBUG schedule 1401 ns=290
(XEN) DEBUG schedule 1401 ns=790
(XEN) DEBUG vcpu_wake 450 ns=1510
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=780
(XEN) DEBUG vcpu_wake 450 ns=740
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=750
(XEN) DEBUG vcpu_wake 450 ns=780
(XEN) DEBUG schedule 1401 ns=250
(XEN) DEBUG schedule 1401 ns=780
(XEN) DEBUG vcpu_wake 450 ns=780
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=770
(XEN) DEBUG vcpu_wake 450 ns=760
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=780
(XEN) DEBUG vcpu_wake 450 ns=750
(XEN) DEBUG schedule 1401 ns=280
(XEN) DEBUG schedule 1401 ns=760
(XEN) DEBUG vcpu_wake 450 ns=1170
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=760
(XEN) DEBUG vcpu_wake 450 ns=720
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=750
(XEN) DEBUG vcpu_wake 450 ns=760
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=780
(XEN) DEBUG vcpu_wake 450 ns=760
(XEN) DEBUG schedule 1401 ns=290
(XEN) DEBUG schedule 1401 ns=800
(XEN) DEBUG vcpu_wake 450 ns=740
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=780
(XEN) DEBUG vcpu_wake 450 ns=760
(XEN) DEBUG schedule 1401 ns=280
(XEN) DEBUG schedule 1401 ns=760
(XEN) DEBUG vcpu_wake 450 ns=780
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=760
(XEN) DEBUG vcpu_wake 450 ns=740
(XEN) DEBUG schedule 1401 ns=290
(XEN) DEBUG schedule 1401 ns=770
(XEN) DEBUG vcpu_wake 450 ns=760
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=770
(XEN) DEBUG vcpu_wake 450 ns=740
(XEN) DEBUG schedule 1401 ns=280
(XEN) DEBUG schedule 1401 ns=730
(XEN) DEBUG vcpu_wake 450 ns=740
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=720
(XEN) DEBUG vcpu_wake 450 ns=760
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=720
(XEN) DEBUG vcpu_wake 450 ns=780
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=720
(XEN) DEBUG vcpu_wake 450 ns=760
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=700
(XEN) DEBUG vcpu_wake 450 ns=770
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=760
(XEN) DEBUG vcpu_wake 450 ns=720
(XEN) DEBUG schedule 1401 ns=280
(XEN) DEBUG schedule 1401 ns=710
(XEN) DEBUG vcpu_wake 450 ns=760
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=740
(XEN) DEBUG vcpu_wake 450 ns=750
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=700
(XEN) DEBUG vcpu_wake 450 ns=720
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=710
(XEN) DEBUG vcpu_wake 450 ns=740
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=720
(XEN) DEBUG vcpu_wake 450 ns=730
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=750
(XEN) DEBUG vcpu_wake 450 ns=760
(XEN) DEBUG schedule 1401 ns=290
(XEN) DEBUG schedule 1401 ns=720
(XEN) DEBUG vcpu_wake 450 ns=740
(XEN) DEBUG schedule 1401 ns=290
(XEN) DEBUG schedule 1401 ns=710
(XEN) DEBUG vcpu_wake 450 ns=740
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=710
(XEN) DEBUG vcpu_wake 450 ns=760
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=780
(XEN) DEBUG vcpu_wake 450 ns=720
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=730
(XEN) DEBUG vcpu_wake 450 ns=750
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=710
(XEN) DEBUG vcpu_wake 450 ns=750
(XEN) DEBUG schedule 1401 ns=290
(XEN) DEBUG schedule 1401 ns=740
(XEN) DEBUG vcpu_wake 450 ns=750
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=790
(XEN) DEBUG vcpu_wake 450 ns=730
(XEN) DEBUG schedule 1401 ns=280
(XEN) DEBUG schedule 1401 ns=730
(XEN) DEBUG vcpu_wake 450 ns=740
(XEN) DEBUG schedule 1401 ns=290
(XEN) DEBUG schedule 1401 ns=750
(XEN) DEBUG vcpu_wake 450 ns=740
(XEN) DEBUG schedule 1401 ns=280
(XEN) DEBUG schedule 1401 ns=770
(XEN) DEBUG vcpu_wake 450 ns=740
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=730
(XEN) DEBUG vcpu_wake 450 ns=730
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=740
(XEN) DEBUG vcpu_wake 450 ns=730
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=760
(XEN) DEBUG vcpu_wake 450 ns=730
(XEN) DEBUG schedule 1401 ns=280
(XEN) DEBUG schedule 1401 ns=710
(XEN) DEBUG vcpu_wake 450 ns=710
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=710
(XEN) DEBUG vcpu_wake 450 ns=760
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=750
(XEN) DEBUG vcpu_wake 450 ns=750
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=700
[B[B[B[B[B[B(XEN) DEBUG vcpu_wake 450 ns=760
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=730
[B[B[B[B(XEN) DEBUG vcpu_wake 450 ns=730
(XEN) DEBUG schedule 1401 ns=280
(XEN) DEBUG schedule 1401 ns=750
(XEN) DEBUG vcpu_wake 450 ns=730
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=690
(XEN) DEBUG vcpu_wake 450 ns=770
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=730
(XEN) DEBUG vcpu_wake 450 ns=730
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=750
(XEN) DEBUG vcpu_wake 450 ns=700
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=770
(XEN) DEBUG vcpu_wake 450 ns=720
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=710
(XEN) DEBUG vcpu_wake 450 ns=770
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=690
(XEN) DEBUG vcpu_wake 450 ns=780
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=730
(XEN) DEBUG vcpu_wake 450 ns=750
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=740
(XEN) DEBUG vcpu_wake 450 ns=740
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=720
(XEN) DEBUG vcpu_wake 450 ns=740
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=740
(XEN) DEBUG vcpu_wake 450 ns=730
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=750
(XEN) DEBUG vcpu_wake 450 ns=770
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=730
(XEN) DEBUG vcpu_wake 450 ns=770
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=720
(XEN) DEBUG vcpu_wake 450 ns=720
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=730
(XEN) DEBUG vcpu_wake 450 ns=710
(XEN) DEBUG schedule 1401 ns=300
(XEN) DEBUG schedule 1401 ns=720
(XEN) DEBUG vcpu_wake 450 ns=730
(XEN) DEBUG schedule 1401 ns=290
(XEN) DEBUG schedule 1401 ns=710
(XEN) DEBUG vcpu_wake 450 ns=770
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=710
(XEN) DEBUG vcpu_wake 450 ns=760
(XEN) DEBUG schedule 1401 ns=250
(XEN) DEBUG schedule 1401 ns=720
(XEN) DEBUG vcpu_wake 450 ns=710
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=700
(XEN) DEBUG vcpu_wake 450 ns=690
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=730
(XEN) DEBUG vcpu_wake 450 ns=770
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=720
(XEN) DEBUG vcpu_wake 450 ns=750
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=710
(XEN) DEBUG vcpu_wake 450 ns=780
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=670
(XEN) DEBUG vcpu_wake 450 ns=740
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=710
(XEN) DEBUG vcpu_wake 450 ns=760
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=700
(XEN) DEBUG vcpu_wake 450 ns=750
(XEN) DEBUG schedule 1401 ns=260
(XEN) DEBUG schedule 1401 ns=700
(XEN) DEBUG vcpu_wake 450 ns=740
(XEN) DEBUG schedule 1401 ns=270
(XEN) DEBUG schedule 1401 ns=700

[-- Attachment #3: Type: TEXT/PLAIN, Size: 16275 bytes --]

(XEN) DEBUG vcpu_wake 450 ns=2920
(XEN) DEBUG schedule 1401 ns=1160
(XEN) DEBUG schedule 1401 ns=2410
(XEN) IDEBUG schedule 1401 ns=2290
(XEN) DEBUG schedule 1401 ns=700
(XEN) DEBUG schedule 1401 ns=1150
(XEN) DEBUG vcpu_wake 450 ns=820
(XEN) DEBUG schedule 1401 ns=740
(XEN) DEBUG schedule 1401 ns=710
(XEN) DEBUG vcpu_wake 450 ns=670
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG schedule 1401 ns=650
(XEN) DEBUG vcpu_wake 450 ns=630
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=620
(XEN) latDEBUG schedule 1401 ns=810
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=580
(XEN) latDEBUG schedule 1401 ns=790
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=630
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=550
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG schedule 1401 ns=670
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=560
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=610
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=660
(XEN) DEBUG vcpu_wake 450 ns=610
(XEN) DEBUG schedule 1401 ns=600
(XEN) latDEBUG schedule 1401 ns=830
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=590
(XEN) latenDEBUG schedule 1401 ns=820
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=560
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=560
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=660
(XEN) DEBUG vcpu_wake 450 ns=630
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=590
(XEN) latDEBUG schedule 1401 ns=880
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=590
(XEN) latDEBUG schedule 1401 ns=860
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG schedule 1401 ns=650
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=630
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=830
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG schedule 1401 ns=670
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=580
(XEN) latDEBUG schedule 1401 ns=870
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=650
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=550
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=550
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=560
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=600
(XEN) latDEBUG schedule 1401 ns=810
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=600
(XEN) latenDEBUG schedule 1401 ns=810
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=550
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=650
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG vcpu_wake 450 ns=610
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=1020
(XEN) DEBUG schedule 1401 ns=650
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=550
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=680
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=550
(XEN) latDEBUG schedule 1401 ns=840
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=590
(XEN) lateDEBUG schedule 1401 ns=830
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=560
(XEN) DEBUG schedule 1401 ns=560
(XEN) DEBUG schedule 1401 ns=660
(XEN) DEBUG vcpu_wake 450 ns=620
(XEN) DEBUG schedule 1401 ns=560
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=590
(XEN) latDEBUG schedule 1401 ns=850
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=580
(XEN) latDEBUG schedule 1401 ns=820
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=560
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=560
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=550
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=530
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=580
(XEN) latDEBUG schedule 1401 ns=870
(XEN) DEBUG schedule 1401 ns=660
(XEN) DEBUG vcpu_wake 450 ns=540
(XEN) DEBUG schedule 1401 ns=550
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=560
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=610
(XEN) DEBUG schedule 1401 ns=560
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=550
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=540
(XEN) DEBUG schedule 1401 ns=650
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=610
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=550
(XEN) DEBUG schedule 1401 ns=560
(XEN) latDEBUG schedule 1401 ns=820
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=540
(XEN) lateDEBUG schedule 1401 ns=850
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=560
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=650
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=550
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=560
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=620
(XEN) DEBUG schedule 1401 ns=560
(XEN) DEBUG schedule 1401 ns=660
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=600
(XEN) latDEBUG schedule 1401 ns=840
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=650
(XEN) DEBUG schedule 1401 ns=640
(XEN) latDEBUG schedule 1401 ns=820
(XEN) DEBUG schedule 1401 ns=670
(XEN) DEBUG vcpu_wake 450 ns=620
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=670
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=560
(XEN) DEBUG schedule 1401 ns=650
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=550
(XEN) DEBUG schedule 1401 ns=660
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=550
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=560
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=540
(XEN) DEBUG schedule 1401 ns=650
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=560
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=560
(XEN) DEBUG schedule 1401 ns=650
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=550
(XEN) DEBUG schedule 1401 ns=520
(XEN) latDEBUG schedule 1401 ns=860
(XEN) DEBUG schedule 1401 ns=670
(XEN) DEBUG vcpu_wake 450 ns=610
(XEN) DEBUG schedule 1401 ns=560
(XEN) lateDEBUG schedule 1401 ns=850
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=610
(XEN) DEBUG schedule 1401 ns=550
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=620
(XEN) DEBUG schedule 1401 ns=550
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=610
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=540
(XEN) latDEBUG schedule 1401 ns=830
(XEN) DEBUG schedule 1401 ns=650
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=550
(XEN) DEBUG schedule 1401 ns=650
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=550
(XEN) DEBUG schedule 1401 ns=650
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=530
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=540
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=570
(XEN) latDEBUG schedule 1401 ns=860
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=560
(XEN) lateDEBUG schedule 1401 ns=830
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=550
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=540
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG vcpu_wake 450 ns=550
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=630
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG vcpu_wake 450 ns=560
(XEN) DEBUG schedule 1401 ns=1000
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG vcpu_wake 450 ns=560
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=610
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=620
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=650
(XEN) DEBUG vcpu_wake 450 ns=570
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=620
(XEN) DEBUG schedule 1401 ns=590
(XEN) DEBUG schedule 1401 ns=640
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=600
(XEN) latDEBUG schedule 1401 ns=860
(XEN) DEBUG schedule 1401 ns=630
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=590
(XEN) latDEBUG schedule 1401 ns=830
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=590
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=610
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG schedule 1401 ns=600
(XEN) DEBUG vcpu_wake 450 ns=560
(XEN) DEBUG schedule 1401 ns=580
(XEN) DEBUG schedule 1401 ns=620
(XEN) DEBUG vcpu_wake 450 ns=600
(XEN) DEBUG schedule 1401 ns=550
(XEN) DEBUG schedule 1401 ns=610
(XEN) DEBUG vcpu_wake 450 ns=580
(XEN) DEBUG schedule 1401 ns=570
(XEN) DEBUG schedule 1401 ns=600

[-- Attachment #4: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-18  0:02                 ` Stefano Stabellini
@ 2017-02-18  0:47                   ` Dario Faggioli
  0 siblings, 0 replies; 19+ messages in thread
From: Dario Faggioli @ 2017-02-18  0:47 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: edgar.iglesias, george.dunlap, Julien Grall, nd, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1623 bytes --]

On Fri, 2017-02-17 at 16:02 -0800, Stefano Stabellini wrote:
> On Fri, 17 Feb 2017, Stefano Stabellini wrote:
> > 
> > NODEBUG vwfi=idle credit2 fix cpumasks  4000  2370  4500  3350
> > NODEBUG vwfi=idle credit1 fix cpumasks  3220  2180  4500  4320
> 
> Actually those are still the old numbers, sorry! 
>
Yeah, I know, I've done the same sooo many times! :-P

> I didn't update the xen
> binary properly. These are the new numbers:
> 
>                                  AVG     MIN     MAX     WARM MAX
> vwfi=sleep credit2 fix cpumasks  5910    5800    8520    6180
> vwfi=sleep credit1 fix cpumasks  4900    4810    6910    4980
> vwfi=idle  credit2 fix cpumasks  2800    1560    4550    4200
> vwfi=idle  credit1 fix cpumasks  2800    1610    3420    1770
> 
> The difference between credit2 and credit1 is smaller now. In fact
> it's
> zero, when vwfi=idle. 
>
Ok, cool.

> However, with vwfi=sleep, the larger MAX value is
> a bit worrying.
> 
I still think this may be due to the fact that you never stop Credit1's
tick on ARM, which means that, when on Credit1, the system is a lot
more "alive" (e.g., if we were on x86, I'd have said it'd spend less
time in deep C-states, and things like that).

And yes, I still owe you a patch for disabling it.
I'm working on that one. :-)

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-17 19:44   ` Julien Grall
  2017-02-17 22:55     ` Stefano Stabellini
@ 2017-02-18  0:59     ` Dario Faggioli
  2017-02-20 12:18       ` Punit Agrawal
  1 sibling, 1 reply; 19+ messages in thread
From: Dario Faggioli @ 2017-02-18  0:59 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini, xen-devel
  Cc: george.dunlap, edgar.iglesias, nd


[-- Attachment #1.1: Type: text/plain, Size: 2087 bytes --]

On Fri, 2017-02-17 at 19:44 +0000, Julien Grall wrote:
> On 02/17/2017 06:40 PM, Dario Faggioli wrote:
> > Does ARM
> > have frequency scaling (I did remember something on xen-devel, but
> > I am
> > not sure whether it landed upstream)?
> 
> I guess you mean the series sent by globallogic ([1])? If so, it was 
> never upstreamed.
> 
Yep, that one. I was quite sure it did not go in, but I was in a hurry
and could not double-check the code. Sorry.

> However the frequency scaling may depend on the processor used and 
> sometimes implemented using big.LITTLE (e.g the task has to switch).
> 
> I am not sure why it would matter in this case.
> 
It's hard to tell whether or not (and if so, how much) it matters in
this specific case. In general, when I was working on squeezing latency
on bare-metal Linux, disabling scaling was one of the steps.

It depends on quite a few different factors, but, roughly, during the
time you're idle waiting for the interrupt, you're running at low
frequency.

When the interrupt comes, what happens depends on the governor, but you
usually don't immediately jump to the highest frequency, because that
happens gradually. And if all you do is something quick, in response to
the timer interrupt (like in this and similar cases), you most likely
end up doing that running at low frequency, which may impact latency
(at least the latency of the completion of the work that the timer
triggered).

But I was also referring to the fact that the app is using some
hardware counters, and a frequency value for updating them and for
converting their values to time. So I was wondering how a dynamically
changing frequency may interact with that (and that's because I'm
pretty ignorant about ARM internals, I know :-/).

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-18  0:41   ` Stefano Stabellini
@ 2017-02-20 11:04     ` George Dunlap
  2017-02-20 11:40       ` Dario Faggioli
  0 siblings, 1 reply; 19+ messages in thread
From: George Dunlap @ 2017-02-20 11:04 UTC (permalink / raw)
  To: Stefano Stabellini, Dario Faggioli
  Cc: george.dunlap, edgar.iglesias, julien.grall, xen-devel

On 18/02/17 00:41, Stefano Stabellini wrote:
> On Fri, 17 Feb 2017, Dario Faggioli wrote:
>> On Thu, 2017-02-09 at 16:54 -0800, Stefano Stabellini wrote:
>>> These are the results, in nanosec:
>>>
>>>                         AVG     MIN     MAX     WARM MAX
>>>
>>> NODEBUG no WFI          1890    1800    3170    2070
>>> NODEBUG WFI             4850    4810    7030    4980
>>> NODEBUG no WFI credit2  2217    2090    3420    2650
>>> NODEBUG WFI credit2     8080    7890    10320   8300
>>>
>>> DEBUG no WFI            2252    2080    3320    2650
>>> DEBUG WFI               6500    6140    8520    8130
>>> DEBUG WFI, credit2      8050    7870    10680   8450
>>>
>>> As you can see, depending on whether the guest issues a WFI or not
>>> while
>>> waiting for interrupts, the results change significantly.
>>> Interestingly,
>>> credit2 does worse than credit1 in this area.
>>>
>> I did some measuring myself, on x86, with different tools. So,
>> cyclictest is basically something very very similar to the app
>> Stefano's app.
>>
>> I've run it both within Dom0, and inside a guest. I also run a Xen
>> build (in this case, only inside of the guest).
>>
>>> We are down to 2000-3000ns. Then, I started investigating the
>>> scheduler.
>>> I measured how long it takes to run "vcpu_unblock": 1050ns, which is
>>> significant. I don't know what is causing the remaining 1000-2000ns,
>>> but
>>> I bet on another scheduler function. Do you have any suggestions on
>>> which one?
>>>
>> So, vcpu_unblock() calls vcpu_wake(), which then invokes the
>> scheduler's wakeup related functions.
>>
>> If you time vcpu_unblock(), from beginning to end of the function, you
>> actually capture quite a few things. E.g., the scheduler lock is taken
>> inside vcpu_wake(), so you're basically including time spent waited on
>> the lock in the estimation.
>>
>> That is probably ok (as in, lock contention definitely is something
>> relevant to latency), but it is expected for things to be rather
>> different between Credit1 and Credit2.
>>
>> I've, OTOH, tried to time, SCHED_OP(wake) and SCHED_OP(do_schedule),
>> and here's the result. Numbers are in cycles (I've used RDTSC) and, for
>> making sure to obtain consistent and comparable numbers, I've set the
>> frequency scaling governor to performance.
>>
>> Dom0, [performance]							
>> 	        cyclictest 1us	cyclictest 1ms	cyclictest 100ms			
>> (cycles)	Credit1	Credit2	Credit1	Credit2	Credit1	Credit2		
>> wakeup-avg	2429	2035	1980	1633	2535	1979		
>> wakeup-max	14577	113682	15153	203136	12285	115164		
> 
> I am not that familiar with the x86 side of things, but the 113682 and
> 203136 look worrisome, especially considering that credit1 doesn't have
> them.

Dario,

Do you reckon those 'MAX' values could be the load balancer running
(both for credit1 and credit2)?

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-20 11:04     ` George Dunlap
@ 2017-02-20 11:40       ` Dario Faggioli
  0 siblings, 0 replies; 19+ messages in thread
From: Dario Faggioli @ 2017-02-20 11:40 UTC (permalink / raw)
  To: George Dunlap, Stefano Stabellini
  Cc: george.dunlap, edgar.iglesias, julien.grall, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1400 bytes --]

On Mon, 2017-02-20 at 11:04 +0000, George Dunlap wrote:
> On 18/02/17 00:41, Stefano Stabellini wrote:
> > 
> > On Fri, 17 Feb 2017, Dario Faggioli wrote:
> > >             cyclictest 1us    cyclictest 1ms    cyclictest 100ms
> > > (cycles)    Credit1  Credit2  Credit1  Credit2  Credit1  Credit2
> > > wakeup-avg  2429     2035     1980     1633     2535     1979
> > > wakeup-max  14577    113682   15153    203136   12285    115164
> > 
> > I am not that familiar with the x86 side of things, but the 113682
> > and
> > 203136 look worrisome, especially considering that credit1 doesn't
> > have
> > them.
> 
> Dario,
> 
> Do you reckon those 'MAX' values could be the load balancer running
> (both for credit1 and credit2)?
> 
If they were max-es of do_schedule, that would have been my first
guess. But not in this case, as the load balancer is never called
during wakeup (neither in Credit nor Credit2).

But I'm curious about what happens, and am going to investigate that. It
should not be too hard, as: 1) it looks reproducible on the (x86) testbox
that I have here; 2) since it's x86, I have tracing! :-P

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: Xen on ARM IRQ latency and scheduler overhead
  2017-02-18  0:59     ` Dario Faggioli
@ 2017-02-20 12:18       ` Punit Agrawal
  0 siblings, 0 replies; 19+ messages in thread
From: Punit Agrawal @ 2017-02-20 12:18 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: edgar.iglesias, Stefano Stabellini, george.dunlap, xen-devel,
	Julien Grall, nd

Dario Faggioli <dario.faggioli@citrix.com> writes:

> On Fri, 2017-02-17 at 19:44 +0000, Julien Grall wrote:
>> On 02/17/2017 06:40 PM, Dario Faggioli wrote:
>> > Does ARM have frequency scaling (I did remember something on
>> > xen-devel, but I am not sure whether it landed upstream)?
>> 
>> I guess you mean the series sent by globallogic ([1])? If so, it was 
>> never upstreamed.
>> 
> Yep, that one. I was quite sure it did not go in, but I was in a hurry
> and could not double-check the code. Sorry.
>
>> However, frequency scaling may depend on the processor used, and it is
>> sometimes implemented using big.LITTLE (e.g. the task has to switch).
>> 
>> I am not sure why it would matter in this case.
>> 
> It's hard to tell whether or not (and if so, how much) it matters in
> this specific case. In general, when I was working on squeezing latency
> on bare-metal Linux, disabling scaling was one of the steps.
>
> It depends on quite a few different factors, but, roughly, during the
> time you're idle waiting for the interrupt, you're running at low
> frequency.
>
> When the interrupt comes, what happens depends on the governor, but you
> usually don't jump immediately to the highest frequency; the ramp-up
> happens gradually. And if all you do is something quick in response to
> the timer interrupt (like in this and similar cases), you most likely
> end up doing that running at low frequency, which may impact latency
> (at least the latency of the completion of the work that the timer
> triggered).

Jumping in a little late, but I thought I'd add my 2c on CPU frequency
control on ARM.

Typically, changing frequency requires active participation by a
software agent, e.g., Linux. It's not done automagically behind the
scenes in hardware. If no agent changes the frequency, the cores keep
running at the frequency they were booted at by the firmware.

When waiting for an event (WFI/WFE), it is unlikely that the frequency
is lowered and then ramped back up on receiving the event. Such
behaviour would need to be implemented by the agent controlling the
frequency. Indeed, there have been discussions about implementing such
functionality in Linux on certain ARM platforms.
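
As a concrete illustration of the "agent" point (hedged: this assumes
the standard Linux cpufreq sysfs layout, and is not specific to Xen or
to any particular board), the governor and current frequency can be
inspected from user space like this:

/*
 * Illustrative sketch: read the cpufreq state exposed by Linux.  If no
 * cpufreq driver is bound, these files simply do not exist and the
 * cores stay at whatever frequency the firmware left them at.
 */
#include <stdio.h>

static void show(const char *path)
{
    char buf[64];
    FILE *f = fopen(path, "r");

    if (!f) {
        printf("%s: not present (no cpufreq agent)\n", path);
        return;
    }
    if (fgets(buf, sizeof(buf), f))
        printf("%s: %s", path, buf);
    fclose(f);
}

int main(void)
{
    show("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor");
    show("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq");
    return 0;
}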

>
> But I was also referring to the fact that the app is using some
> hardware counters, and a frequency value for updating them, and for
> converting their value to time. So I was wondering how the fact that
> frequency may be changing dynamically may interact with that (and
> that's because I'm pretty ignorant about ARM internals, I know :-/).

In the absence of a controlling agent, hardware counters should not be
running at variable frequency.
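
A brief aside on the counters themselves: on AArch64 the generic timer
counter (CNTVCT_EL0) ticks at the fixed rate reported by CNTFRQ_EL0, so
converting ticks to time does not depend on CPU frequency scaling. A
minimal sketch is below (illustrative only; it assumes EL0 access to
the counter is enabled, which is the usual Linux configuration, and it
is only meant for short intervals):

/*
 * Illustrative sketch: read the AArch64 generic timer and convert
 * ticks to nanoseconds using its fixed frequency.
 */
#include <stdint.h>
#include <stdio.h>

static inline uint64_t read_cntvct(void)
{
    uint64_t val;
    __asm__ __volatile__("mrs %0, cntvct_el0" : "=r"(val));
    return val;
}

static inline uint64_t read_cntfrq(void)
{
    uint64_t val;
    __asm__ __volatile__("mrs %0, cntfrq_el0" : "=r"(val));
    return val;
}

int main(void)
{
    uint64_t t0 = read_cntvct();
    /* ... code being timed ... */
    uint64_t ticks = read_cntvct() - t0;
    uint64_t freq = read_cntfrq();   /* ticks per second, fixed */

    /* Overflows for very long intervals; fine for a latency probe. */
    printf("elapsed: %llu ns\n",
           (unsigned long long)(ticks * 1000000000ULL / freq));
    return 0;
}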

Hope this helps.

Thanks,
Punit

>
> Regards,
> Dario
> -- 
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


end of thread

Thread overview: 19+ messages
2017-02-10  0:54 Xen on ARM IRQ latency and scheduler overhead Stefano Stabellini
2017-02-10  8:40 ` Dario Faggioli
2017-02-10 18:32   ` Stefano Stabellini
2017-02-16 12:20     ` Dario Faggioli
2017-02-16 19:52       ` Stefano Stabellini
2017-02-16 23:07         ` Stefano Stabellini
2017-02-17 11:02           ` Dario Faggioli
2017-02-17 19:34             ` Julien Grall
2017-02-17 23:14               ` Stefano Stabellini
2017-02-18  0:02                 ` Stefano Stabellini
2017-02-18  0:47                   ` Dario Faggioli
2017-02-17 18:40 ` Dario Faggioli
2017-02-17 19:44   ` Julien Grall
2017-02-17 22:55     ` Stefano Stabellini
2017-02-18  0:59     ` Dario Faggioli
2017-02-20 12:18       ` Punit Agrawal
2017-02-18  0:41   ` Stefano Stabellini
2017-02-20 11:04     ` George Dunlap
2017-02-20 11:40       ` Dario Faggioli
