* Xen 4.5 random freeze question
@ 2014-11-14 14:25 Andrii Tseglytskyi
  2014-11-14 14:35 ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-14 14:25 UTC (permalink / raw)
  To: xen-devel, Stefano Stabellini, Ian Campbell, Julien Grall

Hi,

I observe a system freeze on the latest xen/master branch.

My setup is:

- Jacinto 6 evm board (OMAP5)
- Latest Xen 4.5.0-rc2 as hypervisor
- Linux 3.8 as dom0, running on 2 vcpus
- Android 4.3 as domU (running on Linux kernel 3.8, 2 vcpus)
- XSM feature is disabled
- gcc version 4.7.3 20130328 (prerelease) (crosstool-NG
linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) as cross
compiler

The freeze occurs at a random moment during creation of the domU domain.
Even the Xen console may be unavailable after the freeze.
Can someone suggest what it could be? Maybe some weak spots in the new
code? Maybe the new GIC code, which was reworked a lot, or something else?

Thank you in advance for any suggestions.

Regards

-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-14 14:25 Xen 4.5 random freeze question Andrii Tseglytskyi
@ 2014-11-14 14:35 ` Stefano Stabellini
  2014-11-14 14:43   ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-14 14:35 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Stefano Stabellini, Ian Campbell, Julien Grall, xen-devel

On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
> Hi,
> 
> I observe system freeze on latest xen/master branch.
> 
> My setup is:
> 
> - Jacinto 6 evm board (OMAP5)
> - Latest Xen 4.5.0-rc2 as hypervisor
> - Linux 3.8 as dom0, running on 2 vcpus
> - Android 4.3 as domU (running on Linux kernel 3.8, 2 vcpus)
> - XSM feature is disabled
> - gcc version 4.7.3 20130328 (prerelease) (crosstool-NG
> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) as cross
> compiler
> 
> Freeze occurs in random moment of time during creation of domU domain.
> Even Xen console may be not available after freeze.
> Can someone suggest - what it can be? Maybe some weak places in new
> code? Maybe new gic, which was reworked a lot or something else?
> 
> Thank you in advance for any suggestions.

Is this really 3.8 or 3.18? 3.8 is pretty old and doesn't have any of
the fixes needed to safely do DMA involving guest pages to
non-coherent devices. Where are you storing the guest disk images?


* Re: Xen 4.5 random freeze question
  2014-11-14 14:35 ` Stefano Stabellini
@ 2014-11-14 14:43   ` Andrii Tseglytskyi
  2014-11-14 15:22     ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-14 14:43 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

On Fri, Nov 14, 2014 at 4:35 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
>> Hi,
>>
>> I observe system freeze on latest xen/master branch.
>>
>> My setup is:
>>
>> - Jacinto 6 evm board (OMAP5)
>> - Latest Xen 4.5.0-rc2 as hypervisor
>> - Linux 3.8 as dom0, running on 2 vcpus
>> - Android 4.3 as domU (running on Linux kernel 3.8, 2 vcpus)
>> - XSM feature is disabled
>> - gcc version 4.7.3 20130328 (prerelease) (crosstool-NG
>> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) as cross
>> compiler
>>
>> Freeze occurs in random moment of time during creation of domU domain.
>> Even Xen console may be not available after freeze.
>> Can someone suggest - what it can be? Maybe some weak places in new
>> code? Maybe new gic, which was reworked a lot or something else?
>>
>> Thank you in advance for any suggestions.
>
> Is this really 3.8 or 3.18?

We have 3.8 in both dom0 and domU

> 3.8 is pretty old and doesn't have any of
> the fixes to be able to safely do dma involving guest pages to
> non-coherent devices.

This is a good point. We are now migrating dom0 to a 3.12 kernel, but
Android will remain on 3.8. Will that help?
Maybe you can point me to a tree with the proper DMA fixes? Note: if you
are talking about SWIOTLB, we have your latest one, retrieved from
git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git
branch: swiotlb-xen-9.1

> Where are you storing the guest disk images?

On a SATA drive dedicated to dom0; its controller has its own DMA.

Regards,
Andrii





* Re: Xen 4.5 random freeze question
  2014-11-14 14:43   ` Andrii Tseglytskyi
@ 2014-11-14 15:22     ` Stefano Stabellini
  2014-11-14 15:39       ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-14 15:22 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: xen-devel, Ian Campbell, Julien Grall, Stefano Stabellini

On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
> On Fri, Nov 14, 2014 at 4:35 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
> >> Hi,
> >>
> >> I observe system freeze on latest xen/master branch.
> >>
> >> My setup is:
> >>
> >> - Jacinto 6 evm board (OMAP5)
> >> - Latest Xen 4.5.0-rc2 as hypervisor
> >> - Linux 3.8 as dom0, running on 2 vcpus
> >> - Android 4.3 as domU (running on Linux kernel 3.8, 2 vcpus)
> >> - XSM feature is disabled
> >> - gcc version 4.7.3 20130328 (prerelease) (crosstool-NG
> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) as cross
> >> compiler
> >>
> >> Freeze occurs in random moment of time during creation of domU domain.
> >> Even Xen console may be not available after freeze.
> >> Can someone suggest - what it can be? Maybe some weak places in new
> >> code? Maybe new gic, which was reworked a lot or something else?
> >>
> >> Thank you in advance for any suggestions.
> >
> > Is this really 3.8 or 3.18?
> 
> We have 3.8 in both dom0 and domU
> 
> > 3.8 is pretty old and doesn't have any of
> > the fixes to be able to safely do dma involving guest pages to
> > non-coherent devices.
> 
> This is a good point. Now we are migrating to 3.12 kernel in dom0. But
> Android will remain on 3.8. Will it help ?
> Maybe you can point me to any tree with proper DMA fixes? Note: if you
> are talking about SWIOTLB - we have your latest one, retrieved from
> git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git
> branch:swiotlb-xen-9.1

The last and most stable series is:

http://marc.info/?l=linux-kernel&m=141579241729749&w=2

But thinking more about this, I doubt that it is a DMA problem, because
you would most probably see various kinds of error messages, not a
freeze.


> > Where are you storing the guest disk images?
> 
> SATA drive, dedicated to dom0, its controller has its own DMA

Are they stored as files or on LVM volumes?


* Re: Xen 4.5 random freeze question
  2014-11-14 15:22     ` Stefano Stabellini
@ 2014-11-14 15:39       ` Andrii Tseglytskyi
  2014-11-14 15:49         ` Julien Grall
  2014-11-14 16:15         ` Stefano Stabellini
  0 siblings, 2 replies; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-14 15:39 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

On Fri, Nov 14, 2014 at 5:22 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
>> On Fri, Nov 14, 2014 at 4:35 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>> > On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
>> >> Hi,
>> >>
>> >> I observe system freeze on latest xen/master branch.
>> >>
>> >> My setup is:
>> >>
>> >> - Jacinto 6 evm board (OMAP5)
>> >> - Latest Xen 4.5.0-rc2 as hypervisor
>> >> - Linux 3.8 as dom0, running on 2 vcpus
>> >> - Android 4.3 as domU (running on Linux kernel 3.8, 2 vcpus)
>> >> - XSM feature is disabled
>> >> - gcc version 4.7.3 20130328 (prerelease) (crosstool-NG
>> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) as cross
>> >> compiler
>> >>
>> >> Freeze occurs in random moment of time during creation of domU domain.
>> >> Even Xen console may be not available after freeze.
>> >> Can someone suggest - what it can be? Maybe some weak places in new
>> >> code? Maybe new gic, which was reworked a lot or something else?
>> >>
>> >> Thank you in advance for any suggestions.
>> >
>> > Is this really 3.8 or 3.18?
>>
>> We have 3.8 in both dom0 and domU
>>
>> > 3.8 is pretty old and doesn't have any of
>> > the fixes to be able to safely do dma involving guest pages to
>> > non-coherent devices.
>>
>> This is a good point. Now we are migrating to 3.12 kernel in dom0. But
>> Android will remain on 3.8. Will it help ?
>> Maybe you can point me to any tree with proper DMA fixes? Note: if you
>> are talking about SWIOTLB - we have your latest one, retrieved from
>> git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git
>> branch:swiotlb-xen-9.1
>
> The last and most stable series is:
>
> http://marc.info/?l=linux-kernel&m=141579241729749&w=2
>

Thanks, I'll try this series anyway.

> But thinking more about this, I doubt that it is a dma problem, because
> you would most probably see various kind of error messages, not a
> freeze.
>

Agree.

>
>> > Where are you storing the guest disk images?
>>
>> SATA drive, dedicated to dom0, its controller has its own DMA
>
> Are they on file or on lvm volumes?

The images are stored as files.

Also note that the freeze depends on system load. It reproduces more
frequently if I start Android + QNX + all frontend/backend drivers.
Starting only Android, without any additional drivers, is more or less
stable. It looks like the issue reproduces when domU starts in parallel
with the backend drivers in dom0.
The same setup works fine with the older Xen 4.4.

Regards,
Andrii




* Re: Xen 4.5 random freeze question
  2014-11-14 15:39       ` Andrii Tseglytskyi
@ 2014-11-14 15:49         ` Julien Grall
  2014-11-14 15:58           ` Andrii Tseglytskyi
  2014-11-14 16:15         ` Stefano Stabellini
  1 sibling, 1 reply; 66+ messages in thread
From: Julien Grall @ 2014-11-14 15:49 UTC (permalink / raw)
  To: Andrii Tseglytskyi, Stefano Stabellini; +Cc: Ian Campbell, xen-devel

Hi Andrii,

On 11/14/2014 03:39 PM, Andrii Tseglytskyi wrote:
> Also note - freeze depends on system load. It reproduces more
> frequently if I start Android + QNX + all frontends/backends drivers.
> Starting Android only without any addition drivers works more less
> stable. It looks like issue is reproduced when domU starts in parallel
> with backends drivers in dom0.
> But the same works fine with old Xen 4.4.

To be sure, when you say "xen/master" is it a vanilla Xen? Or do you
have patches on top of it?

Also, what are the frontends/backends?

Regards,

-- 
Julien Grall


* Re: Xen 4.5 random freeze question
  2014-11-14 15:49         ` Julien Grall
@ 2014-11-14 15:58           ` Andrii Tseglytskyi
  0 siblings, 0 replies; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-14 15:58 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, Ian Campbell, Stefano Stabellini

Hi Julien,

On Fri, Nov 14, 2014 at 5:49 PM, Julien Grall <julien.grall@linaro.org> wrote:
> Hi Andrii,
>
> On 11/14/2014 03:39 PM, Andrii Tseglytskyi wrote:
>> Also note - freeze depends on system load. It reproduces more
>> frequently if I start Android + QNX + all frontends/backends drivers.
>> Starting Android only without any addition drivers works more less
>> stable. It looks like issue is reproduced when domU starts in parallel
>> with backends drivers in dom0.
>> But the same works fine with old Xen 4.4.
>
> To be sure, when you say "xen/master" is it a vanilla Xen? Or do you
> have patches on top of it?
>

This is my work tree; I have some local patches specific to our system:

579f19c (HEAD, dev_xen_4.5_rc2_04) xsm: arm: allow dom0 to use send
call on during event_channel creation <-- XSM is currently disabled,
this one has no effect
0f1bd43 kbdif: add raw events passing
b7289a0 pvfb: add release event
bd979de xen/tools: Fix virtual disks helper scripts.
81d2f11 libxl: skip memory finalize if appended DTB found <-- we have a
device tree attached to the domU zImage, so we skip initialization of
the domU DTB
6c7f2ae libxc: skip constructing DTB during zImage loading <-- we have a
device tree attached to the domU zImage, so we skip initialization of
the domU DTB
2b4ba6c libxl: add ability to skip constructing DTB <-- we have a device
tree attached to the domU zImage, so we skip initialization of the domU DTB
bd226aa Revert "tools: arm: remove code to check for a DTB appended to
the kernel" <-- we have a device tree attached to the domU zImage, so we
skip initialization of the domU DTB
e445c33 arm: decrease size of RAM memory for arm guest  <-- We have
memory mapped registers starting from 0x40000000, so I moved rambase
to 0x80000000
3a00dd2 flask/policy: allow domU to use previously-mapped I/O-memory
<-- XSM is currently disabled, this one has no effect
0fd131ac2 fix commit xen/arm: Add support for GICv3 for domU
cacfcc5 (tag: 4.5.0-rc2) Xen 4.5.0-rc2: Update tag for QEMU upstream tree....
e6fa63d (xen_baseline/master) pvgrub: ignore NUL
fda1614 xen/arm: Add support for GICv3 for domU


> Also, what are the frontends/backends?

We have some userspace backend drivers: audio, framebuffer, event device, etc.

I may send a tarball with all my local patches if needed.

Regards,
Andrii

>
> Regards,
>
> --
> Julien Grall





* Re: Xen 4.5 random freeze question
  2014-11-14 15:39       ` Andrii Tseglytskyi
  2014-11-14 15:49         ` Julien Grall
@ 2014-11-14 16:15         ` Stefano Stabellini
  2014-11-14 16:22           ` Andrii Tseglytskyi
  1 sibling, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-14 16:15 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: xen-devel, Ian Campbell, Julien Grall, Stefano Stabellini

On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
> On Fri, Nov 14, 2014 at 5:22 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
> >> On Fri, Nov 14, 2014 at 4:35 PM, Stefano Stabellini
> >> <stefano.stabellini@eu.citrix.com> wrote:
> >> > On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> Hi,
> >> >>
> >> >> I observe system freeze on latest xen/master branch.
> >> >>
> >> >> My setup is:
> >> >>
> >> >> - Jacinto 6 evm board (OMAP5)
> >> >> - Latest Xen 4.5.0-rc2 as hypervisor
> >> >> - Linux 3.8 as dom0, running on 2 vcpus
> >> >> - Android 4.3 as domU (running on Linux kernel 3.8, 2 vcpus)
> >> >> - XSM feature is disabled
> >> >> - gcc version 4.7.3 20130328 (prerelease) (crosstool-NG
> >> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) as cross
> >> >> compiler
> >> >>
> >> >> Freeze occurs in random moment of time during creation of domU domain.
> >> >> Even Xen console may be not available after freeze.
> >> >> Can someone suggest - what it can be? Maybe some weak places in new
> >> >> code? Maybe new gic, which was reworked a lot or something else?
> >> >>
> >> >> Thank you in advance for any suggestions.
> >> >
> >> > Is this really 3.8 or 3.18?
> >>
> >> We have 3.8 in both dom0 and domU
> >>
> >> > 3.8 is pretty old and doesn't have any of
> >> > the fixes to be able to safely do dma involving guest pages to
> >> > non-coherent devices.
> >>
> >> This is a good point. Now we are migrating to 3.12 kernel in dom0. But
> >> Android will remain on 3.8. Will it help ?
> >> Maybe you can point me to any tree with proper DMA fixes? Note: if you
> >> are talking about SWIOTLB - we have your latest one, retrieved from
> >> git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git
> >> branch:swiotlb-xen-9.1
> >
> > The last and most stable series is:
> >
> > http://marc.info/?l=linux-kernel&m=141579241729749&w=2
> >
> 
> Thanks  - I'll try this series anyway.
> 
> > But thinking more about this, I doubt that it is a dma problem, because
> > you would most probably see various kind of error messages, not a
> > freeze.
> >
> 
> Agree.
> 
> >
> >> > Where are you storing the guest disk images?
> >>
> >> SATA drive, dedicated to dom0, its controller has its own DMA
> >
> > Are they on file or on lvm volumes?
> 
> Images are on file.
> 
> Also note - freeze depends on system load. It reproduces more
> frequently if I start Android + QNX + all frontends/backends drivers.
> Starting Android only without any addition drivers works more less
> stable. It looks like issue is reproduced when domU starts in parallel
> with backends drivers in dom0.
> But the same works fine with old Xen 4.4.

In my experience, freezes like the one you are describing are due to
interrupt-related bugs or deadlocks. Both of them are hard to track
down. If you can reproduce it reliably, maybe you could bisect it.
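Bisecting is a binary search over the commits between a known-good point (here Xen 4.4) and a known-bad one (here 4.5.0-rc2), so it needs roughly log2(N) test runs instead of N. A minimal sketch of that search in C, assuming a hypothetical is_bad() callback that stands in for "build Xen at commit i, boot, create the domU several times". In practice `git bisect` drives this search over the real history; the sketch only shows the logic. Because this freeze is random, one clean run is weak evidence that a commit is good, and a bad commit wrongly marked good sends the search in the wrong direction, so each test should be repeated several times.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test oracle: returns true if the build at commit index i
 * reproduces the freeze. In reality this would build, boot and start the
 * domU several times, since a random freeze may not show up on one run. */
typedef bool (*is_bad_fn)(size_t i, void *ctx);

/* Binary search for the first bad commit in an ordered list of n commits.
 * Assumes index 0 is known good, index n - 1 is known bad, and every commit
 * after the first bad one is also bad. Calls is_bad() about log2(n) times. */
static size_t first_bad(size_t n, is_bad_fn is_bad, void *ctx)
{
    assert(n >= 2);
    size_t good = 0, bad = n - 1;  /* invariant: good is good, bad is bad */

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;  /* good + 1 == bad, so bad is the first bad commit */
}

/* Toy oracle for trying the search out: commits at or after *ctx are bad. */
static bool toy_is_bad(size_t i, void *ctx)
{
    return i >= *(const size_t *)ctx;
}
```

For example, first_bad(100, toy_is_bad, &(size_t){37}) narrows 100 candidates down to index 37 in about seven oracle calls.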


* Re: Xen 4.5 random freeze question
  2014-11-14 16:15         ` Stefano Stabellini
@ 2014-11-14 16:22           ` Andrii Tseglytskyi
  2014-11-14 16:35             ` Julien Grall
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-14 16:22 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

On Fri, Nov 14, 2014 at 6:15 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
>> On Fri, Nov 14, 2014 at 5:22 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>> > On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
>> >> On Fri, Nov 14, 2014 at 4:35 PM, Stefano Stabellini
>> >> <stefano.stabellini@eu.citrix.com> wrote:
>> >> > On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> Hi,
>> >> >>
>> >> >> I observe system freeze on latest xen/master branch.
>> >> >>
>> >> >> My setup is:
>> >> >>
>> >> >> - Jacinto 6 evm board (OMAP5)
>> >> >> - Latest Xen 4.5.0-rc2 as hypervisor
>> >> >> - Linux 3.8 as dom0, running on 2 vcpus
>> >> >> - Android 4.3 as domU (running on Linux kernel 3.8, 2 vcpus)
>> >> >> - XSM feature is disabled
>> >> >> - gcc version 4.7.3 20130328 (prerelease) (crosstool-NG
>> >> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) as cross
>> >> >> compiler
>> >> >>
>> >> >> Freeze occurs in random moment of time during creation of domU domain.
>> >> >> Even Xen console may be not available after freeze.
>> >> >> Can someone suggest - what it can be? Maybe some weak places in new
>> >> >> code? Maybe new gic, which was reworked a lot or something else?
>> >> >>
>> >> >> Thank you in advance for any suggestions.
>> >> >
>> >> > Is this really 3.8 or 3.18?
>> >>
>> >> We have 3.8 in both dom0 and domU
>> >>
>> >> > 3.8 is pretty old and doesn't have any of
>> >> > the fixes to be able to safely do dma involving guest pages to
>> >> > non-coherent devices.
>> >>
>> >> This is a good point. Now we are migrating to 3.12 kernel in dom0. But
>> >> Android will remain on 3.8. Will it help ?
>> >> Maybe you can point me to any tree with proper DMA fixes? Note: if you
>> >> are talking about SWIOTLB - we have your latest one, retrieved from
>> >> git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git
>> >> branch:swiotlb-xen-9.1
>> >
>> > The last and most stable series is:
>> >
>> > http://marc.info/?l=linux-kernel&m=141579241729749&w=2
>> >
>>
>> Thanks  - I'll try this series anyway.
>>
>> > But thinking more about this, I doubt that it is a dma problem, because
>> > you would most probably see various kind of error messages, not a
>> > freeze.
>> >
>>
>> Agree.
>>
>> >
>> >> > Where are you storing the guest disk images?
>> >>
>> >> SATA drive, dedicated to dom0, its controller has its own DMA
>> >
>> > Are they on file or on lvm volumes?
>>
>> Images are on file.
>>
>> Also note - freeze depends on system load. It reproduces more
>> frequently if I start Android + QNX + all frontends/backends drivers.
>> Starting Android only without any addition drivers works more less
>> stable. It looks like issue is reproduced when domU starts in parallel
>> with backends drivers in dom0.
>> But the same works fine with old Xen 4.4.
>
> In my experience freezes like the one you are describing are due to
> interrupt related bugs or deadlocks. Both of them are hard to track
> down. If you can reproduce it reliably maybe you could bisect it.

Agreed. I suspect that the new GIC series is involved. In the few cases
when the Xen console is still available after the freeze, I see that dom0
is stuck around the kernel lock_release() or handle_IPI() functions.




* Re: Xen 4.5 random freeze question
  2014-11-14 16:22           ` Andrii Tseglytskyi
@ 2014-11-14 16:35             ` Julien Grall
  2014-11-14 16:40               ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Julien Grall @ 2014-11-14 16:35 UTC (permalink / raw)
  To: Andrii Tseglytskyi, Stefano Stabellini; +Cc: Ian Campbell, xen-devel

On 11/14/2014 04:22 PM, Andrii Tseglytskyi wrote:
> On Fri, Nov 14, 2014 at 6:15 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
>> On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
>>> On Fri, Nov 14, 2014 at 5:22 PM, Stefano Stabellini
>>> <stefano.stabellini@eu.citrix.com> wrote:
>>>> On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
>>>>> On Fri, Nov 14, 2014 at 4:35 PM, Stefano Stabellini
>>>>> <stefano.stabellini@eu.citrix.com> wrote:
>>>>>> On Fri, 14 Nov 2014, Andrii Tseglytskyi wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I observe system freeze on latest xen/master branch.
>>>>>>>
>>>>>>> My setup is:
>>>>>>>
>>>>>>> - Jacinto 6 evm board (OMAP5)
>>>>>>> - Latest Xen 4.5.0-rc2 as hypervisor
>>>>>>> - Linux 3.8 as dom0, running on 2 vcpus
>>>>>>> - Android 4.3 as domU (running on Linux kernel 3.8, 2 vcpus)
>>>>>>> - XSM feature is disabled
>>>>>>> - gcc version 4.7.3 20130328 (prerelease) (crosstool-NG
>>>>>>> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) as cross
>>>>>>> compiler
>>>>>>>
>>>>>>> Freeze occurs in random moment of time during creation of domU domain.
>>>>>>> Even Xen console may be not available after freeze.
>>>>>>> Can someone suggest - what it can be? Maybe some weak places in new
>>>>>>> code? Maybe new gic, which was reworked a lot or something else?
>>>>>>>
>>>>>>> Thank you in advance for any suggestions.
>>>>>>
>>>>>> Is this really 3.8 or 3.18?
>>>>>
>>>>> We have 3.8 in both dom0 and domU
>>>>>
>>>>>> 3.8 is pretty old and doesn't have any of
>>>>>> the fixes to be able to safely do dma involving guest pages to
>>>>>> non-coherent devices.
>>>>>
>>>>> This is a good point. Now we are migrating to 3.12 kernel in dom0. But
>>>>> Android will remain on 3.8. Will it help ?
>>>>> Maybe you can point me to any tree with proper DMA fixes? Note: if you
>>>>> are talking about SWIOTLB - we have your latest one, retrieved from
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/sstabellini/xen.git
>>>>> branch:swiotlb-xen-9.1
>>>>
>>>> The last and most stable series is:
>>>>
>>>> http://marc.info/?l=linux-kernel&m=141579241729749&w=2
>>>>
>>>
>>> Thanks  - I'll try this series anyway.
>>>
>>>> But thinking more about this, I doubt that it is a dma problem, because
>>>> you would most probably see various kind of error messages, not a
>>>> freeze.
>>>>
>>>
>>> Agree.
>>>
>>>>
>>>>>> Where are you storing the guest disk images?
>>>>>
>>>>> SATA drive, dedicated to dom0, its controller has its own DMA
>>>>
>>>> Are they on file or on lvm volumes?
>>>
>>> Images are on file.
>>>
>>> Also note - freeze depends on system load. It reproduces more
>>> frequently if I start Android + QNX + all frontends/backends drivers.
>>> Starting Android only without any addition drivers works more less
>>> stable. It looks like issue is reproduced when domU starts in parallel
>>> with backends drivers in dom0.
>>> But the same works fine with old Xen 4.4.
>>
>> In my experience freezes like the one you are describing are due to
>> interrupt related bugs or deadlocks. Both of them are hard to track
>> down. If you can reproduce it reliably maybe you could bisect it.
> 
> Agree. I suspect that new gic series impacts on this. In very few
> moments when xen console is available after freeze I see that dom0
> code stacks around kernel lock_release() or  handle_IPI() functions

I would be surprised if the new GIC series impacted this code, as the
new driver is only compiled for arm64 (GICv3 doesn't exist on arm32).
There was some refactoring, though.

The interrupt management has also been reworked for Xen 4.5 to avoid
maintenance interrupts. I would take a look at this part.

Regards,

-- 
Julien Grall


* Re: Xen 4.5 random freeze question
  2014-11-14 16:35             ` Julien Grall
@ 2014-11-14 16:40               ` Andrii Tseglytskyi
  2014-11-17 15:47                 ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-14 16:40 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, Ian Campbell, Stefano Stabellini

Hi Julien,

> I would be surprised that the next GIC series impact this code as the
> next driver is only compiled for arm64 (GICv3 doesn't exist on arm32).
> Though, there was some refactoring.

I meant that the code was split into generic GIC and GICv2 parts as part
of the refactoring. Also, from the mails I saw that it was initially
tested without SMP.
GICv3 certainly has no impact.

>
> The interrupt management has also been reworked for Xen 4.5 to avoid
> maintenance interrupt. I would give a look on this part.

Thanks, this may help.

Regards,
Andrii


>
> Regards,
>
> --
> Julien Grall





* Re: Xen 4.5 random freeze question
  2014-11-14 16:40               ` Andrii Tseglytskyi
@ 2014-11-17 15:47                 ` Andrii Tseglytskyi
  2014-11-17 16:39                   ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-17 15:47 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, Ian Campbell, Stefano Stabellini

Hi,

The issue starts to occur after the following commit:

commit 5495a512b63bad868c147198f7f049c2617d468c
Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date:   Tue Jun 10 15:07:12 2014 +0100

    xen/arm: support HW interrupts, do not request maintenance_interrupts

    If the irq to be injected is an hardware irq (p->desc != NULL), set
    GICH_LR_HW. Do not set GICH_LR_MAINTENANCE_IRQ.
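To make the change concrete: a GICv2 list register (GICH_LRn) holds the virtual IRQ number, priority and state, plus either a physical IRQ number together with the HW bit (bit 31), or, when HW is 0, the EOI bit (bit 19) that requests a maintenance interrupt when the guest EOIs. A rough sketch of the two encodings, following the bit layout in the GICv2 architecture specification; the macro and function names are hypothetical and not Xen's, and the priority is assumed to be an 8-bit value of which only the top 5 bits are stored.

```c
#include <stdbool.h>
#include <stdint.h>

/* GICv2 GICH_LRn fields (names are hypothetical, layout per the GICv2 spec). */
#define LR_VIRTUAL_MASK    0x3ffu       /* bits [9:0]:   VirtualID                     */
#define LR_PHYSICAL_SHIFT  10           /* bits [19:10]: PhysicalID, only when HW = 1  */
#define LR_PHYSICAL_MASK   0x3ffu
#define LR_EOI_MAINT       (1u << 19)   /* bit 19: maintenance IRQ on EOI, HW = 0 only */
#define LR_PRIORITY_SHIFT  23           /* bits [27:23]: top 5 bits of the priority    */
#define LR_STATE_PENDING   (1u << 28)   /* bits [29:28]: state, 0b01 = pending         */
#define LR_HW              (1u << 31)   /* bit 31: virtual IRQ backed by a physical IRQ */

/* Build a list register value for a pending virtual IRQ.
 * hw_mode = true:  HW bit + physical ID, roughly what the commit does for
 *                  hardware IRQs; the guest's EOI deactivates the physical
 *                  IRQ in the GIC and no maintenance interrupt is raised.
 * hw_mode = false: maintenance-interrupt style; the hypervisor is notified
 *                  on EOI and has to deactivate the physical IRQ itself. */
static uint32_t make_lr(uint32_t virq, uint32_t pirq, uint8_t priority, bool hw_mode)
{
    uint32_t lr = (virq & LR_VIRTUAL_MASK)
                | ((uint32_t)(priority >> 3) << LR_PRIORITY_SHIFT)
                | LR_STATE_PENDING;

    if (hw_mode)
        lr |= LR_HW | ((pirq & LR_PHYSICAL_MASK) << LR_PHYSICAL_SHIFT);
    else
        lr |= LR_EOI_MAINT;
    return lr;
}
```

For example, make_lr(27, 27, 0xa0, true) yields 0x9A006C1B and the hw_mode = false variant yields 0x1A08001B. With HW set, the hypervisor no longer sees the completion, so if the hardware deactivation silently fails on a platform, the physical IRQ could stay active and never be delivered again.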


I'm going to debug it in depth.
Stefano, maybe you have an idea of what it could be?

Regards,
Andrii


On Fri, Nov 14, 2014 at 6:40 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@globallogic.com> wrote:
> Hi Julien,
>
>> I would be surprised that the next GIC series impact this code as the
>> next driver is only compiled for arm64 (GICv3 doesn't exist on arm32).
>> Though, there was some refactoring.
>
> I meant that code was divided for generic GIC and GICv2 together with
> refactoring. Also in mails I saw that it was initially tested without
> SMP.
> GICv3 has no impacts for sure.
>
>>
>> The interrupt management has also been reworked for Xen 4.5 to avoid
>> maintenance interrupt. I would give a look on this part.
>
> Thanks, this may help.
>
> Regards,
> Andrii
>
>
>>
>> Regards,
>>
>> --
>> Julien Grall
>
>
>
> --
>
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com





* Re: Xen 4.5 random freeze question
  2014-11-17 15:47                 ` Andrii Tseglytskyi
@ 2014-11-17 16:39                   ` Stefano Stabellini
  2014-11-17 17:05                     ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-17 16:39 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

Although it is possible that that patch is the cause of your problem,
unfortunately it is part of a significant rework of the GIC driver in
Xen and I am afraid that testing with only a portion of that patch
series might introduce other subtle bugs.  For your reference the series
starts at commit 6f91502be64a05d0635454d629118b96ae38b50f and ends at
commit 72eaf29e8d70784aaf066ead79df1295a25ecfd0.

If 5495a512b63bad868c147198f7f049c2617d468c is really the cause of your
problem, one idea that comes to mind is that GICH_LR_HW
might not work correctly on your platform. It wouldn't be the first time
that we see hardware behaving that way, especially if you are using the
GIC secure registers instead of the non-secure registers, as GICH_LRn.HW
can only deactivate non-secure interrupts. This is usually due to a
configuration error in u-boot.

Could you please try to set PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI for your
platform?
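One way to check this theory is to look at which group the relevant interrupts are configured in. On a GICv2 with the Security Extensions, bit (irq % 32) of GICD_IGROUPR(irq / 32) is 0 for Group 0 (secure) and 1 for Group 1 (non-secure). As far as we know these registers are RAZ/WI for non-secure accesses, so the register snapshot would have to be taken on the secure side, for example by firmware before it switches to non-secure mode. A small decoding sketch in C over such a snapshot; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if `irq` is in Group 1 (non-secure) according to a snapshot of
 * GICD_IGROUPR0..GICD_IGROUPR(nregs - 1). Register n covers IRQs
 * 32n..32n+31, one bit per IRQ: 0 = Group 0 (secure), 1 = Group 1. */
static bool irq_is_group1(const uint32_t *igroupr, size_t nregs, unsigned int irq)
{
    size_t reg = irq / 32;

    if (reg >= nregs)
        return false;  /* not covered by the snapshot: not known to be non-secure */
    return (igroupr[reg] >> (irq % 32)) & 1u;
}

/* Count the IRQs in [first, last] that are not known to be Group 1, i.e.
 * the ones a GICH_LRn.HW deactivation may fail to handle. */
static unsigned int count_group0(const uint32_t *igroupr, size_t nregs,
                                 unsigned int first, unsigned int last)
{
    unsigned int n = 0;

    for (unsigned int irq = first; irq <= last; irq++)
        if (!irq_is_group1(igroupr, nregs, irq))
            n++;
    return n;
}
```

With a snapshot of { 0xffffffff, 0x0000ffff }, IRQs 0 to 47 are non-secure, while IRQs 48 to 63 are still secure, so count_group0(snapshot, 2, 32, 63) returns 16.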



On Mon, 17 Nov 2014, Andrii Tseglytskyi wrote:
> Hi,
> 
> Issue occurs after the following commit:
> 
> commit 5495a512b63bad868c147198f7f049c2617d468c
> Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Date:   Tue Jun 10 15:07:12 2014 +0100
> 
>     xen/arm: support HW interrupts, do not request maintenance_interrupts
> 
>     If the irq to be injected is an hardware irq (p->desc != NULL), set
>     GICH_LR_HW. Do not set GICH_LR_MAINTENANCE_IRQ.
> 
> 
> I'm going to debug it deeply.
> Stefano - may be you have a feeling what it can be ?
> 
> Regards,
> Andrii
> 
> 
> On Fri, Nov 14, 2014 at 6:40 PM, Andrii Tseglytskyi
> <andrii.tseglytskyi@globallogic.com> wrote:
> > Hi Julien,
> >
> >> I would be surprised that the next GIC series impact this code as the
> >> next driver is only compiled for arm64 (GICv3 doesn't exist on arm32).
> >> Though, there was some refactoring.
> >
> > I meant that code was divided for generic GIC and GICv2 together with
> > refactoring. Also in mails I saw that it was initially tested without
> > SMP.
> > GICv3 has no impacts for sure.
> >
> >>
> >> The interrupt management has also been reworked for Xen 4.5 to avoid
> >> maintenance interrupt. I would give a look on this part.
> >
> > Thanks, this may help.
> >
> > Regards,
> > Andrii
> >
> >
> >>
> >> Regards,
> >>
> >> --
> >> Julien Grall
> >
> >
> >
> > --
> >
> > Andrii Tseglytskyi | Embedded Dev
> > GlobalLogic
> > www.globallogic.com
> 
> 
> 
> -- 
> 
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
> 


* Re: Xen 4.5 random freeze question
  2014-11-17 16:39                   ` Stefano Stabellini
@ 2014-11-17 17:05                     ` Andrii Tseglytskyi
  2014-11-17 18:02                       ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-17 17:05 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

Hi Stefano,

Thank you for your answer.

On Mon, Nov 17, 2014 at 6:39 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> Although it is possible that that patch is the cause of your problem,
> unfortunately it is part of a significant rework of the GIC driver in
> Xen and I am afraid that testing with only a portion of that patch
> series might introduce other subtle bugs.  For your reference the series
> starts at commit 6f91502be64a05d0635454d629118b96ae38b50f and ends at
> commit 72eaf29e8d70784aaf066ead79df1295a25ecfd0.
>

Yes, I tested with and without the whole series.

> If 5495a512b63bad868c147198f7f049c2617d468c is really the cause of your
> problem, one idea that comes to mind is that GICH_LR_MAINTENANCE_IRQ
> might not work correctly on your platform. It wouldn't be the first time
> that we see hardware behaving that way, especially if you are using the
> GIC secure registers instead of the non-secure register as GICH_LRn.HW
> can only deactivate non-secure interrupts. This is usually due to a
> configuration error in u-boot.
>
> Could you please try to set PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI for your
> platform?
>

I tried this. Unfortunately it doesn't help.

Regards,
Andrii

>
>
> On Mon, 17 Nov 2014, Andrii Tseglytskyi wrote:
>> Hi,
>>
>> Issue occurs after the following commit:
>>
>> commit 5495a512b63bad868c147198f7f049c2617d468c
>> Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>> Date:   Tue Jun 10 15:07:12 2014 +0100
>>
>>     xen/arm: support HW interrupts, do not request maintenance_interrupts
>>
>>     If the irq to be injected is an hardware irq (p->desc != NULL), set
>>     GICH_LR_HW. Do not set GICH_LR_MAINTENANCE_IRQ.
>>
>>
>> I'm going to debug it in depth.
>> Stefano - maybe you have a feeling about what it could be?
>>
>> Regards,
>> Andrii
>>
>>
>> On Fri, Nov 14, 2014 at 6:40 PM, Andrii Tseglytskyi
>> <andrii.tseglytskyi@globallogic.com> wrote:
>> > Hi Julien,
>> >
>> >> I would be surprised that the next GIC series impact this code as the
>> >> next driver is only compiled for arm64 (GICv3 doesn't exist on arm32).
>> >> Though, there was some refactoring.
>> >
>> > I meant that the code was split into generic GIC and GICv2 parts, along
>> > with some refactoring. Also, in the mails I saw that it was initially
>> > tested without SMP.
>> > GICv3 has no impact for sure.
>> >
>> >>
>> >> The interrupt management has also been reworked for Xen 4.5 to avoid
>> >> maintenance interrupt. I would give a look on this part.
>> >
>> > Thanks, this may help.
>> >
>> > Regards,
>> > Andrii
>> >
>> >
>> >>
>> >> Regards,
>> >>
>> >> --
>> >> Julien Grall



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


* Re: Xen 4.5 random freeze question
  2014-11-17 17:05                     ` Andrii Tseglytskyi
@ 2014-11-17 18:02                       ` Stefano Stabellini
  2014-11-18 10:41                         ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-17 18:02 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

On Mon, 17 Nov 2014, Andrii Tseglytskyi wrote:
> Hi Stefano,
> 
> Thank you for your answer.
> 
> On Mon, Nov 17, 2014 at 6:39 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > Although it is possible that that patch is the cause of your problem,
> > unfortunately it is part of a significant rework of the GIC driver in
> > Xen and I am afraid that testing with only a portion of that patch
> > series might introduce other subtle bugs.  For your reference the series
> > starts at commit 6f91502be64a05d0635454d629118b96ae38b50f and ends at
> > commit 72eaf29e8d70784aaf066ead79df1295a25ecfd0.
> >
> 
> Yes, I tested with and without the whole series.

And the result is that the series causes the problem?


> > If 5495a512b63bad868c147198f7f049c2617d468c is really the cause of your
> > problem, one idea that comes to mind is that GICH_LR_MAINTENANCE_IRQ
> > might not work correctly on your platform. It wouldn't be the first time
> > that we see hardware behaving that way, especially if you are using the
> > GIC secure registers instead of the non-secure register as GICH_LRn.HW
> > can only deactivate non-secure interrupts. This is usually due to a
> > configuration error in u-boot.
> >
> > Could you please try to set PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI for your
> > platform?
> >
> 
> I tried this. Unfortunately it doesn't help.

Could you try the following patch on top of
5495a512b63bad868c147198f7f049c2617d468c ?

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 302c031..a286376 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -557,10 +557,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
     BUG_ON(lr < 0);
     BUG_ON(state & ~(GICH_LR_STATE_MASK<<GICH_LR_STATE_SHIFT));
 
-    lr_val = state | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
+    lr_val = state | GICH_LR_MAINTENANCE_IRQ | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
         ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
-    if ( p->desc != NULL )
-        lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
 
     GICH[GICH_LR + lr] = lr_val;
 
@@ -622,6 +620,12 @@ out:
     return;
 }
 
+static void gic_irq_eoi(void *info)
+{
+    int virq = (uintptr_t) info;
+    GICC[GICC_DIR] = virq;
+}
+
 static void gic_update_one_lr(struct vcpu *v, int i)
 {
     struct pending_irq *p;
@@ -639,7 +643,10 @@ static void gic_update_one_lr(struct vcpu *v, int i)
         irq = (lr >> GICH_LR_VIRTUAL_SHIFT) & GICH_LR_VIRTUAL_MASK;
         p = irq_to_pending(v, irq);
         if ( p->desc != NULL )
+        {
+            gic_irq_eoi((void*)(uintptr_t)irq);
             p->desc->status &= ~IRQ_INPROGRESS;
+        }
         clear_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
         if ( test_bit(GIC_IRQ_GUEST_PENDING, &p->status) &&
                 test_bit(GIC_IRQ_GUEST_ENABLED, &p->status))

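A simplified model may help in reading the patch above. With GICH_LRn.HW set, the GIC deactivates the physical irq itself when the guest deactivates the virtual one; with HW clear and the EOI (maintenance) bit set, the hypervisor gets a maintenance interrupt and has to deactivate the physical irq on its own, which is what the GICC_DIR write in gic_irq_eoi() does. The sketch below is not Xen code: struct routed_irq, guest_eoi() and hyp_maintenance() are hypothetical, register writes are reduced to boolean flags, and the guest is assumed to run with EOImode 0, so a GICV_EOIR write both drops priority and deactivates.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of one physical irq forwarded to a guest
 * through a GICv2 list register. Register accesses are reduced to flags.
 */
struct routed_irq {
    bool lr_hw;         /* GICH_LRn.HW: deactivation is forwarded to the physical irq */
    bool lr_eoi_maint;  /* GICH_LRn.EOI (bit 19, HW == 0 only): maintenance irq on deactivation */
    bool lr_valid;      /* the LR still holds the virtual irq */
    bool phys_active;   /* physical irq still active, i.e. not yet deactivated */
    bool maint_pending; /* maintenance interrupt asserted towards the hypervisor */
};

/* Guest EOI with EOImode 0: priority drop plus deactivation, the LR becomes invalid. */
static void guest_eoi(struct routed_irq *r)
{
    assert(!(r->lr_hw && r->lr_eoi_maint)); /* the two encodings share bit 19 */
    r->lr_valid = false;
    if ( r->lr_hw )
        r->phys_active = false;   /* the GIC deactivates the physical irq itself */
    else if ( r->lr_eoi_maint )
        r->maint_pending = true;  /* the hypervisor has to finish the job */
}

/* Maintenance handler: stands in for the GICC_DIR write done by gic_irq_eoi(). */
static void hyp_maintenance(struct routed_irq *r)
{
    if ( r->maint_pending && !r->lr_valid )
    {
        r->phys_active = false;   /* models GICC[GICC_DIR] = irq */
        r->maint_pending = false;
    }
}
```

A hardware irq routed with neither HW nor the maintenance bit set (both flags false) shows why one of the two mechanisms has to be in place: in this model nothing ever deactivates the physical irq.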

* Re: Xen 4.5 random freeze question
  2014-11-17 18:02                       ` Stefano Stabellini
@ 2014-11-18 10:41                         ` Andrii Tseglytskyi
  2014-11-18 11:31                           ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-18 10:41 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

Hi Stefano,

On Mon, Nov 17, 2014 at 8:02 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Mon, 17 Nov 2014, Andrii Tseglytskyi wrote:
>> Hi Stefano,
>>
>> Thank you for your answer.
>>
>> On Mon, Nov 17, 2014 at 6:39 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>> > Although it is possible that that patch is the cause of your problem,
>> > unfortunately it is part of a significant rework of the GIC driver in
>> > Xen and I am afraid that testing with only a portion of that patch
>> > series might introduce other subtle bugs.  For your reference the series
>> > starts at commit 6f91502be64a05d0635454d629118b96ae38b50f and ends at
>> > commit 72eaf29e8d70784aaf066ead79df1295a25ecfd0.
>> >
>>
>> Yes, I tested with and without the whole series.
>
> And the result is that the series causes the problem?
>

Yes.

>
>> > If 5495a512b63bad868c147198f7f049c2617d468c is really the cause of your
>> > problem, one idea that comes to mind is that GICH_LR_MAINTENANCE_IRQ
>> > might not work correctly on your platform. It wouldn't be the first time
>> > that we see hardware behaving that way, especially if you are using the
>> > GIC secure registers instead of the non-secure register as GICH_LRn.HW
>> > can only deactivate non-secure interrupts. This is usually due to a
>> > configuration error in u-boot.
>> >
>> > Could you please try to set PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI for your
>> > platform?
>> >
>>
>> I tried this. Unfortunately it doesn't help.
>
> Could you try the following patch on top of
> 5495a512b63bad868c147198f7f049c2617d468c ?
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 302c031..a286376 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -557,10 +557,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
>      BUG_ON(lr < 0);
>      BUG_ON(state & ~(GICH_LR_STATE_MASK<<GICH_LR_STATE_SHIFT));
>
> -    lr_val = state | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
> +    lr_val = state | GICH_LR_MAINTENANCE_IRQ | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
>          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
> -    if ( p->desc != NULL )
> -        lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
>
>      GICH[GICH_LR + lr] = lr_val;
>
> @@ -622,6 +620,12 @@ out:
>      return;
>  }
>
> +static void gic_irq_eoi(void *info)
> +{
> +    int virq = (uintptr_t) info;
> +    GICC[GICC_DIR] = virq;
> +}
> +
>  static void gic_update_one_lr(struct vcpu *v, int i)
>  {
>      struct pending_irq *p;
> @@ -639,7 +643,10 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>          irq = (lr >> GICH_LR_VIRTUAL_SHIFT) & GICH_LR_VIRTUAL_MASK;
>          p = irq_to_pending(v, irq);
>          if ( p->desc != NULL )
> +        {
> +            gic_irq_eoi((void*)(uintptr_t)irq);
>              p->desc->status &= ~IRQ_INPROGRESS;
> +        }
>          clear_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
>          if ( test_bit(GIC_IRQ_GUEST_PENDING, &p->status) &&
>                  test_bit(GIC_IRQ_GUEST_ENABLED, &p->status))


It helps! Thanks a lot!
I did about 30 reboots and got no hangs. The only thing still needed is
to rebase these changes onto the xen/master branch, because the patch
applies only on top of 5495a512b63bad868c147198f7f049c2617d468c.
Will you make this change? Is it acceptable for the baseline?

Regards,
Andrii

-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


* Re: Xen 4.5 random freeze question
  2014-11-18 10:41                         ` Andrii Tseglytskyi
@ 2014-11-18 11:31                           ` Andrii Tseglytskyi
  2014-11-18 12:35                             ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-18 11:31 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

Strange: it looks like the baseline code already does the same as the
patch you sent me yesterday; the only thing needed is to set
PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI.
But the baseline still has the issue, while the same changes on top of
5495a512b63bad868c147198f7f049c2617d468c work fine.

Regards,
Andrii

On Tue, Nov 18, 2014 at 12:41 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@globallogic.com> wrote:
> Hi Stefano,
>
> On Mon, Nov 17, 2014 at 8:02 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
>> On Mon, 17 Nov 2014, Andrii Tseglytskyi wrote:
>>> Hi Stefano,
>>>
>>> Thank you for your answer.
>>>
>>> On Mon, Nov 17, 2014 at 6:39 PM, Stefano Stabellini
>>> <stefano.stabellini@eu.citrix.com> wrote:
>>> > Although it is possible that that patch is the cause of your problem,
>>> > unfortunately it is part of a significant rework of the GIC driver in
>>> > Xen and I am afraid that testing with only a portion of that patch
>>> > series might introduce other subtle bugs.  For your reference the series
>>> > starts at commit 6f91502be64a05d0635454d629118b96ae38b50f and ends at
>>> > commit 72eaf29e8d70784aaf066ead79df1295a25ecfd0.
>>> >
>>>
>>> Yes, I tested with and without the whole series.
>>
>> And the result is that the series causes the problem?
>>
>
> Yes.
>
>>
>>> > If 5495a512b63bad868c147198f7f049c2617d468c is really the cause of your
>>> > problem, one idea that comes to mind is that GICH_LR_MAINTENANCE_IRQ
>>> > might not work correctly on your platform. It wouldn't be the first time
>>> > that we see hardware behaving that way, especially if you are using the
>>> > GIC secure registers instead of the non-secure register as GICH_LRn.HW
>>> > can only deactivate non-secure interrupts. This is usually due to a
>>> > configuration error in u-boot.
>>> >
>>> > Could you please try to set PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI for your
>>> > platform?
>>> >
>>>
>>> I tried this. Unfortunately it doesn't help.
>>
>> Could you try the following patch on top of
>> 5495a512b63bad868c147198f7f049c2617d468c ?
>>
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index 302c031..a286376 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -557,10 +557,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
>>      BUG_ON(lr < 0);
>>      BUG_ON(state & ~(GICH_LR_STATE_MASK<<GICH_LR_STATE_SHIFT));
>>
>> -    lr_val = state | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
>> +    lr_val = state | GICH_LR_MAINTENANCE_IRQ | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
>>          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
>> -    if ( p->desc != NULL )
>> -        lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
>>
>>      GICH[GICH_LR + lr] = lr_val;
>>
>> @@ -622,6 +620,12 @@ out:
>>      return;
>>  }
>>
>> +static void gic_irq_eoi(void *info)
>> +{
>> +    int virq = (uintptr_t) info;
>> +    GICC[GICC_DIR] = virq;
>> +}
>> +
>>  static void gic_update_one_lr(struct vcpu *v, int i)
>>  {
>>      struct pending_irq *p;
>> @@ -639,7 +643,10 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>>          irq = (lr >> GICH_LR_VIRTUAL_SHIFT) & GICH_LR_VIRTUAL_MASK;
>>          p = irq_to_pending(v, irq);
>>          if ( p->desc != NULL )
>> +        {
>> +            gic_irq_eoi((void*)(uintptr_t)irq);
>>              p->desc->status &= ~IRQ_INPROGRESS;
>> +        }
>>          clear_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
>>          if ( test_bit(GIC_IRQ_GUEST_PENDING, &p->status) &&
>>                  test_bit(GIC_IRQ_GUEST_ENABLED, &p->status))
>
>
> It helps! Thanks a lot!
> I did about 30 reboots and got no hangs. The only thing still needed is
> to rebase these changes onto the xen/master branch, because the patch
> applies only on top of 5495a512b63bad868c147198f7f049c2617d468c.
> Will you make this change? Is it acceptable for the baseline?
>
> Regards,
> Andrii



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


* Re: Xen 4.5 random freeze question
  2014-11-18 11:31                           ` Andrii Tseglytskyi
@ 2014-11-18 12:35                             ` Andrii Tseglytskyi
  2014-11-18 15:39                               ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-18 12:35 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set, and then
everything works fine.
The following 2 patches fix xen/master for my platform.

Stefano, could you please take a look at these changes?

commit 3628a0aa35706a8f532af865ed784536ce514eca
Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
Date:   Tue Nov 18 14:20:42 2014 +0200

    xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag

    Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
    Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 31fb81a..093ecdb 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
                                              << GICH_V2_LR_PRIORITY_SHIFT) |
               ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));

-    if ( p->desc != NULL )
+    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
     {
-        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
-            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
-        else
-            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
-                            << GICH_V2_LR_PHYSICAL_SHIFT);
+        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
+    }
+    else if ( p->desc != NULL )
+    {
+        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
+                       << GICH_V2_LR_PHYSICAL_SHIFT);
     }

     writel_gich(lr_reg, GICH_LR + lr * 4);

commit 110ad1914f04a5e52ec9d49a9aeb7df488f524b1
Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
Date:   Tue Nov 18 12:14:42 2014 +0200

    xen/arm: dra7: add PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI

    Change-Id: Ic6285d5aea803fb0bfef50ffcc35e20b5bfb7a77
    Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>

diff --git a/xen/arch/arm/platforms/omap5.c b/xen/arch/arm/platforms/omap5.c
index 9d6e504..fb6686f 100644
--- a/xen/arch/arm/platforms/omap5.c
+++ b/xen/arch/arm/platforms/omap5.c
@@ -166,6 +166,11 @@ static const struct dt_device_match dra7_blacklist_dev[] __initconst =
     { /* sentinel */ },
 };

+static uint32_t dra7_quirks(void)
+{
+    return PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI;
+}
+
 PLATFORM_START(omap5, "TI OMAP5")
     .compatible = omap5_dt_compat,
     .init_time = omap5_init_time,
@@ -186,6 +191,7 @@ PLATFORM_START(dra7, "TI DRA7")
     .dom0_gnttab_start = 0x4b000000,
     .dom0_gnttab_size = 0x20000,
     .blacklist_dev = dra7_blacklist_dev,
+    .quirks = dra7_quirks,
 PLATFORM_END

 /*

On Tue, Nov 18, 2014 at 1:31 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@globallogic.com> wrote:
> Strange: it looks like the baseline code already does the same as the
> patch you sent me yesterday; the only thing needed is to set
> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI.
> But the baseline still has the issue, while the same changes on top of
> 5495a512b63bad868c147198f7f049c2617d468c work fine.
>
> Regards,
> Andrii
>
> On Tue, Nov 18, 2014 at 12:41 PM, Andrii Tseglytskyi
> <andrii.tseglytskyi@globallogic.com> wrote:
>> Hi Stefano,
>>
>> On Mon, Nov 17, 2014 at 8:02 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>>> On Mon, 17 Nov 2014, Andrii Tseglytskyi wrote:
>>>> Hi Stefano,
>>>>
>>>> Thank you for your answer.
>>>>
>>>> On Mon, Nov 17, 2014 at 6:39 PM, Stefano Stabellini
>>>> <stefano.stabellini@eu.citrix.com> wrote:
>>>> > Although it is possible that that patch is the cause of your problem,
>>>> > unfortunately it is part of a significant rework of the GIC driver in
>>>> > Xen and I am afraid that testing with only a portion of that patch
>>>> > series might introduce other subtle bugs.  For your reference the series
>>>> > starts at commit 6f91502be64a05d0635454d629118b96ae38b50f and ends at
>>>> > commit 72eaf29e8d70784aaf066ead79df1295a25ecfd0.
>>>> >
>>>>
>>>> Yes, I tested with and without the whole series.
>>>
>>> And the result is that the series causes the problem?
>>>
>>
>> Yes.
>>
>>>
>>>> > If 5495a512b63bad868c147198f7f049c2617d468c is really the cause of your
>>>> > problem, one idea that comes to mind is that GICH_LR_MAINTENANCE_IRQ
>>>> > might not work correctly on your platform. It wouldn't be the first time
>>>> > that we see hardware behaving that way, especially if you are using the
>>>> > GIC secure registers instead of the non-secure register as GICH_LRn.HW
>>>> > can only deactivate non-secure interrupts. This is usually due to a
>>>> > configuration error in u-boot.
>>>> >
>>>> > Could you please try to set PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI for your
>>>> > platform?
>>>> >
>>>>
>>>> I tried this. Unfortunately it doesn't help.
>>>
>>> Could you try the following patch on top of
>>> 5495a512b63bad868c147198f7f049c2617d468c ?
>>>
>>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> index 302c031..a286376 100644
>>> --- a/xen/arch/arm/gic.c
>>> +++ b/xen/arch/arm/gic.c
>>> @@ -557,10 +557,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
>>>      BUG_ON(lr < 0);
>>>      BUG_ON(state & ~(GICH_LR_STATE_MASK<<GICH_LR_STATE_SHIFT));
>>>
>>> -    lr_val = state | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
>>> +    lr_val = state | GICH_LR_MAINTENANCE_IRQ | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
>>>          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
>>> -    if ( p->desc != NULL )
>>> -        lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
>>>
>>>      GICH[GICH_LR + lr] = lr_val;
>>>
>>> @@ -622,6 +620,12 @@ out:
>>>      return;
>>>  }
>>>
>>> +static void gic_irq_eoi(void *info)
>>> +{
>>> +    int virq = (uintptr_t) info;
>>> +    GICC[GICC_DIR] = virq;
>>> +}
>>> +
>>>  static void gic_update_one_lr(struct vcpu *v, int i)
>>>  {
>>>      struct pending_irq *p;
>>> @@ -639,7 +643,10 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>>>          irq = (lr >> GICH_LR_VIRTUAL_SHIFT) & GICH_LR_VIRTUAL_MASK;
>>>          p = irq_to_pending(v, irq);
>>>          if ( p->desc != NULL )
>>> +        {
>>> +            gic_irq_eoi((void*)(uintptr_t)irq);
>>>              p->desc->status &= ~IRQ_INPROGRESS;
>>> +        }
>>>          clear_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
>>>          if ( test_bit(GIC_IRQ_GUEST_PENDING, &p->status) &&
>>>                  test_bit(GIC_IRQ_GUEST_ENABLED, &p->status))
>>
>>
>> It helps! Thanks a lot!
>> I did about 30 reboots and got no hangs. The only thing still needed is
>> to rebase these changes onto the xen/master branch, because the patch
>> applies only on top of 5495a512b63bad868c147198f7f049c2617d468c.
>> Will you make this change? Is it acceptable for the baseline?
>>
>> Regards,
>> Andrii



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


* Re: Xen 4.5 random freeze question
  2014-11-18 12:35                             ` Andrii Tseglytskyi
@ 2014-11-18 15:39                               ` Stefano Stabellini
  2014-11-18 16:11                                 ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-18 15:39 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set, and then
> everything works fine.
> The following 2 patches fix xen/master for my platform.
> 
> Stefano, could you please take a look at these changes?
> 
> commit 3628a0aa35706a8f532af865ed784536ce514eca
> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> Date:   Tue Nov 18 14:20:42 2014 +0200
> 
>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
> 
>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
>     Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> 
> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> index 31fb81a..093ecdb 100644
> --- a/xen/arch/arm/gic-v2.c
> +++ b/xen/arch/arm/gic-v2.c
> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
>                                               << GICH_V2_LR_PRIORITY_SHIFT) |
>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));
> 
> -    if ( p->desc != NULL )
> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
>      {
> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> -        else
> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> +    }
> +    else if ( p->desc != NULL )
> +    {
> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
>      }
> 
>      writel_gich(lr_reg, GICH_LR + lr * 4);

Actually, in case p->desc == NULL (the irq is not a hardware irq; it
could be the virtual timer irq or the evtchn irq), you shouldn't need
the maintenance interrupt if the bug was really due to GICH_LR_HW not
working correctly on OMAP5. This change might only be better at
"hiding" the real issue.

Maybe the problem is exactly the opposite: the new scheme for avoiding
maintenance interrupts doesn't work for software interrupts.
The commit that should make them work correctly after the
no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121.
If you look at the changes to gic_update_one_lr in that commit, you'll
see that it sets a software irq as PENDING if it is already ACTIVE.
Maybe that doesn't work correctly on OMAP5.

Could you try this patch on top of
394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
if the problem is specifically with software irqs.


diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index b7516c0..d8a17c9 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
 /* Maximum cpu interface per GIC */
 #define NR_GIC_CPU_IF 8
 
-#undef GIC_DEBUG
+#define GIC_DEBUG 1
 
 static void gic_update_one_lr(struct vcpu *v, int i);
 
@@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
         ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
     if ( p->desc != NULL )
         lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
+    else
+        lr_val |= GICH_LR_MAINTENANCE_IRQ;
 
     GICH[GICH_LR + lr] = lr_val;

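The behaviour described above for 394b7e587b05d0f4a5fd6f067b38339ab5a77121 can be pictured as a small state transition. This is a hypothetical helper, not the actual gic_update_one_lr(): it only models what happens to a list register when an irq that it still holds is raised again. The state values follow the GICv2 GICH_LRn.State encoding; deferring hardware irqs until deactivation is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* GICv2 GICH_LRn.State encoding (bits [29:28]). */
enum lr_state {
    LR_INVALID        = 0,
    LR_PENDING        = 1,
    LR_ACTIVE         = 2,
    LR_PENDING_ACTIVE = 3,
};

static_assert(LR_PENDING_ACTIVE == (LR_PENDING | LR_ACTIVE),
              "pending+active is the OR of the two state bits");

/*
 * Hypothetical helper, not Xen's gic_update_one_lr(): an irq that is
 * still held in a list register is raised again. Returns the new LR
 * state; *deferred is set when the new occurrence cannot be merged
 * into the LR and has to wait until the guest deactivates the irq.
 */
static enum lr_state reraise_in_lr(enum lr_state cur, bool is_hw, bool *deferred)
{
    *deferred = false;

    switch ( cur )
    {
    case LR_PENDING:
    case LR_PENDING_ACTIVE:
        return cur;                   /* already pending: nothing to add */
    case LR_ACTIVE:
        if ( !is_hw )
            return LR_PENDING_ACTIVE; /* software irq (desc == NULL) */
        *deferred = true;             /* hardware irq: wait for deactivation */
        return cur;
    case LR_INVALID:
    default:
        return LR_PENDING;            /* LR free again: inject as pending */
    }
}
```

In this model only the LR_ACTIVE case treats software and hardware irqs differently, so if the ACTIVE to PENDING_ACTIVE transition misbehaved on a given GIC, only software irqs (p->desc == NULL) would be affected.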

* Re: Xen 4.5 random freeze question
  2014-11-18 15:39                               ` Stefano Stabellini
@ 2014-11-18 16:11                                 ` Andrii Tseglytskyi
  2014-11-18 16:14                                   ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-18 16:11 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

What if I try the following code on top of the current master branch?

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 31fb81a..6764ab7 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -36,6 +36,8 @@
 #include <asm/io.h>
 #include <asm/gic.h>

+#define GIC_DEBUG 1
+
 /*
  * LR register definitions are GIC v2 specific.
  * Moved these definitions from header file to here
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index bcaded9..c03d6a6 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);

 #define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_hw_ops->info->nr_lrs) - 1))

-#undef GIC_DEBUG
+#define GIC_DEBUG 1

 static void gic_update_one_lr(struct vcpu *v, int i);

It is equivalent to what you are proposing: my code sets
PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, so as a result the following line
will be executed inside the gicv2_update_lr() function:
    lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;

Regards,
Andrii

On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
>> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set, and then
>> everything works fine.
>> The following 2 patches fix xen/master for my platform.
>>
>> Stefano, could you please take a look at these changes?
>>
>> commit 3628a0aa35706a8f532af865ed784536ce514eca
>> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>> Date:   Tue Nov 18 14:20:42 2014 +0200
>>
>>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
>>
>>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
>>     Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>>
>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
>> index 31fb81a..093ecdb 100644
>> --- a/xen/arch/arm/gic-v2.c
>> +++ b/xen/arch/arm/gic-v2.c
>> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
>>                                               << GICH_V2_LR_PRIORITY_SHIFT) |
>>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));
>>
>> -    if ( p->desc != NULL )
>> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
>>      {
>> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
>> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> -        else
>> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
>> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
>> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> +    }
>> +    else if ( p->desc != NULL )
>> +    {
>> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
>> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
>>      }
>>
>>      writel_gich(lr_reg, GICH_LR + lr * 4);
>
> Actually, in case p->desc == NULL (the irq is not a hardware irq; it
> could be the virtual timer irq or the evtchn irq), you shouldn't need
> the maintenance interrupt if the bug was really due to GICH_LR_HW not
> working correctly on OMAP5. This change might only be better at
> "hiding" the real issue.
>
> Maybe the problem is exactly the opposite: the new scheme for avoiding
> maintenance interrupts doesn't work for software interrupts.
> The commit that should make them work correctly after the
> no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121.
> If you look at the changes to gic_update_one_lr in that commit, you'll
> see that it sets a software irq as PENDING if it is already ACTIVE.
> Maybe that doesn't work correctly on OMAP5.
>
> Could you try this patch on top of
> 394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
> if the problem is specifically with software irqs.
>
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index b7516c0..d8a17c9 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
>  /* Maximum cpu interface per GIC */
>  #define NR_GIC_CPU_IF 8
>
> -#undef GIC_DEBUG
> +#define GIC_DEBUG 1
>
>  static void gic_update_one_lr(struct vcpu *v, int i);
>
> @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
>          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
>      if ( p->desc != NULL )
>          lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
> +    else
> +        lr_val |= GICH_LR_MAINTENANCE_IRQ;
>
>      GICH[GICH_LR + lr] = lr_val;
>



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


* Re: Xen 4.5 random freeze question
  2014-11-18 16:11                                 ` Andrii Tseglytskyi
@ 2014-11-18 16:14                                   ` Stefano Stabellini
  2014-11-18 16:18                                     ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-18 16:14 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
for non-hardware irqs (desc == NULL), while for hardware irqs I would keep
setting GICH_LR_HW and keep avoiding GICH_V2_LR_MAINTENANCE_IRQ.

Also, testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid
other potential bugs introduced later.
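Composing the LR value under each variant makes the difference concrete. The sketch assumes the GICv2 GICH_LRn layout (VirtualID in bits [9:0], PhysicalID in bits [19:10] when HW is set, the EOI/maintenance bit in bit 19 when HW is clear, Priority in bits [27:23], State in bits [29:28], HW in bit 31). lr_encode(), the LR_* macros and the policy names are hypothetical and only mirror the logic of the patches quoted in this thread; pirq < 0 stands in for p->desc == NULL.

```c
#include <assert.h>
#include <stdint.h>

/* GICv2 GICH_LRn field layout (hypothetical macro names). */
#define LR_VIRTUAL_MASK     0x3ffu
#define LR_VIRTUAL_SHIFT    0
#define LR_PHYSICAL_MASK    0x3ffu
#define LR_PHYSICAL_SHIFT   10
#define LR_PRIORITY_MASK    0x1fu
#define LR_PRIORITY_SHIFT   23
#define LR_STATE_PENDING    (1u << 28)
#define LR_MAINTENANCE_IRQ  (1u << 19)  /* EOI bit, only valid when HW == 0 */
#define LR_HW               (1u << 31)

/* With HW set, bit 19 is PhysicalID[9], so HW and EOI cannot be combined. */
static_assert((LR_MAINTENANCE_IRQ & (LR_PHYSICAL_MASK << LR_PHYSICAL_SHIFT)) != 0,
              "EOI bit overlaps PhysicalID");

/* Hypothetical names for the three variants discussed in this thread. */
enum lr_policy {
    POLICY_BASELINE,     /* HW for hardware irqs, nothing extra for software irqs */
    POLICY_ALWAYS_MAINT, /* quirk patch: maintenance irq for every irq, never HW */
    POLICY_SW_MAINT,     /* proposed test: HW for hardware irqs, maintenance for software irqs */
};

/* Compose a pending LR value; pirq < 0 stands for a software irq (p->desc == NULL). */
static uint32_t lr_encode(enum lr_policy policy, unsigned virq, unsigned priority, int pirq)
{
    uint32_t lr;

    assert(virq <= LR_VIRTUAL_MASK);
    lr = LR_STATE_PENDING |
         (((priority >> 3) & LR_PRIORITY_MASK) << LR_PRIORITY_SHIFT) |
         ((virq & LR_VIRTUAL_MASK) << LR_VIRTUAL_SHIFT);

    if ( policy == POLICY_ALWAYS_MAINT )
        lr |= LR_MAINTENANCE_IRQ;
    else if ( pirq >= 0 )
        lr |= LR_HW | (((uint32_t)pirq & LR_PHYSICAL_MASK) << LR_PHYSICAL_SHIFT);
    else if ( policy == POLICY_SW_MAINT )
        lr |= LR_MAINTENANCE_IRQ;

    return lr;
}
```

For a hardware irq, POLICY_SW_MAINT yields the same value as POLICY_BASELINE (HW plus PhysicalID), while POLICY_ALWAYS_MAINT drops HW and sets the maintenance bit instead; for a software irq, POLICY_SW_MAINT and POLICY_ALWAYS_MAINT agree. For example, lr_encode(POLICY_BASELINE, 34, 0xa0, 34) gives 0x9A008822 and lr_encode(POLICY_ALWAYS_MAINT, 34, 0xa0, 34) gives 0x1A080022. The if/else chain reflects the bit-19 overlap: HW and the EOI bit cannot be combined in GICv2.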

On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> What if I try the following code on top of the current master branch?
> 
> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> index 31fb81a..6764ab7 100644
> --- a/xen/arch/arm/gic-v2.c
> +++ b/xen/arch/arm/gic-v2.c
> @@ -36,6 +36,8 @@
>  #include <asm/io.h>
>  #include <asm/gic.h>
> 
> +#define GIC_DEBUG 1
> +
>  /*
>   * LR register definitions are GIC v2 specific.
>   * Moved these definitions from header file to here
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index bcaded9..c03d6a6 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
> 
>  #define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_hw_ops->info->nr_lrs) - 1))
> 
> -#undef GIC_DEBUG
> +#define GIC_DEBUG 1
> 
>  static void gic_update_one_lr(struct vcpu *v, int i);
> 
> It is equivalent to what you are proposing: my platform has
> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, so the following line will
> be executed inside the gicv2_update_lr() function:
>  lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> 
> regards,
> Andrii
> 
> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set; with
> >> that, everything works fine.
> >> The following 2 patches fix xen/master for my platform.
> >>
> >> Stefano, could you please take a look at these changes?
> >>
> >> commit 3628a0aa35706a8f532af865ed784536ce514eca
> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> >> Date:   Tue Nov 18 14:20:42 2014 +0200
> >>
> >>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
> >>
> >>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
> >>     Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> >>
> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> >> index 31fb81a..093ecdb 100644
> >> --- a/xen/arch/arm/gic-v2.c
> >> +++ b/xen/arch/arm/gic-v2.c
> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
> >>                                               << GICH_V2_LR_PRIORITY_SHIFT) |
> >>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));
> >>
> >> -    if ( p->desc != NULL )
> >> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> >>      {
> >> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> >> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >> -        else
> >> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
> >> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
> >> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >> +    }
> >> +    else if ( p->desc != NULL )
> >> +    {
> >> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
> >> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
> >>      }
> >>
> >>      writel_gich(lr_reg, GICH_LR + lr * 4);
> >
> > Actually, in case p->desc == NULL (the irq is not a hardware irq; it
> > could be the virtual timer irq or the evtchn irq), you shouldn't need
> > the maintenance interrupt if the bug were really due to GICH_LR_HW not
> > working correctly on OMAP5. These changes might only be better at
> > "hiding" the real issue.
> >
> > Maybe the problem is exactly the opposite: the new scheme for avoiding
> > maintenance interrupts doesn't work for software interrupts.
> > The commit that should make them work correctly after the
> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121.
> > If you look at the changes to gic_update_one_lr in that commit, you'll
> > see that it sets a software irq as PENDING if it is already ACTIVE.
> > Maybe that doesn't work correctly on OMAP5.
> >
> > Could you try this patch on top of
> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
> > if the problem is specifically with software irqs.
> >
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index b7516c0..d8a17c9 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
> >  /* Maximum cpu interface per GIC */
> >  #define NR_GIC_CPU_IF 8
> >
> > -#undef GIC_DEBUG
> > +#define GIC_DEBUG 1
> >
> >  static void gic_update_one_lr(struct vcpu *v, int i);
> >
> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
> >          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
> >      if ( p->desc != NULL )
> >          lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
> > +    else
> > +        lr_val |= GICH_LR_MAINTENANCE_IRQ;
> >
> >      GICH[GICH_LR + lr] = lr_val;
> >
> 
> 
> 
> -- 
> 
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
> 
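Commit 394b7e587b05d0f4a5fd6f067b38339ab5a77121 is described above as making gic_update_one_lr() set a software irq as PENDING when it is already ACTIVE. A simplified model of that decision for a single list register might look like the sketch below. struct model_irq, enum lr_action and update_one_lr() are hypothetical. Hardware irqs are deliberately left out. Only the GICv2 state bit positions come from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GICv2 LR state field, bits [29:28]. */
#define LR_STATE_PENDING (1u << 28)
#define LR_STATE_ACTIVE  (1u << 29)

/* Hypothetical, heavily simplified stand-in for struct pending_irq. */
struct model_irq {
    bool hw;     /* models p->desc != NULL */
    bool queued; /* injected again while already sitting in an LR */
};

enum lr_action { LR_KEEP, LR_REPENDED, LR_FREED };

/* Decide what to do with one in-use LR after reading its state back. */
static enum lr_action update_one_lr(uint32_t *lr, struct model_irq *p)
{
    if ( *lr & LR_STATE_ACTIVE )
    {
        /* Hardware irqs are out of scope for this sketch. */
        if ( p->hw || !p->queued )
            return LR_KEEP;
        /* Software irq re-injected while ACTIVE: set PENDING in the same
         * LR instead of allocating a new one. */
        p->queued = false;
        *lr |= LR_STATE_PENDING;
        return LR_REPENDED;
    }
    if ( *lr & LR_STATE_PENDING )
        return LR_KEEP;  /* not acknowledged by the guest yet */
    *lr = 0;             /* state invalid: guest is done, LR is free */
    return LR_FREED;
}
```

The LR_REPENDED path, which writes ACTIVE and PENDING together, is the one suspected of misbehaving on OMAP5.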


* Re: Xen 4.5 random freeze question
  2014-11-18 16:14                                   ` Stefano Stabellini
@ 2014-11-18 16:18                                     ` Andrii Tseglytskyi
  2014-11-18 16:46                                       ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-18 16:18 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

OK got it. Give me a few mins

On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
> for non-hardware irqs (desc == NULL), while for hardware irqs we keep
> setting GICH_LR_HW and avoiding GICH_V2_LR_MAINTENANCE_IRQ.
>
> Also, testing on top of 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would
> rule out other bugs that may have been introduced later.
>
> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
>> What if I try the following code on top of the current master branch:
>>
>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
>> index 31fb81a..6764ab7 100644
>> --- a/xen/arch/arm/gic-v2.c
>> +++ b/xen/arch/arm/gic-v2.c
>> @@ -36,6 +36,8 @@
>>  #include <asm/io.h>
>>  #include <asm/gic.h>
>>
>> +#define GIC_DEBUG 1
>> +
>>  /*
>>   * LR register definitions are GIC v2 specific.
>>   * Moved these definitions from header file to here
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index bcaded9..c03d6a6 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
>>
>>  #define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_hw_ops->info->nr_lrs) - 1))
>>
>> -#undef GIC_DEBUG
>> +#define GIC_DEBUG 1
>>
>>  static void gic_update_one_lr(struct vcpu *v, int i);
>>
>> It is equivalent to what you are proposing: my platform has
>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, so the following line will
>> be executed inside the gicv2_update_lr() function:
>>  lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>>
>> regards,
>> Andrii
>>
>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set; with
>> >> that, everything works fine.
>> >> The following 2 patches fix xen/master for my platform.
>> >>
>> >> Stefano, could you please take a look at these changes?
>> >>
>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca
>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>> >> Date:   Tue Nov 18 14:20:42 2014 +0200
>> >>
>> >>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
>> >>
>> >>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
>> >>     Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>> >>
>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
>> >> index 31fb81a..093ecdb 100644
>> >> --- a/xen/arch/arm/gic-v2.c
>> >> +++ b/xen/arch/arm/gic-v2.c
>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
>> >>                                               << GICH_V2_LR_PRIORITY_SHIFT) |
>> >>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));
>> >>
>> >> -    if ( p->desc != NULL )
>> >> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
>> >>      {
>> >> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
>> >> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> >> -        else
>> >> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
>> >> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
>> >> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> >> +    }
>> >> +    else if ( p->desc != NULL )
>> >> +    {
>> >> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
>> >> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
>> >>      }
>> >>
>> >>      writel_gich(lr_reg, GICH_LR + lr * 4);
>> >
>> > Actually, in case p->desc == NULL (the irq is not a hardware irq; it
>> > could be the virtual timer irq or the evtchn irq), you shouldn't need
>> > the maintenance interrupt if the bug were really due to GICH_LR_HW not
>> > working correctly on OMAP5. These changes might only be better at
>> > "hiding" the real issue.
>> >
>> > Maybe the problem is exactly the opposite: the new scheme for avoiding
>> > maintenance interrupts doesn't work for software interrupts.
>> > The commit that should make them work correctly after the
>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121.
>> > If you look at the changes to gic_update_one_lr in that commit, you'll
>> > see that it sets a software irq as PENDING if it is already ACTIVE.
>> > Maybe that doesn't work correctly on OMAP5.
>> >
>> > Could you try this patch on top of
>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
>> > if the problem is specifically with software irqs.
>> >
>> >
>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> > index b7516c0..d8a17c9 100644
>> > --- a/xen/arch/arm/gic.c
>> > +++ b/xen/arch/arm/gic.c
>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
>> >  /* Maximum cpu interface per GIC */
>> >  #define NR_GIC_CPU_IF 8
>> >
>> > -#undef GIC_DEBUG
>> > +#define GIC_DEBUG 1
>> >
>> >  static void gic_update_one_lr(struct vcpu *v, int i);
>> >
>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
>> >          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
>> >      if ( p->desc != NULL )
>> >          lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
>> > +    else
>> > +        lr_val |= GICH_LR_MAINTENANCE_IRQ;
>> >
>> >      GICH[GICH_LR + lr] = lr_val;
>> >
>>
>>
>>
>> --
>>
>> Andrii Tseglytskyi | Embedded Dev
>> GlobalLogic
>> www.globallogic.com
>>



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


* Re: Xen 4.5 random freeze question
  2014-11-18 16:18                                     ` Andrii Tseglytskyi
@ 2014-11-18 16:46                                       ` Andrii Tseglytskyi
  2014-11-18 17:51                                         ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-18 16:46 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

Hi Stefano,

No hangs with this change.
The complete log follows:

U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
DRA752 ES1.0
<ethaddr> not set. Validating first E-fuse MAC
cpsw
- UART enabled -
- CPU 00000000 booting -
- Xen starting in Hyp mode -
- Zero BSS -
- Setting up control registers -
- Turning on paging -
- Ready -
(XEN) Checking for initrd in /chosen
(XEN) RAM: 0000000080000000 - 000000009fffffff
(XEN) RAM: 00000000a0000000 - 00000000bfffffff
(XEN) RAM: 00000000c0000000 - 00000000dfffffff
(XEN)
(XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa
(XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000
(XEN) MODULE[3]: 0000000000000000 - 0000000000000000
(XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000
(XEN)  RESVD[0]: 00000000ba300000 - 00000000bfd00000
(XEN)  RESVD[1]: 0000000095800000 - 0000000095900000
(XEN)  RESVD[2]: 0000000098a00000 - 0000000098b00000
(XEN)  RESVD[3]: 0000000095f00000 - 0000000098a00000
(XEN)  RESVD[4]: 0000000095900000 - 0000000095f00000
(XEN)
(XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0 dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
(XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000
(XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages)
(XEN) Dom heap: 344064 pages
(XEN) Domain heap initialised
(XEN) Looking for UART console serial0
 Xen 4.5-unstable
(XEN) Xen version 4.5-unstable (atseglytskyi@) (arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3 20130328 (prerelease)) debu4
(XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
(XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2
(XEN) 32-bit Execution:
(XEN)   Processor Features: 00001131:00011011
(XEN)     Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
(XEN)     Extensions: GenericTimer Security
(XEN)   Debug Features: 02010555
(XEN)   Auxiliary Features: 00000000
(XEN)   Memory Model Features: 10201105 20000000 01240000 02102211
(XEN)  ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
(XEN) Platform: TI DRA7
(XEN) /psci method must be smc, but is: "hvc"
(XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c)
(XEN) Set AuxCoreBoot0 to 0x20
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
(XEN) Using generic timer at 6144 KHz
(XEN) GIC initialization:
(XEN)         gic_dist_addr=0000000048211000
(XEN)         gic_cpu_addr=0000000048212000
(XEN)         gic_hyp_addr=0000000048214000
(XEN)         gic_vcpu_addr=0000000048216000
(XEN)         gic_maintenance_irq=25
(XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) I/O virtualisation disabled
(XEN) Allocated console ring of 16 KiB.
(XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
(XEN) Bringing up CPU1
- CPU 00000001 booting -
- Xen starting in Hyp mode -
- Setting up control registers -
- Turning on paging -
- Ready -
(XEN) CPU 1 booted.
(XEN) Brought up 2 CPUs
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading kernel from boot module 2
(XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
(XEN) Loading zImage from 00000000c0000040 to 00000000cfc00000-00000000cff50c48
(XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 272kB init memory.
(XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is already pending in LR0
(XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is already pending in LR0
[    0.000000] /cpus/cpu@0 missing clock-frequency property
[    0.000000] /cpus/cpu@1 missing clock-frequency property
[    0.140625] omap-gpmc omap-gpmc: failed to reserve memory
[    0.187500] omap_l3_noc ocp.3: couldn't find resource 2
[    0.273437] i2c i2c-1: of_i2c: invalid reg on /ocp/i2c@48072000/camera_ov10635
[    0.437500] ldo3: operation not allowed
[    0.437500] omapdss HDMI error: can't set the voltage regulator
[    0.468750] tfc_s9700 display0: tfc_s9700_probe probe
[    0.468750] ov1063x 1-0030: No deserializer node found
[    0.468750] ov1063x 1-0030: No serializer node found
[    0.468750] ov1063x 1-0030: Failed writing register 0x0103!
[    0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30
[    0.578125] ahci ahci.0.auto: can't get clock
[    0.898437] ldc_module_init
[    1.304687] Missing dual_emac_res_vlan in DT.
[    1.304687] Using 1 as Reserved VLAN for 0 slave
[    1.312500] Missing dual_emac_res_vlan in DT.
[    1.320312] Using 2 as Reserved VLAN for 1 slave
[    1.382812] Freeing init memory: 236K
sh: write error: No such device
Cannot identify '/dev/camera0': 2, No such file or directory
Parsing config from /xen/images/DomUAndroid.cfg
XSM Disabled: seclabel not supported
(XEN) do_physdev_op 16 cmd=13: not implemented yet
libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 53: Function not implemented
(XEN) do_physdev_op 16 cmd=13: not implemented yet
libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 71: Function not implemented
(XEN) do_physdev_op 16 cmd=13: not implemented yet
libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 173: Function not implemented
(XEN) do_physdev_op 16 cmd=13: not implemented yet
libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 174: Function not implemented
Turning on vfb in domain 1
(XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
(XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
Parsing config from /xen/images/DomUQNX.cfg
XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending

(XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
[    4.304687] dra7-evm-sound sound.17: cpu dai node is invalid
[    4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22
xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader found: Invalid kernel
libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image failed: No such file or directory
libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot (re-)build domain: -3
(XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
(XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
Turning on 'vsnd' in domain '1' (dev_id: '0')
Turning on vkbd in domain 1
(XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
(XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
(XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending

Please press Enter to activate this console. (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending

On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@globallogic.com> wrote:
> OK got it. Give me a few mins
>
> On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
>> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
>> for non-hardware irqs (desc == NULL), while for hardware irqs we keep
>> setting GICH_LR_HW and avoiding GICH_V2_LR_MAINTENANCE_IRQ.
>>
>> Also, testing on top of 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would
>> rule out other bugs that may have been introduced later.
>>
>> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
>>> What if I try the following code on top of the current master branch:
>>>
>>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
>>> index 31fb81a..6764ab7 100644
>>> --- a/xen/arch/arm/gic-v2.c
>>> +++ b/xen/arch/arm/gic-v2.c
>>> @@ -36,6 +36,8 @@
>>>  #include <asm/io.h>
>>>  #include <asm/gic.h>
>>>
>>> +#define GIC_DEBUG 1
>>> +
>>>  /*
>>>   * LR register definitions are GIC v2 specific.
>>>   * Moved these definitions from header file to here
>>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> index bcaded9..c03d6a6 100644
>>> --- a/xen/arch/arm/gic.c
>>> +++ b/xen/arch/arm/gic.c
>>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
>>>
>>>  #define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_hw_ops->info->nr_lrs) - 1))
>>>
>>> -#undef GIC_DEBUG
>>> +#define GIC_DEBUG 1
>>>
>>>  static void gic_update_one_lr(struct vcpu *v, int i);
>>>
>>> It is equivalent to what you are proposing: my platform has
>>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, so the following line will
>>> be executed inside the gicv2_update_lr() function:
>>>  lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>>>
>>> regards,
>>> Andrii
>>>
>>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
>>> <stefano.stabellini@eu.citrix.com> wrote:
>>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set; with
>>> >> that, everything works fine.
>>> >> The following 2 patches fix xen/master for my platform.
>>> >>
>>> >> Stefano, could you please take a look at these changes?
>>> >>
>>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca
>>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>>> >> Date:   Tue Nov 18 14:20:42 2014 +0200
>>> >>
>>> >>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
>>> >>
>>> >>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
>>> >>     Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>>> >>
>>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
>>> >> index 31fb81a..093ecdb 100644
>>> >> --- a/xen/arch/arm/gic-v2.c
>>> >> +++ b/xen/arch/arm/gic-v2.c
>>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
>>> >>                                               << GICH_V2_LR_PRIORITY_SHIFT) |
>>> >>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));
>>> >>
>>> >> -    if ( p->desc != NULL )
>>> >> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
>>> >>      {
>>> >> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
>>> >> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>>> >> -        else
>>> >> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
>>> >> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
>>> >> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>>> >> +    }
>>> >> +    else if ( p->desc != NULL )
>>> >> +    {
>>> >> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
>>> >> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
>>> >>      }
>>> >>
>>> >>      writel_gich(lr_reg, GICH_LR + lr * 4);
>>> >
>>> > Actually, in case p->desc == NULL (the irq is not a hardware irq; it
>>> > could be the virtual timer irq or the evtchn irq), you shouldn't need
>>> > the maintenance interrupt if the bug were really due to GICH_LR_HW not
>>> > working correctly on OMAP5. These changes might only be better at
>>> > "hiding" the real issue.
>>> >
>>> > Maybe the problem is exactly the opposite: the new scheme for avoiding
>>> > maintenance interrupts doesn't work for software interrupts.
>>> > The commit that should make them work correctly after the
>>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121.
>>> > If you look at the changes to gic_update_one_lr in that commit, you'll
>>> > see that it sets a software irq as PENDING if it is already ACTIVE.
>>> > Maybe that doesn't work correctly on OMAP5.
>>> >
>>> > Could you try this patch on top of
>>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
>>> > if the problem is specifically with software irqs.
>>> >
>>> >
>>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> > index b7516c0..d8a17c9 100644
>>> > --- a/xen/arch/arm/gic.c
>>> > +++ b/xen/arch/arm/gic.c
>>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
>>> >  /* Maximum cpu interface per GIC */
>>> >  #define NR_GIC_CPU_IF 8
>>> >
>>> > -#undef GIC_DEBUG
>>> > +#define GIC_DEBUG 1
>>> >
>>> >  static void gic_update_one_lr(struct vcpu *v, int i);
>>> >
>>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
>>> >          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
>>> >      if ( p->desc != NULL )
>>> >          lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
>>> > +    else
>>> > +        lr_val |= GICH_LR_MAINTENANCE_IRQ;
>>> >
>>> >      GICH[GICH_LR + lr] = lr_val;
>>> >
>>>
>>>
>>>
>>> --
>>>
>>> Andrii Tseglytskyi | Embedded Dev
>>> GlobalLogic
>>> www.globallogic.com
>>>
>
>
>
> --
>
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


* Re: Xen 4.5 random freeze question
  2014-11-18 16:46                                       ` Andrii Tseglytskyi
@ 2014-11-18 17:51                                         ` Stefano Stabellini
  2014-11-19  9:38                                           ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-18 17:51 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

Hello Andrii,
we are getting closer :-)

It would help if you posted the output with GIC_DEBUG defined but without
the other change that "fixes" the issue.

I think the problem is due to software irqs.
You are getting too many

gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending

messages. That means you are losing virtual SGIs (guest VCPU to guest
VCPU). It would be best to investigate why, especially if you get many
more of the same messages without the MAINTENANCE_IRQ change I
suggested.
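A toy model can show why a "still lr_pending" message can mean a lost SGI. It assumes, as the debug message suggests, that an injection arriving while the irq is still queued on lr_pending is merged with the queued one rather than recorded separately. struct vsgi, inject() and move_to_lr() are hypothetical and are not Xen code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-vcpu view of one virtual SGI (irq=2 in the log). */
struct vsgi {
    bool on_lr_pending;     /* queued on lr_pending, no LR assigned yet */
    unsigned int coalesced; /* injections folded into an earlier one */
    unsigned int delivered; /* times the irq actually reached an LR */
};

/* A VCPU sends the SGI. */
static void inject(struct vsgi *s)
{
    if ( s->on_lr_pending )
    {
        /* The case reported as "when it is still lr_pending": nothing new
         * is queued, so this injection merges with the waiting one. */
        s->coalesced++;
        return;
    }
    s->on_lr_pending = true;
}

/* An LR becomes free and the waiting irq is written into it. */
static void move_to_lr(struct vsgi *s)
{
    if ( s->on_lr_pending )
    {
        s->on_lr_pending = false;
        s->delivered++;
    }
}
```

Three injections followed by one free LR yield a single delivery in this model. Comparing how often the message appears with and without the MAINTENANCE_IRQ change is therefore a rough proxy for how many SGIs are being merged.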

This patch might also help in understanding the problem:


diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index b7516c0..5eaeca2 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
     list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
     {
         i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
-        if ( i >= nr_lrs ) return;
+        if ( i >= nr_lrs )
+        {
+            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
+                    p->irq, v->domain->domain_id, v->vcpu_id);
+            continue;
+        }
 
         spin_lock_irqsave(&gic.lock, flags);
         gic_set_lr(i, p, GICH_LR_PENDING);
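Modeling the loop touched by the debug patch in isolation shows what the change does and does not affect. This sketch assumes a fixed number of LRs. NR_LRS = 4 is an arbitrary choice; GICv2 allows 1 to 64. first_free_lr() and restore_pending() are hypothetical stand-ins for find_first_zero_bit() and gic_restore_pending_irqs().

```c
#include <assert.h>
#include <stdio.h>

#define NR_LRS 4 /* assumption: GICv2 implementations provide 1 to 64 LRs */

/* Hypothetical stand-in for find_first_zero_bit() on a small LR mask. */
static unsigned int first_free_lr(unsigned long mask)
{
    unsigned int i;

    for ( i = 0; i < NR_LRS; i++ )
        if ( !(mask & (1UL << i)) )
            return i;
    return NR_LRS;
}

/* Models the restore loop with the debug patch applied: irqs that find
 * no free LR are reported and left on lr_pending.
 * Returns how many irqs were written into LRs. */
static unsigned int restore_pending(unsigned long *lr_mask,
                                    const unsigned int *pending, unsigned int n,
                                    unsigned int *skipped)
{
    unsigned int placed = 0, k;

    *skipped = 0;
    for ( k = 0; k < n; k++ )
    {
        unsigned int i = first_free_lr(*lr_mask);

        if ( i >= NR_LRS )
        {
            printf("LRs full, not injecting irq=%u\n", pending[k]);
            (*skipped)++;
            continue;
        }
        *lr_mask |= 1UL << i; /* stands in for gic_set_lr() */
        placed++;
    }
    return placed;
}
```

In this model lr_mask only gains bits during the loop, so `continue` places no more irqs than the original `return`. The difference is that every irq left on lr_pending is now reported.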




On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> Hi Stefano,
> 
> No hangs with this change.
> The complete log follows:
> 
> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
> DRA752 ES1.0
> <ethaddr> not set. Validating first E-fuse MAC
> cpsw
> - UART enabled -
> - CPU 00000000 booting -
> - Xen starting in Hyp mode -
> - Zero BSS -
> - Setting up control registers -
> - Turning on paging -
> - Ready -
> (XEN) Checking for initrd in /chosen
> (XEN) RAM: 0000000080000000 - 000000009fffffff
> (XEN) RAM: 00000000a0000000 - 00000000bfffffff
> (XEN) RAM: 00000000c0000000 - 00000000dfffffff
> (XEN)
> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa
> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000
> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000
> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000
> (XEN)  RESVD[0]: 00000000ba300000 - 00000000bfd00000
> (XEN)  RESVD[1]: 0000000095800000 - 0000000095900000
> (XEN)  RESVD[2]: 0000000098a00000 - 0000000098b00000
> (XEN)  RESVD[3]: 0000000095f00000 - 0000000098a00000
> (XEN)  RESVD[4]: 0000000095900000 - 0000000095f00000
> (XEN)
> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0 dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000
> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages)
> (XEN) Dom heap: 344064 pages
> (XEN) Domain heap initialised
> (XEN) Looking for UART console serial0
>  Xen 4.5-unstable
> (XEN) Xen version 4.5-unstable (atseglytskyi@) (arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3 20130328 (prerelease)) debu4
> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2
> (XEN) 32-bit Execution:
> (XEN)   Processor Features: 00001131:00011011
> (XEN)     Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
> (XEN)     Extensions: GenericTimer Security
> (XEN)   Debug Features: 02010555
> (XEN)   Auxiliary Features: 00000000
> (XEN)   Memory Model Features: 10201105 20000000 01240000 02102211
> (XEN)  ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
> (XEN) Platform: TI DRA7
> (XEN) /psci method must be smc, but is: "hvc"
> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c)
> (XEN) Set AuxCoreBoot0 to 0x20
> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
> (XEN) Using generic timer at 6144 KHz
> (XEN) GIC initialization:
> (XEN)         gic_dist_addr=0000000048211000
> (XEN)         gic_cpu_addr=0000000048212000
> (XEN)         gic_hyp_addr=0000000048214000
> (XEN)         gic_vcpu_addr=0000000048216000
> (XEN)         gic_maintenance_irq=25
> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> (XEN) I/O virtualisation disabled
> (XEN) Allocated console ring of 16 KiB.
> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
> (XEN) Bringing up CPU1
> - CPU 00000001 booting -
> - Xen starting in Hyp mode -
> - Setting up control registers -
> - Turning on paging -
> - Ready -
> (XEN) CPU 1 booted.
> (XEN) Brought up 2 CPUs
> (XEN) *** LOADING DOMAIN 0 ***
> (XEN) Loading kernel from boot module 2
> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
> (XEN) Loading zImage from 00000000c0000040 to 00000000cfc00000-00000000cff50c48
> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8
> (XEN) Std. Loglevel: All
> (XEN) Guest Loglevel: All
> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
> (XEN) Freed 272kB init memory.
> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is already pending in LR0
> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is already pending in LR0
> [    0.000000] /cpus/cpu@0 missing clock-frequency property
> [    0.000000] /cpus/cpu@1 missing clock-frequency property
> [    0.140625] omap-gpmc omap-gpmc: failed to reserve memory
> [    0.187500] omap_l3_noc ocp.3: couldn't find resource 2
> [    0.273437] i2c i2c-1: of_i2c: invalid reg on /ocp/i2c@48072000/camera_ov10635
> [    0.437500] ldo3: operation not allowed
> [    0.437500] omapdss HDMI error: can't set the voltage regulator
> [    0.468750] tfc_s9700 display0: tfc_s9700_probe probe
> [    0.468750] ov1063x 1-0030: No deserializer node found
> [    0.468750] ov1063x 1-0030: No serializer node found
> [    0.468750] ov1063x 1-0030: Failed writing register 0x0103!
> [    0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30
> [    0.578125] ahci ahci.0.auto: can't get clock
> [    0.898437] ldc_module_init
> [    1.304687] Missing dual_emac_res_vlan in DT.
> [    1.304687] Using 1 as Reserved VLAN for 0 slave
> [    1.312500] Missing dual_emac_res_vlan in DT.
> [    1.320312] Using 2 as Reserved VLAN for 1 slave
> [    1.382812] Freeing init memory: 236K
> sh: write error: No such device
> Cannot identify '/dev/camera0': 2, No such file or directory
> Parsing config from /xen/images/DomUAndroid.cfg
> XSM Disabled: seclabel not supported
> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 53: Function not implemented
> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 71: Function not implemented
> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 173: Function not implemented
> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 174: Function not implemented
> Turning on vfb in domain 1
> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
> Parsing config from /xen/images/DomUQNX.cfg
> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
>
> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
> [    4.304687] dra7-evm-sound sound.17: cpu dai node is invalid
> [    4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22
> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader found: Invalid kernel
> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image failed: No such file or directory
> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot (re-)build domain: -3
> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
> Turning on 'vsnd' in domain '1' (dev_id: '0')
> Turning on vkbd in domain 1
> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
>
> Please press Enter to activate this console. (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> 
> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi
> <andrii.tseglytskyi@globallogic.com> wrote:
> > OK got it. Give me a few mins
> >
> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini
> > <stefano.stabellini@eu.citrix.com> wrote:
> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
> >> for non-hardware irqs (desc == NULL) and keep avoiding
> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs.
> >>
> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid
> >> other potential bugs introduced later.
> >>
> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >>> What if I try on top of current master branch the following code:
> >>>
> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> >>> index 31fb81a..6764ab7 100644
> >>> --- a/xen/arch/arm/gic-v2.c
> >>> +++ b/xen/arch/arm/gic-v2.c
> >>> @@ -36,6 +36,8 @@
> >>>  #include <asm/io.h>
> >>>  #include <asm/gic.h>
> >>>
> >>> +#define GIC_DEBUG 1
> >>> +
> >>>  /*
> >>>   * LR register definitions are GIC v2 specific.
> >>>   * Moved these definitions from header file to here
> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >>> index bcaded9..c03d6a6 100644
> >>> --- a/xen/arch/arm/gic.c
> >>> +++ b/xen/arch/arm/gic.c
> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
> >>>
> >>>  #define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_hw_ops->info->nr_lrs) - 1))
> >>>
> >>> -#undef GIC_DEBUG
> >>> +#define GIC_DEBUG 1
> >>>
> >>>  static void gic_update_one_lr(struct vcpu *v, int i);
> >>>
> >>> It is equivalent to what you are proposing: my code contains
> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, so the following line will
> >>> be executed inside the gicv2_update_lr() function:
> >>>  lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >>>
> >>> regards,
> >>> Andrii
> >>>
> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
> >>> <stefano.stabellini@eu.citrix.com> wrote:
> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set; with that,
> >>> >> everything works fine.
> >>> >> The following 2 patches fix xen/master for my platform.
> >>> >>
> >>> >> Stefano, could you please take a look at these changes?
> >>> >>
> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca
> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> >>> >> Date:   Tue Nov 18 14:20:42 2014 +0200
> >>> >>
> >>> >>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
> >>> >>
> >>> >>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
> >>> >>     Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> >>> >>
> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> >>> >> index 31fb81a..093ecdb 100644
> >>> >> --- a/xen/arch/arm/gic-v2.c
> >>> >> +++ b/xen/arch/arm/gic-v2.c
> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
> >>> >>                                               << GICH_V2_LR_PRIORITY_SHIFT) |
> >>> >>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));
> >>> >>
> >>> >> -    if ( p->desc != NULL )
> >>> >> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> >>> >>      {
> >>> >> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> >>> >> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >>> >> -        else
> >>> >> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
> >>> >> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
> >>> >> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >>> >> +    }
> >>> >> +    else if ( p->desc != NULL )
> >>> >> +    {
> >>> >> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
> >>> >> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
> >>> >>      }
> >>> >>
> >>> >>      writel_gich(lr_reg, GICH_LR + lr * 4);
> >>> >
> >>> > Actually, in case p->desc == NULL (the irq is not a hardware irq; it
> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need
> >>> > the maintenance interrupt if the bug was really due to GICH_LR_HW not
> >>> > working correctly on OMAP5. This change might only be better at
> >>> > "hiding" the real issue.
> >>> >
> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding
> >>> > maintenance interrupts doesn't work for software interrupts.
> >>> > The commit that should make them work correctly after the
> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121.
> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll
> >>> > see that it is going to set a software irq as PENDING if it is already ACTIVE.
> >>> > Maybe that doesn't work correctly on OMAP5.
> >>> >
> >>> > Could you try this patch on top of
> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
> >>> > if the problem is specifically with software irqs.
> >>> >
> >>> >
> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >>> > index b7516c0..d8a17c9 100644
> >>> > --- a/xen/arch/arm/gic.c
> >>> > +++ b/xen/arch/arm/gic.c
> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
> >>> >  /* Maximum cpu interface per GIC */
> >>> >  #define NR_GIC_CPU_IF 8
> >>> >
> >>> > -#undef GIC_DEBUG
> >>> > +#define GIC_DEBUG 1
> >>> >
> >>> >  static void gic_update_one_lr(struct vcpu *v, int i);
> >>> >
> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
> >>> >          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
> >>> >      if ( p->desc != NULL )
> >>> >          lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
> >>> > +    else
> >>> > +        lr_val |= GICH_LR_MAINTENANCE_IRQ;
> >>> >
> >>> >      GICH[GICH_LR + lr] = lr_val;
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>> Andrii Tseglytskyi | Embedded Dev
> >>> GlobalLogic
> >>> www.globallogic.com
> >>>
> >
> >
> >
> > --
> >
> > Andrii Tseglytskyi | Embedded Dev
> > GlobalLogic
> > www.globallogic.com
> 
> 
> 
> -- 
> 
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
> 
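The three list register (LR) encodings compared in this exchange can be summarised in a small C sketch. It is a user-space model, not Xen code. The LR_* constants follow the GICv2 GICH_LR bit layout that the GICH_V2_LR_* macros in gic-v2.c encode:

- virtual ID in bits [9:0]
- physical ID in bits [19:10], with bit 19 doubling as the EOI (maintenance) request when HW is clear
- priority in bits [27:23]
- state in bits [29:28]
- HW in bit 31

The enum, make_lr() and the convention that a negative phys_irq stands for p->desc == NULL are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GICv2 GICH_LR fields (a model of the GICH_V2_LR_* macros in gic-v2.c). */
#define LR_VIRTUAL_MASK    0x3ffu
#define LR_PHYSICAL_MASK   0x3ffu
#define LR_PHYSICAL_SHIFT  10
#define LR_MAINTENANCE_IRQ (1u << 19)  /* EOI request, valid only when HW == 0 */
#define LR_PRIORITY_MASK   0x1fu
#define LR_PRIORITY_SHIFT  23
#define LR_STATE_PENDING   (1u << 28)
#define LR_HW              (1u << 31)

/* Hypothetical labels for the variants discussed in the thread. */
enum lr_policy {
    POLICY_UPSTREAM,     /* HW bit for hardware irqs, nothing for software irqs */
    POLICY_QUIRK_ALWAYS, /* quirk patch: maintenance irq for every irq */
    POLICY_SW_MAINT,     /* test patch: maintenance irq only for software irqs */
};

/* Build a pending LR value. phys_irq < 0 models p->desc == NULL (SGI,
 * virtual timer, event channel irq). priority is the 8-bit GIC priority,
 * of which the LR keeps only the top 5 bits. */
static uint32_t make_lr(enum lr_policy policy, unsigned virq, int phys_irq,
                        unsigned priority)
{
    bool is_hw = phys_irq >= 0;
    uint32_t lr;

    assert(virq <= LR_VIRTUAL_MASK);
    lr = LR_STATE_PENDING
       | (((priority >> 3) & LR_PRIORITY_MASK) << LR_PRIORITY_SHIFT)
       | (virq & LR_VIRTUAL_MASK);

    if ( policy == POLICY_QUIRK_ALWAYS )
        lr |= LR_MAINTENANCE_IRQ;
    else if ( is_hw )
        lr |= LR_HW | (((uint32_t)phys_irq & LR_PHYSICAL_MASK) << LR_PHYSICAL_SHIFT);
    else if ( policy == POLICY_SW_MAINT )
        lr |= LR_MAINTENANCE_IRQ;

    return lr;
}
```

For example, make_lr(POLICY_SW_MAINT, 2, -1, 0xa0) requests a maintenance interrupt for SGI 2. By contrast, make_lr(POLICY_SW_MAINT, 53, 53, 0xa0) keeps the HW bit and physical ID. That is the split Stefano asked to test on top of 394b7e587b05d0f4a5fd6f067b38339ab5a77121.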


* Re: Xen 4.5 random freeze question
  2014-11-18 17:51                                         ` Stefano Stabellini
@ 2014-11-19  9:38                                           ` Andrii Tseglytskyi
  2014-11-19 11:12                                             ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19  9:38 UTC
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

Hi Stefano,

Thank you for your support.

You are right: with the latest change you proposed, I get continuous
prints during the platform hang:

(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
(XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0

Looks like the issue needs deeper debugging.

Regards,
Andrii

On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> Hello Andrii,
> we are getting closer :-)
>
> It would help if you post the output with GIC_DEBUG defined but without
> the other change that "fixes" the issue.
>
> I think the problem is probably due to software irqs.
> You are getting too many
>
> gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
>
> messages. That means you are losing virtual SGIs (guest VCPU to guest
> VCPU). It would be best to investigate why, especially if you get many
> more of the same messages without the MAINTENANCE_IRQ change I
> suggested.
>
> This patch might also help understand the problem better:
>
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index b7516c0..5eaeca2 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
>      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
>      {
>          i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
> -        if ( i >= nr_lrs ) return;
> +        if ( i >= nr_lrs )
> +        {
> +            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
> +                    p->irq, v->domain->domain_id, v->vcpu_id);
> +            continue;
> +        }
>
>          spin_lock_irqsave(&gic.lock, flags);
>          gic_set_lr(i, p, GICH_LR_PENDING);
>
>
>
>
> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
>> Hi Stefano,
>>
>> No hangs with this change.
>> Complete log is the following:
>>
>> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
>> DRA752 ES1.0
>> <ethaddr> not set. Validating first E-fuse MAC
>> cpsw
>> - UART enabled -
>> - CPU 00000000 booting -
>> - Xen starting in Hyp mode -
>> - Zero BSS -
>> - Setting up control registers -
>> - Turning on paging -
>> - Ready -
>> (XEN) Checking for initrd in /chosen
>> (XEN) RAM: 0000000080000000 - 000000009fffffff
>> (XEN) RAM: 00000000a0000000 - 00000000bfffffff
>> (XEN) RAM: 00000000c0000000 - 00000000dfffffff
>> (XEN)
>> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa
>> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000
>> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000
>> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000
>> (XEN)  RESVD[0]: 00000000ba300000 - 00000000bfd00000
>> (XEN)  RESVD[1]: 0000000095800000 - 0000000095900000
>> (XEN)  RESVD[2]: 0000000098a00000 - 0000000098b00000
>> (XEN)  RESVD[3]: 0000000095f00000 - 0000000098a00000
>> (XEN)  RESVD[4]: 0000000095900000 - 0000000095f00000
>> (XEN)
>> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
>> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
>> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000
>> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages)
>> (XEN) Dom heap: 344064 pages
>> (XEN) Domain heap initialised
>> (XEN) Looking for UART console serial0
>>  Xen 4.5-unstable
>> (XEN) Xen version 4.5-unstable (atseglytskyi@)
>> (arm-linux-gnueabihf-gcc (crosstool-NG
>> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3
>> 20130328 (prerelease)) debu4
>> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
>> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2
>> (XEN) 32-bit Execution:
>> (XEN)   Processor Features: 00001131:00011011
>> (XEN)     Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
>> (XEN)     Extensions: GenericTimer Security
>> (XEN)   Debug Features: 02010555
>> (XEN)   Auxiliary Features: 00000000
>> (XEN)   Memory Model Features: 10201105 20000000 01240000 02102211
>> (XEN)  ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
>> (XEN) Platform: TI DRA7
>> (XEN) /psci method must be smc, but is: "hvc"
>> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c)
>> (XEN) Set AuxCoreBoot0 to 0x20
>> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
>> (XEN) Using generic timer at 6144 KHz
>> (XEN) GIC initialization:
>> (XEN)         gic_dist_addr=0000000048211000
>> (XEN)         gic_cpu_addr=0000000048212000
>> (XEN)         gic_hyp_addr=0000000048214000
>> (XEN)         gic_vcpu_addr=0000000048216000
>> (XEN)         gic_maintenance_irq=25
>> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
>> (XEN) Using scheduler: SMP Credit Scheduler (credit)
>> (XEN) I/O virtualisation disabled
>> (XEN) Allocated console ring of 16 KiB.
>> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
>> (XEN) Bringing up CPU1
>> - CPU 00000001 booting -
>> - Xen starting in Hyp mode -
>> - Setting up control registers -
>> - Turning on paging -
>> - Ready -
>> (XEN) CPU 1 booted.
>> (XEN) Brought up 2 CPUs
>> (XEN) *** LOADING DOMAIN 0 ***
>> (XEN) Loading kernel from boot module 2
>> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
>> (XEN) Loading zImage from 00000000c0000040 to 00000000cfc00000-00000000cff50c48
>> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8
>> (XEN) Std. Loglevel: All
>> (XEN) Guest Loglevel: All
>> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch
>> input to Xen)
>> (XEN) Freed 272kB init memory.
>> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
>> already pending in LR0
>> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
>> already pending in LR0
>> [    0.000000] /cpus/cpu@0 missing clock-frequency property
>> [    0.000000] /cpus/cpu@1 missing clock-frequency property
>> [    0.140625] omap-gpmc omap-gpmc: failed to reserve memory
>> [    0.187500] omap_l3_noc ocp.3: couldn't find resource 2
>> [    0.273437] i2c i2c-1: of_i2c: invalid reg on
>> /ocp/i2c@48072000/camera_ov10635
>> [    0.437500] ldo3: operation not allowed
>> [    0.437500] omapdss HDMI error: can't set the voltage regulator
>> [    0.468750] tfc_s9700 display0: tfc_s9700_probe probe
>> [    0.468750] ov1063x 1-0030: No deserializer node found
>> [    0.468750] ov1063x 1-0030: No serializer node found
>> [    0.468750] ov1063x 1-0030: Failed writing register 0x0103!
>> [    0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30
>> [    0.578125] ahci ahci.0.auto: can't get clock
>> [    0.898437] ldc_module_init
>> [    1.304687] Missing dual_emac_res_vlan in DT.
>> [    1.304687] Using 1 as Reserved VLAN for 0 slave
>> [    1.312500] Missing dual_emac_res_vlan in DT.
>> [    1.320312] Using 2 as Reserved VLAN for 1 slave
>> [    1.382812] Freeing init memory: 236K
>> sh: write error: No such device
>> Cannot identify '/dev/camera0': 2, No such file or directory
>> Parsing config from /xen/images/DomUAndroid.cfg
>> XSM Disabled: seclabel not supported
>> (XEN) do_physdev_op 16 cmd=13: not implemented yet
>> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
>> dom1 access to irq 53: Function not implemented
>> (XEN) do_physdev_op 16 cmd=13: not implemented yet
>> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
>> dom1 access to irq 71: Function not implemented
>> (XEN) do_physdev_op 16 cmd=13: not implemented yet
>> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
>> dom1 access to irq 173: Function not implemented
>> (XEN) do_physdev_op 16 cmd=13: not implemented yet
>> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
>> dom1 access to irq 174: Function not implemented
>> Turning on vfb in domain 1
>> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
>> still lr_pending
>> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
>> still lr_pending
>> Parsing config from /xen/images/DomUQNX.cfg
>> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to
>> inject irq=2 into d0v0, when it is still lr_pending
>>
>> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
>> still lr_pending
>> [    4.304687] dra7-evm-sound sound.17: cpu dai node is invalid
>> [    4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22
>> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader
>> found: Invalid kernel
>> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image
>> failed: No such file or directory
>> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot
>> (re-)build domain: -3
>> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
>> still lr_pending
>> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
>> still lr_pending
>> Turning on 'vsnd' in domain '1' (dev_id: '0')
>> Turning on vkbd in domain 1
>> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
>> still lr_pending
>> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
>> still lr_pending
>> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
>> still lr_pending
>>
>> Please press Enter to activate this console. (XEN) gic.c:617:d0v1
>> trying to inject irq=2 into d0v0, when it is still lr_pending
>>
>> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi
>> <andrii.tseglytskyi@globallogic.com> wrote:
>> > OK got it. Give me a few mins
>> >
>> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini
>> > <stefano.stabellini@eu.citrix.com> wrote:
>> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
>> >> for non-hardware irqs (desc == NULL) and keep avoiding
>> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs.
>> >>
>> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid
>> >> other potential bugs introduced later.
>> >>
>> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
>> >>> What if I try on top of current master branch the following code:
>> >>>
>> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
>> >>> index 31fb81a..6764ab7 100644
>> >>> --- a/xen/arch/arm/gic-v2.c
>> >>> +++ b/xen/arch/arm/gic-v2.c
>> >>> @@ -36,6 +36,8 @@
>> >>>  #include <asm/io.h>
>> >>>  #include <asm/gic.h>
>> >>>
>> >>> +#define GIC_DEBUG 1
>> >>> +
>> >>>  /*
>> >>>   * LR register definitions are GIC v2 specific.
>> >>>   * Moved these definitions from header file to here
>> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >>> index bcaded9..c03d6a6 100644
>> >>> --- a/xen/arch/arm/gic.c
>> >>> +++ b/xen/arch/arm/gic.c
>> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
>> >>>
>> >>>  #define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_hw_ops->info->nr_lrs) - 1))
>> >>>
>> >>> -#undef GIC_DEBUG
>> >>> +#define GIC_DEBUG 1
>> >>>
>> >>>  static void gic_update_one_lr(struct vcpu *v, int i);
>> >>>
>> >>> It is equivalent to what you are proposing: my code contains
>> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, so the following line will
>> >>> be executed inside the gicv2_update_lr() function:
>> >>>  lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> >>>
>> >>> regards,
>> >>> Andrii
>> >>>
>> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
>> >>> <stefano.stabellini@eu.citrix.com> wrote:
>> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
>> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set; with that,
>> >>> >> everything works fine.
>> >>> >> The following 2 patches fix xen/master for my platform.
>> >>> >>
>> >>> >> Stefano, could you please take a look at these changes?
>> >>> >>
>> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca
>> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>> >>> >> Date:   Tue Nov 18 14:20:42 2014 +0200
>> >>> >>
>> >>> >>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
>> >>> >>
>> >>> >>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
>> >>> >>     Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>> >>> >>
>> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
>> >>> >> index 31fb81a..093ecdb 100644
>> >>> >> --- a/xen/arch/arm/gic-v2.c
>> >>> >> +++ b/xen/arch/arm/gic-v2.c
>> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
>> >>> >>                                               << GICH_V2_LR_PRIORITY_SHIFT) |
>> >>> >>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));
>> >>> >>
>> >>> >> -    if ( p->desc != NULL )
>> >>> >> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
>> >>> >>      {
>> >>> >> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
>> >>> >> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> >>> >> -        else
>> >>> >> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
>> >>> >> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
>> >>> >> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> >>> >> +    }
>> >>> >> +    else if ( p->desc != NULL )
>> >>> >> +    {
>> >>> >> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
>> >>> >> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
>> >>> >>      }
>> >>> >>
>> >>> >>      writel_gich(lr_reg, GICH_LR + lr * 4);
>> >>> >
>> >>> > Actually, in case p->desc == NULL (the irq is not a hardware irq; it
>> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need
>> >>> > the maintenance interrupt if the bug was really due to GICH_LR_HW not
>> >>> > working correctly on OMAP5. This change might only be better at
>> >>> > "hiding" the real issue.
>> >>> >
>> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding
>> >>> > maintenance interrupts doesn't work for software interrupts.
>> >>> > The commit that should make them work correctly after the
>> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121.
>> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll
>> >>> > see that it is going to set a software irq as PENDING if it is already ACTIVE.
>> >>> > Maybe that doesn't work correctly on OMAP5.
>> >>> >
>> >>> > Could you try this patch on top of
>> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
>> >>> > if the problem is specifically with software irqs.
>> >>> >
>> >>> >
>> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >>> > index b7516c0..d8a17c9 100644
>> >>> > --- a/xen/arch/arm/gic.c
>> >>> > +++ b/xen/arch/arm/gic.c
>> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
>> >>> >  /* Maximum cpu interface per GIC */
>> >>> >  #define NR_GIC_CPU_IF 8
>> >>> >
>> >>> > -#undef GIC_DEBUG
>> >>> > +#define GIC_DEBUG 1
>> >>> >
>> >>> >  static void gic_update_one_lr(struct vcpu *v, int i);
>> >>> >
>> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
>> >>> >          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
>> >>> >      if ( p->desc != NULL )
>> >>> >          lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
>> >>> > +    else
>> >>> > +        lr_val |= GICH_LR_MAINTENANCE_IRQ;
>> >>> >
>> >>> >      GICH[GICH_LR + lr] = lr_val;
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>>
>> >>> Andrii Tseglytskyi | Embedded Dev
>> >>> GlobalLogic
>> >>> www.globallogic.com
>> >>>
>> >
>> >
>> >
>> > --
>> >
>> > Andrii Tseglytskyi | Embedded Dev
>> > GlobalLogic
>> > www.globallogic.com
>>
>>
>>
>> --
>>
>> Andrii Tseglytskyi | Embedded Dev
>> GlobalLogic
>> www.globallogic.com
>>



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com
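The debug patch quoted above changes gic_restore_pending_irqs() so that, when find_first_zero_bit() finds no free LR, it logs and continues instead of returning. The user-space model below makes these assumptions:

- a plain struct with an nr_lrs count and a 64-bit lr_mask stands in for Xen's per-CPU lr_mask;
- an array of virq numbers stands in for the lr_pending list;
- restore_pending() and find_free_lr() are hypothetical names.

It shows that once every LR bit stays set, each pending irq (such as the SGI irq=2 seen here) falls into the "LRs full" branch on every pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified per-CPU LR bookkeeping: bit i of lr_mask set => LR i in use. */
struct lr_state {
    unsigned nr_lrs;
    uint64_t lr_mask;
};

static int find_free_lr(const struct lr_state *s)
{
    assert(s->nr_lrs <= 64);
    for ( unsigned i = 0; i < s->nr_lrs; i++ )
        if ( !(s->lr_mask & (1ULL << i)) )
            return (int)i;
    return -1;
}

/* Place each pending virq into a free LR. With log_full == false this stops
 * at the first miss, like the original "return"; with log_full == true it
 * reports and keeps going, like the debug patch's "continue".
 * Returns how many irqs were left pending. */
static unsigned restore_pending(struct lr_state *s, const unsigned *pending,
                                unsigned n, bool log_full)
{
    unsigned left = 0;

    for ( unsigned k = 0; k < n; k++ )
    {
        int i = find_free_lr(s);

        if ( i < 0 )
        {
            if ( !log_full )
                return n - k;
            printf("LRs full, not injecting irq=%u\n", pending[k]);
            left++;
            continue;
        }
        s->lr_mask |= 1ULL << i;   /* stands in for gic_set_lr(i, p, PENDING) */
    }
    return left;
}
```

With both variants the number of irqs left behind is the same. The "continue" version only makes each miss visible, which is why the log fills with identical lines while nothing frees an LR.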


* Re: Xen 4.5 random freeze question
  2014-11-19  9:38                                           ` Andrii Tseglytskyi
@ 2014-11-19 11:12                                             ` Stefano Stabellini
  2014-11-19 11:16                                               ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-19 11:12 UTC
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> Hi Stefano,
> 
> Thank you for your support.
> 
> You are right: with the latest change you proposed, I get continuous
> prints during the platform hang:
> 
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> 
> Looks like the issue needs deeper debugging.

Cool! You could simply print which irqs are in the LRs when they are all
full, for example by calling gic_dump_info. That would tell us what
is taking up all the LR space we have.

How many LRs are available on omap5 anyway?
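On GICv2 the number of list registers does not need to be guessed. The GICH_VTR register reports it in its ListRegs field (bits [5:0], encoded as count minus one), alongside the implemented priority and preemption bits. The minimal decoding sketch below makes these assumptions:

- the raw 32-bit register value has already been read on the board;
- the helper names are hypothetical;
- 0x90000003 is only an illustrative input, not a value measured on DRA7.

```c
#include <assert.h>
#include <stdint.h>

/* GICH_VTR fields (GICv2): ListRegs [5:0], PREbits [28:26], PRIbits [31:29],
 * each stored as (value - 1). */
static unsigned vtr_nr_lrs(uint32_t vtr)   { return (vtr & 0x3fu) + 1; }
static unsigned vtr_pre_bits(uint32_t vtr) { return ((vtr >> 26) & 0x7u) + 1; }
static unsigned vtr_pri_bits(uint32_t vtr) { return ((vtr >> 29) & 0x7u) + 1; }

/* Value lr_mask takes when every LR is in use. The lr_all_full() macro
 * compares against ((1 << nr_lrs) - 1); this version widens it to 64 bits. */
static uint64_t lrs_full_mask(unsigned nr_lrs)
{
    assert(nr_lrs >= 1 && nr_lrs <= 64);
    return nr_lrs == 64 ? ~0ULL : (1ULL << nr_lrs) - 1;
}
```

With only a handful of LRs, a few entries that are never released are enough to make lr_all_full() true permanently.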

I doubt you have enough interrupt traffic to actually fill all the LRs,
so I suspect that a few LRs are not being cleared properly (that
should happen on hypervisor entry; gic_update_one_lr should take care of
it).


> Regards,
> Andrii
> 
> On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > Hello Andrii,
> > we are getting closer :-)
> >
> > It would help if you post the output with GIC_DEBUG defined but without
> > the other change that "fixes" the issue.
> >
> > I think the problem is probably due to software irqs.
> > You are getting too many
> >
> > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> >
> > messages. That means you are losing virtual SGIs (guest VCPU to guest
> > VCPU). It would be best to investigate why, especially if you get many
> > more of the same messages without the MAINTENANCE_IRQ change I
> > suggested.
> >
> > This patch might also help understand the problem better:
> >
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index b7516c0..5eaeca2 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
> >      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
> >      {
> >          i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
> > -        if ( i >= nr_lrs ) return;
> > +        if ( i >= nr_lrs )
> > +        {
> > +            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
> > +                    p->irq, v->domain->domain_id, v->vcpu_id);
> > +            continue;
> > +        }
> >
> >          spin_lock_irqsave(&gic.lock, flags);
> >          gic_set_lr(i, p, GICH_LR_PENDING);
> >
> >
> >
> >
> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> Hi Stefano,
> >>
> >> No hangs with this change.
> >> Complete log is the following:
> >>
> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
> >> DRA752 ES1.0
> >> <ethaddr> not set. Validating first E-fuse MAC
> >> cpsw
> >> - UART enabled -
> >> - CPU 00000000 booting -
> >> - Xen starting in Hyp mode -
> >> - Zero BSS -
> >> - Setting up control registers -
> >> - Turning on paging -
> >> - Ready -
> >> (XEN) Checking for initrd in /chosen
> >> (XEN) RAM: 0000000080000000 - 000000009fffffff
> >> (XEN) RAM: 00000000a0000000 - 00000000bfffffff
> >> (XEN) RAM: 00000000c0000000 - 00000000dfffffff
> >> (XEN)
> >> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa
> >> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000
> >> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000
> >> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000
> >> (XEN)  RESVD[0]: 00000000ba300000 - 00000000bfd00000
> >> (XEN)  RESVD[1]: 0000000095800000 - 0000000095900000
> >> (XEN)  RESVD[2]: 0000000098a00000 - 0000000098b00000
> >> (XEN)  RESVD[3]: 0000000095f00000 - 0000000098a00000
> >> (XEN)  RESVD[4]: 0000000095900000 - 0000000095f00000
> >> (XEN)
> >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
> >> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
> >> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000
> >> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages)
> >> (XEN) Dom heap: 344064 pages
> >> (XEN) Domain heap initialised
> >> (XEN) Looking for UART console serial0
> >>  Xen 4.5-unstable
> >> (XEN) Xen version 4.5-unstable (atseglytskyi@)
> >> (arm-linux-gnueabihf-gcc (crosstool-NG
> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3
> >> 20130328 (prerelease)) debu4
> >> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
> >> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2
> >> (XEN) 32-bit Execution:
> >> (XEN)   Processor Features: 00001131:00011011
> >> (XEN)     Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
> >> (XEN)     Extensions: GenericTimer Security
> >> (XEN)   Debug Features: 02010555
> >> (XEN)   Auxiliary Features: 00000000
> >> (XEN)   Memory Model Features: 10201105 20000000 01240000 02102211
> >> (XEN)  ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
> >> (XEN) Platform: TI DRA7
> >> (XEN) /psci method must be smc, but is: "hvc"
> >> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c)
> >> (XEN) Set AuxCoreBoot0 to 0x20
> >> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
> >> (XEN) Using generic timer at 6144 KHz
> >> (XEN) GIC initialization:
> >> (XEN)         gic_dist_addr=0000000048211000
> >> (XEN)         gic_cpu_addr=0000000048212000
> >> (XEN)         gic_hyp_addr=0000000048214000
> >> (XEN)         gic_vcpu_addr=0000000048216000
> >> (XEN)         gic_maintenance_irq=25
> >> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
> >> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> >> (XEN) I/O virtualisation disabled
> >> (XEN) Allocated console ring of 16 KiB.
> >> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
> >> (XEN) Bringing up CPU1
> >> - CPU 00000001 booting -
> >> - Xen starting in Hyp mode -
> >> - Setting up control registers -
> >> - Turning on paging -
> >> - Ready -
> >> (XEN) CPU 1 booted.
> >> (XEN) Brought up 2 CPUs
> >> (XEN) *** LOADING DOMAIN 0 ***
> >> (XEN) Loading kernel from boot module 2
> >> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
> >> (XEN) Loading zImage from 00000000c0000040 to 00000000cfc00000-00000000cff50c48
> >> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8
> >> (XEN) Std. Loglevel: All
> >> (XEN) Guest Loglevel: All
> >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch
> >> input to Xen)
> >> (XEN) Freed 272kB init memory.
> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
> >> already pending in LR0
> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
> >> already pending in LR0
> >> [    0.000000] /cpus/cpu@0 missing clock-frequency property
> >> [    0.000000] /cpus/cpu@1 missing clock-frequency property
> >> [    0.140625] omap-gpmc omap-gpmc: failed to reserve memory
> >> [    0.187500] omap_l3_noc ocp.3: couldn't find resource 2
> >> [    0.273437] i2c i2c-1: of_i2c: invalid reg on
> >> /ocp/i2c@48072000/camera_ov10635
> >> [    0.437500] ldo3: operation not allowed
> >> [    0.437500] omapdss HDMI error: can't set the voltage regulator
> >> [    0.468750] tfc_s9700 display0: tfc_s9700_probe probe
> >> [    0.468750] ov1063x 1-0030: No deserializer node found
> >> [    0.468750] ov1063x 1-0030: No serializer node found
> >> [    0.468750] ov1063x 1-0030: Failed writing register 0x0103!
> >> [    0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30
> >> [    0.578125] ahci ahci.0.auto: can't get clock
> >> [    0.898437] ldc_module_init
> >> [    1.304687] Missing dual_emac_res_vlan in DT.
> >> [    1.304687] Using 1 as Reserved VLAN for 0 slave
> >> [    1.312500] Missing dual_emac_res_vlan in DT.
> >> [    1.320312] Using 2 as Reserved VLAN for 1 slave
> >> [    1.382812] Freeing init memory: 236K
> >> sh: write error: No such device
> >> Cannot identify '/dev/camera0': 2, No such file or directory
> >> Parsing config from /xen/images/DomUAndroid.cfg
> >> XSM Disabled: seclabel not supported
> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> dom1 access to irq 53: Function not implemented
> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> dom1 access to irq 71: Function not implemented
> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> dom1 access to irq 173: Function not implemented
> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> dom1 access to irq 174: Function not implemented
> >> Turning on vfb in domain 1
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> still lr_pending
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> still lr_pending
> >> Parsing config from /xen/images/DomUQNX.cfg
> >> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to
> >> inject irq=2 into d0v0, when it is still lr_pending
> >>
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> still lr_pending
> >> [    4.304687] dra7-evm-sound sound.17: cpu dai node is invalid
> >> [    4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22
> >> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader
> >> found: Invalid kernel
> >> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image
> >> failed: No such file or directory
> >> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot
> >> (re-)build domain: -3
> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> >> still lr_pending
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> still lr_pending
> >> Turning on 'vsnd' in domain '1' (dev_id: '0')
> >> Turning on vkbd in domain 1
> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> >> still lr_pending
> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> >> still lr_pending
> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> still lr_pending
> >>
> >> Please press Enter to activate this console. (XEN) gic.c:617:d0v1
> >> trying to inject irq=2 into d0v0, when it is still lr_pending
> >>
> >> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi
> >> <andrii.tseglytskyi@globallogic.com> wrote:
> >> > OK got it. Give me a few mins
> >> >
> >> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini
> >> > <stefano.stabellini@eu.citrix.com> wrote:
> >> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
> >> >> for non-hardware irqs (desc == NULL) and keep avoiding
> >> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs.
> >> >>
> >> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid
> >> >> other potential bugs introduced later.
> >> >>
> >> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> >>> What if I try the following code on top of the current master branch:
> >> >>>
> >> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> >> >>> index 31fb81a..6764ab7 100644
> >> >>> --- a/xen/arch/arm/gic-v2.c
> >> >>> +++ b/xen/arch/arm/gic-v2.c
> >> >>> @@ -36,6 +36,8 @@
> >> >>>  #include <asm/io.h>
> >> >>>  #include <asm/gic.h>
> >> >>>
> >> >>> +#define GIC_DEBUG 1
> >> >>> +
> >> >>>  /*
> >> >>>   * LR register definitions are GIC v2 specific.
> >> >>>   * Moved these definitions from header file to here
> >> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >>> index bcaded9..c03d6a6 100644
> >> >>> --- a/xen/arch/arm/gic.c
> >> >>> +++ b/xen/arch/arm/gic.c
> >> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
> >> >>>
> >> >>>  #define lr_all_full() (this_cpu(lr_mask) == ((1 <<
> >> >>> gic_hw_ops->info->nr_lrs) - 1))
> >> >>>
> >> >>> -#undef GIC_DEBUG
> >> >>> +#define GIC_DEBUG 1
> >> >>>
> >> >>>  static void gic_update_one_lr(struct vcpu *v, int i);
> >> >>>
> >> >>> It is equivalent to what you are proposing: my code contains
> >> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, and as a result the following line will
> >> >>> be executed inside the gicv2_update_lr() function:
> >> >>>  lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >> >>>
> >> >>> regards,
> >> >>> Andrii
> >> >>>
> >> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
> >> >>> <stefano.stabellini@eu.citrix.com> wrote:
> >> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set; with it
> >> >>> >> everything works fine.
> >> >>> >> The following 2 patches fix xen/master for my platform.
> >> >>> >>
> >> >>> >> Stefano, could you please take a look at these changes?
> >> >>> >>
> >> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca
> >> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> >> >>> >> Date:   Tue Nov 18 14:20:42 2014 +0200
> >> >>> >>
> >> >>> >>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
> >> >>> >>
> >> >>> >>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
> >> >>> >>     Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> >> >>> >>
> >> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> >> >>> >> index 31fb81a..093ecdb 100644
> >> >>> >> --- a/xen/arch/arm/gic-v2.c
> >> >>> >> +++ b/xen/arch/arm/gic-v2.c
> >> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct
> >> >>> >> pending_irq *p,
> >> >>> >>                                               << GICH_V2_LR_PRIORITY_SHIFT) |
> >> >>> >>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) <<
> >> >>> >> GICH_V2_LR_VIRTUAL_SHIFT));
> >> >>> >>
> >> >>> >> -    if ( p->desc != NULL )
> >> >>> >> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> >> >>> >>      {
> >> >>> >> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> >> >>> >> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >> >>> >> -        else
> >> >>> >> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq &
> >> >>> >> GICH_V2_LR_PHYSICAL_MASK )
> >> >>> >> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
> >> >>> >> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >> >>> >> +    }
> >> >>> >> +    else if ( p->desc != NULL )
> >> >>> >> +    {
> >> >>> >> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
> >> >>> >> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
> >> >>> >>      }
> >> >>> >>
> >> >>> >>      writel_gich(lr_reg, GICH_LR + lr * 4);
> >> >>> >
> >> >>> > Actually, when p->desc == NULL (the irq is not a hardware irq; it
> >> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need
> >> >>> > the maintenance interrupt if the bug was really due to GICH_LR_HW not
> >> >>> > working correctly on OMAP5. These changes might only be better at
> >> >>> > "hiding" the real issue.
> >> >>> >
> >> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding
> >> >>> > maintenance interrupts doesn't work for software interrupts.
> >> >>> > The commit that should make them work correctly after the
> >> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121.
> >> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll
> >> >>> > see that it is going to set a software irq as PENDING if it is already ACTIVE.
> >> >>> > Maybe that doesn't work correctly on OMAP5.
> >> >>> >
> >> >>> > Could you try this patch on top of
> >> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
> >> >>> > if the problem is specifically with software irqs.
> >> >>> >
> >> >>> >
> >> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >>> > index b7516c0..d8a17c9 100644
> >> >>> > --- a/xen/arch/arm/gic.c
> >> >>> > +++ b/xen/arch/arm/gic.c
> >> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
> >> >>> >  /* Maximum cpu interface per GIC */
> >> >>> >  #define NR_GIC_CPU_IF 8
> >> >>> >
> >> >>> > -#undef GIC_DEBUG
> >> >>> > +#define GIC_DEBUG 1
> >> >>> >
> >> >>> >  static void gic_update_one_lr(struct vcpu *v, int i);
> >> >>> >
> >> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
> >> >>> >          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
> >> >>> >      if ( p->desc != NULL )
> >> >>> >          lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
> >> >>> > +    else
> >> >>> > +        lr_val |= GICH_LR_MAINTENANCE_IRQ;
> >> >>> >
> >> >>> >      GICH[GICH_LR + lr] = lr_val;
> >> >>> >
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>>
> >> >>> Andrii Tseglytskyi | Embedded Dev
> >> >>> GlobalLogic
> >> >>> www.globallogic.com
> >> >>>
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > Andrii Tseglytskyi | Embedded Dev
> >> > GlobalLogic
> >> > www.globallogic.com
> >>
> >>
> >>
> >> --
> >>
> >> Andrii Tseglytskyi | Embedded Dev
> >> GlobalLogic
> >> www.globallogic.com
> >>
> 
> 
> 
> -- 
> 
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
> 


* Re: Xen 4.5 random freeze question
  2014-11-19 11:12                                             ` Stefano Stabellini
@ 2014-11-19 11:16                                               ` Andrii Tseglytskyi
  2014-11-19 11:42                                                 ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 11:16 UTC
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

On Wed, Nov 19, 2014 at 1:12 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> Hi Stefano,
>>
>> Thank you for your support.
>>
> >> You are right - with the latest change you've proposed I get continuous
> >> prints during the platform hang:
>>
>> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
>> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
>> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
>> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
>> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
>> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
>> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
>>
> >> Looks like the issue needs deeper debugging.
>
> Cool! You could simply print what irqs are in all LRs when they are
> full, for example you could call gic_dump_info. That would tell us what
> is taking all the LRs space we have.
>
> How many LRs are available on omap5 anyway?

:) Already done this:


(XEN) gic.c:725:d0v0 LRs full, not injecting irq=27 nr_lrs 4 i 4 into d0v0
(XEN) GICH_LRs (vcpu 0) mask=f
(XEN)    HW_LR[0]=1a00001f
(XEN)    HW_LR[1]=9a00e439
(XEN)    HW_LR[2]=1a000002
(XEN)    HW_LR[3]=9a015856
(XEN) Inflight irq=31 lr=0
(XEN) Inflight irq=57 lr=1
(XEN) Inflight irq=2 lr=2
(XEN) Inflight irq=86 lr=3
(XEN) Inflight irq=27 lr=255
(XEN) Pending irq=27


>
> I doubt you have enough interrupt traffic to actually fill all the LRs,
> so I am thinking that a few LRs might not be cleared properly (that
> should happen on hypervisor entry, gic_update_one_lr should take care of
> it).

This would explain why this happens during domU start: SGI
traffic might be very heavy at that time.

>
>
>> Regards,
>> Andrii
>>
>> On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>> > Hello Andrii,
>> > we are getting closer :-)
>> >
>> > It would help if you post the output with GIC_DEBUG defined but without
>> > the other change that "fixes" the issue.
>> >
>> > I think the problem is probably due to software irqs.
>> > You are getting too many
>> >
>> > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
>> >
> >> > messages. That means you are losing virtual SGIs (guest VCPU to guest
>> > VCPU). It would be best to investigate why, especially if you get many
>> > more of the same messages without the MAINTENANCE_IRQ change I
>> > suggested.
>> >
> >> > This patch might also help in understanding the problem better:
>> >
>> >
>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> > index b7516c0..5eaeca2 100644
>> > --- a/xen/arch/arm/gic.c
>> > +++ b/xen/arch/arm/gic.c
>> > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
>> >      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
>> >      {
>> >          i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
>> > -        if ( i >= nr_lrs ) return;
>> > +        if ( i >= nr_lrs )
>> > +        {
>> > +            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
>> > +                    p->irq, v->domain->domain_id, v->vcpu_id);
>> > +            continue;
>> > +        }
>> >
>> >          spin_lock_irqsave(&gic.lock, flags);
>> >          gic_set_lr(i, p, GICH_LR_PENDING);
>> >
>> >
>> >
>> >
>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
>> >> Hi Stefano,
>> >>
>> >> No hangs with this change.
>> >> Complete log is the following:
>> >>
>> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
>> >> DRA752 ES1.0
>> >> <ethaddr> not set. Validating first E-fuse MAC
>> >> cpsw
>> >> - UART enabled -
>> >> - CPU 00000000 booting -
>> >> - Xen starting in Hyp mode -
>> >> - Zero BSS -
>> >> - Setting up control registers -
>> >> - Turning on paging -
>> >> - Ready -
>> >> (XEN) Checking for initrd in /chosen
>> >> (XEN) RAM: 0000000080000000 - 000000009fffffff
>> >> (XEN) RAM: 00000000a0000000 - 00000000bfffffff
>> >> (XEN) RAM: 00000000c0000000 - 00000000dfffffff
>> >> (XEN)
>> >> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa
>> >> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000
>> >> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000
>> >> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000
>> >> (XEN)  RESVD[0]: 00000000ba300000 - 00000000bfd00000
>> >> (XEN)  RESVD[1]: 0000000095800000 - 0000000095900000
>> >> (XEN)  RESVD[2]: 0000000098a00000 - 0000000098b00000
>> >> (XEN)  RESVD[3]: 0000000095f00000 - 0000000098a00000
>> >> (XEN)  RESVD[4]: 0000000095900000 - 0000000095f00000
>> >> (XEN)
>> >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
>> >> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
>> >> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000
>> >> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages)
>> >> (XEN) Dom heap: 344064 pages
>> >> (XEN) Domain heap initialised
>> >> (XEN) Looking for UART console serial0
>> >>  Xen 4.5-unstable
>> >> (XEN) Xen version 4.5-unstable (atseglytskyi@)
>> >> (arm-linux-gnueabihf-gcc (crosstool-NG
>> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3
>> >> 20130328 (prerelease)) debu4
>> >> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
>> >> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2
>> >> (XEN) 32-bit Execution:
>> >> (XEN)   Processor Features: 00001131:00011011
>> >> (XEN)     Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
>> >> (XEN)     Extensions: GenericTimer Security
>> >> (XEN)   Debug Features: 02010555
>> >> (XEN)   Auxiliary Features: 00000000
>> >> (XEN)   Memory Model Features: 10201105 20000000 01240000 02102211
>> >> (XEN)  ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
>> >> (XEN) Platform: TI DRA7
>> >> (XEN) /psci method must be smc, but is: "hvc"
>> >> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c)
>> >> (XEN) Set AuxCoreBoot0 to 0x20
>> >> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
>> >> (XEN) Using generic timer at 6144 KHz
>> >> (XEN) GIC initialization:
>> >> (XEN)         gic_dist_addr=0000000048211000
>> >> (XEN)         gic_cpu_addr=0000000048212000
>> >> (XEN)         gic_hyp_addr=0000000048214000
>> >> (XEN)         gic_vcpu_addr=0000000048216000
>> >> (XEN)         gic_maintenance_irq=25
>> >> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
>> >> (XEN) Using scheduler: SMP Credit Scheduler (credit)
>> >> (XEN) I/O virtualisation disabled
>> >> (XEN) Allocated console ring of 16 KiB.
>> >> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
>> >> (XEN) Bringing up CPU1
>> >> - CPU 00000001 booting -
>> >> - Xen starting in Hyp mode -
>> >> - Setting up control registers -
>> >> - Turning on paging -
>> >> - Ready -
>> >> (XEN) CPU 1 booted.
>> >> (XEN) Brought up 2 CPUs
>> >> (XEN) *** LOADING DOMAIN 0 ***
>> >> (XEN) Loading kernel from boot module 2
>> >> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
>> >> (XEN) Loading zImage from 00000000c0000040 to 00000000cfc00000-00000000cff50c48
>> >> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8
>> >> (XEN) Std. Loglevel: All
>> >> (XEN) Guest Loglevel: All
>> >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch
>> >> input to Xen)
>> >> (XEN) Freed 272kB init memory.
>> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
>> >> already pending in LR0
>> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
>> >> already pending in LR0
>> >> [    0.000000] /cpus/cpu@0 missing clock-frequency property
>> >> [    0.000000] /cpus/cpu@1 missing clock-frequency property
>> >> [    0.140625] omap-gpmc omap-gpmc: failed to reserve memory
>> >> [    0.187500] omap_l3_noc ocp.3: couldn't find resource 2
>> >> [    0.273437] i2c i2c-1: of_i2c: invalid reg on
>> >> /ocp/i2c@48072000/camera_ov10635
>> >> [    0.437500] ldo3: operation not allowed
>> >> [    0.437500] omapdss HDMI error: can't set the voltage regulator
>> >> [    0.468750] tfc_s9700 display0: tfc_s9700_probe probe
>> >> [    0.468750] ov1063x 1-0030: No deserializer node found
>> >> [    0.468750] ov1063x 1-0030: No serializer node found
>> >> [    0.468750] ov1063x 1-0030: Failed writing register 0x0103!
>> >> [    0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30
>> >> [    0.578125] ahci ahci.0.auto: can't get clock
>> >> [    0.898437] ldc_module_init
>> >> [    1.304687] Missing dual_emac_res_vlan in DT.
>> >> [    1.304687] Using 1 as Reserved VLAN for 0 slave
>> >> [    1.312500] Missing dual_emac_res_vlan in DT.
>> >> [    1.320312] Using 2 as Reserved VLAN for 1 slave
>> >> [    1.382812] Freeing init memory: 236K
>> >> sh: write error: No such device
>> >> Cannot identify '/dev/camera0': 2, No such file or directory
>> >> Parsing config from /xen/images/DomUAndroid.cfg
>> >> XSM Disabled: seclabel not supported
>> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
>> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
>> >> dom1 access to irq 53: Function not implemented
>> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
>> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
>> >> dom1 access to irq 71: Function not implemented
>> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
>> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
>> >> dom1 access to irq 173: Function not implemented
>> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
>> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
>> >> dom1 access to irq 174: Function not implemented
>> >> Turning on vfb in domain 1
>> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
>> >> still lr_pending
>> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
>> >> still lr_pending
>> >> Parsing config from /xen/images/DomUQNX.cfg
>> >> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to
>> >> inject irq=2 into d0v0, when it is still lr_pending
>> >>
>> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
>> >> still lr_pending
>> >> [    4.304687] dra7-evm-sound sound.17: cpu dai node is invalid
>> >> [    4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22
>> >> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader
>> >> found: Invalid kernel
>> >> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image
>> >> failed: No such file or directory
>> >> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot
>> >> (re-)build domain: -3
>> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
>> >> still lr_pending
>> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
>> >> still lr_pending
>> >> Turning on 'vsnd' in domain '1' (dev_id: '0')
>> >> Turning on vkbd in domain 1
>> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
>> >> still lr_pending
>> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
>> >> still lr_pending
>> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
>> >> still lr_pending
>> >>
>> >> Please press Enter to activate this console. (XEN) gic.c:617:d0v1
>> >> trying to inject irq=2 into d0v0, when it is still lr_pending
>> >>
>> >> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi
>> >> <andrii.tseglytskyi@globallogic.com> wrote:
>> >> > OK got it. Give me a few mins
>> >> >
>> >> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini
>> >> > <stefano.stabellini@eu.citrix.com> wrote:
>> >> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
>> >> >> for non-hardware irqs (desc == NULL) and keep avoiding
>> >> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs.
>> >> >>
>> >> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid
>> >> >> other potential bugs introduced later.
>> >> >>
>> >> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >>> What if I try the following code on top of the current master branch:
>> >> >>>
>> >> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
>> >> >>> index 31fb81a..6764ab7 100644
>> >> >>> --- a/xen/arch/arm/gic-v2.c
>> >> >>> +++ b/xen/arch/arm/gic-v2.c
>> >> >>> @@ -36,6 +36,8 @@
>> >> >>>  #include <asm/io.h>
>> >> >>>  #include <asm/gic.h>
>> >> >>>
>> >> >>> +#define GIC_DEBUG 1
>> >> >>> +
>> >> >>>  /*
>> >> >>>   * LR register definitions are GIC v2 specific.
>> >> >>>   * Moved these definitions from header file to here
>> >> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >> >>> index bcaded9..c03d6a6 100644
>> >> >>> --- a/xen/arch/arm/gic.c
>> >> >>> +++ b/xen/arch/arm/gic.c
>> >> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
>> >> >>>
>> >> >>>  #define lr_all_full() (this_cpu(lr_mask) == ((1 <<
>> >> >>> gic_hw_ops->info->nr_lrs) - 1))
>> >> >>>
>> >> >>> -#undef GIC_DEBUG
>> >> >>> +#define GIC_DEBUG 1
>> >> >>>
>> >> >>>  static void gic_update_one_lr(struct vcpu *v, int i);
>> >> >>>
> >> >> >>> It is equivalent to what you are proposing: my code contains
> >> >> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, and as a result the following line will
> >> >> >>> be executed inside the gicv2_update_lr() function:
> >> >> >>>  lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> >> >>>
>> >> >>> regards,
>> >> >>> Andrii
>> >> >>>
>> >> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
>> >> >>> <stefano.stabellini@eu.citrix.com> wrote:
>> >> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set; with it
> >> >> >>> >> everything works fine.
> >> >> >>> >> The following 2 patches fix xen/master for my platform.
> >> >> >>> >>
> >> >> >>> >> Stefano, could you please take a look at these changes?
>> >> >>> >>
>> >> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca
>> >> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>> >> >>> >> Date:   Tue Nov 18 14:20:42 2014 +0200
>> >> >>> >>
>> >> >>> >>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
>> >> >>> >>
>> >> >>> >>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
>> >> >>> >>     Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>> >> >>> >>
>> >> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
>> >> >>> >> index 31fb81a..093ecdb 100644
>> >> >>> >> --- a/xen/arch/arm/gic-v2.c
>> >> >>> >> +++ b/xen/arch/arm/gic-v2.c
>> >> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct
>> >> >>> >> pending_irq *p,
>> >> >>> >>                                               << GICH_V2_LR_PRIORITY_SHIFT) |
>> >> >>> >>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) <<
>> >> >>> >> GICH_V2_LR_VIRTUAL_SHIFT));
>> >> >>> >>
>> >> >>> >> -    if ( p->desc != NULL )
>> >> >>> >> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
>> >> >>> >>      {
>> >> >>> >> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
>> >> >>> >> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> >> >>> >> -        else
>> >> >>> >> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq &
>> >> >>> >> GICH_V2_LR_PHYSICAL_MASK )
>> >> >>> >> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
>> >> >>> >> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> >> >>> >> +    }
>> >> >>> >> +    else if ( p->desc != NULL )
>> >> >>> >> +    {
>> >> >>> >> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
>> >> >>> >> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
>> >> >>> >>      }
>> >> >>> >>
>> >> >>> >>      writel_gich(lr_reg, GICH_LR + lr * 4);
>> >> >>> >
> >> >> >>> > Actually, when p->desc == NULL (the irq is not a hardware irq; it
> >> >> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need
> >> >> >>> > the maintenance interrupt if the bug was really due to GICH_LR_HW not
> >> >> >>> > working correctly on OMAP5. These changes might only be better at
> >> >> >>> > "hiding" the real issue.
>> >> >>> >
>> >> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding
>> >> >>> > maintenance interrupts doesn't work for software interrupts.
>> >> >>> > The commit that should make them work correctly after the
> >> >> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121.
> >> >> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll
> >> >> >>> > see that it is going to set a software irq as PENDING if it is already ACTIVE.
>> >> >>> > Maybe that doesn't work correctly on OMAP5.
>> >> >>> >
>> >> >>> > Could you try this patch on top of
>> >> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
>> >> >>> > if the problem is specifically with software irqs.
>> >> >>> >
>> >> >>> >
>> >> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >> >>> > index b7516c0..d8a17c9 100644
>> >> >>> > --- a/xen/arch/arm/gic.c
>> >> >>> > +++ b/xen/arch/arm/gic.c
>> >> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
>> >> >>> >  /* Maximum cpu interface per GIC */
>> >> >>> >  #define NR_GIC_CPU_IF 8
>> >> >>> >
>> >> >>> > -#undef GIC_DEBUG
>> >> >>> > +#define GIC_DEBUG 1
>> >> >>> >
>> >> >>> >  static void gic_update_one_lr(struct vcpu *v, int i);
>> >> >>> >
>> >> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
>> >> >>> >          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
>> >> >>> >      if ( p->desc != NULL )
>> >> >>> >          lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
>> >> >>> > +    else
>> >> >>> > +        lr_val |= GICH_LR_MAINTENANCE_IRQ;
>> >> >>> >
>> >> >>> >      GICH[GICH_LR + lr] = lr_val;
>> >> >>> >
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> --
>> >> >>>
>> >> >>> Andrii Tseglytskyi | Embedded Dev
>> >> >>> GlobalLogic
>> >> >>> www.globallogic.com
>> >> >>>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> >
>> >> > Andrii Tseglytskyi | Embedded Dev
>> >> > GlobalLogic
>> >> > www.globallogic.com
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Andrii Tseglytskyi | Embedded Dev
>> >> GlobalLogic
>> >> www.globallogic.com
>> >>
>>
>>
>>
>> --
>>
>> Andrii Tseglytskyi | Embedded Dev
>> GlobalLogic
>> www.globallogic.com
>>



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


* Re: Xen 4.5 random freeze question
  2014-11-19 11:16                                               ` Andrii Tseglytskyi
@ 2014-11-19 11:42                                                 ` Stefano Stabellini
  2014-11-19 11:57                                                   ` Andrii Tseglytskyi
  2014-11-19 12:13                                                   ` Ian Campbell
  0 siblings, 2 replies; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-19 11:42 UTC
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> On Wed, Nov 19, 2014 at 1:12 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> Hi Stefano,
> >>
> >> Thank you for your support.
> >>
> > >> You are right - with the latest change you've proposed I get continuous
> > >> prints during the platform hang:
> >>
> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> >>
> > >> Looks like the issue needs deeper debugging.
> >
> > Cool! You could simply print what irqs are in all LRs when they are
> > full, for example you could call gic_dump_info. That would tell us what
> > is taking all the LRs space we have.
> >
> > How many LRs are available on omap5 anyway?
> 
> :) Already done this:
> 
> 
> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=27 nr_lrs 4 i 4 into d0v0
> (XEN) GICH_LRs (vcpu 0) mask=f
> (XEN)    HW_LR[0]=1a00001f
> (XEN)    HW_LR[1]=9a00e439
> (XEN)    HW_LR[2]=1a000002
> (XEN)    HW_LR[3]=9a015856
> (XEN) Inflight irq=31 lr=0
> (XEN) Inflight irq=57 lr=1
> (XEN) Inflight irq=2 lr=2
> (XEN) Inflight irq=86 lr=3
> (XEN) Inflight irq=27 lr=255
> (XEN) Pending irq=27

27 should be the virtual timer if I remember correctly.

So it looks like there is not actually anything wrong; it is just that you
have too many inflight irqs? That shouldn't cause problems, because in that
case GICH_HCR_UIE should be set and you should get a maintenance
interrupt when LRs become available (actually when "none, or only one,
of the List register entries is marked as a valid interrupt").

Maybe GICH_HCR_UIE is the one that doesn't work properly. It might be
worth checking that you are receiving maintenance interrupts:


diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index b7516c0..b3eaa44 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
      * on return to guest that is going to clear the old LRs and inject
      * new interrupts.
      */
+    
+    gdprintk(XENLOG_DEBUG, "maintenance interrupt\n");
 }
 
 void gic_dump_info(struct vcpu *v)

 
You could also try to replace GICH_HCR_UIE with GICH_HCR_NPIE; you
should still receive maintenance interrupts when one or more LRs
become available.


> >
> > I doubt you have so much interrupt traffic to actually fill all the LRs,
> > so I am thinking that a few LRs might not be cleared properly (that
> > should happen on hypervisor entry, gic_update_one_lr should take care of
> > it).
> 
> This actually explains why it happens during domU start: SGI
> traffic might be very heavy at that time.
> 
> >
> >
> >> Regards,
> >> Andrii
> >>
> >> On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
> >> <stefano.stabellini@eu.citrix.com> wrote:
> >> > Hello Andrii,
> >> > we are getting closer :-)
> >> >
> >> > It would help if you post the output with GIC_DEBUG defined but without
> >> > the other change that "fixes" the issue.
> >> >
> >> > I think the problem is probably due to software irqs.
> >> > You are getting too many
> >> >
> >> > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> >> >
> >> > messages. That means you are losing virtual SGIs (guest VCPU to guest
> >> > VCPU). It would be best to investigate why, especially if you get many
> >> > more of the same messages without the MAINTENANCE_IRQ change I
> >> > suggested.
> >> >
> >> > This patch might also help us understand the problem better:
> >> >
> >> >
> >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> > index b7516c0..5eaeca2 100644
> >> > --- a/xen/arch/arm/gic.c
> >> > +++ b/xen/arch/arm/gic.c
> >> > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
> >> >      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
> >> >      {
> >> >          i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
> >> > -        if ( i >= nr_lrs ) return;
> >> > +        if ( i >= nr_lrs )
> >> > +        {
> >> > +            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
> >> > +                    p->irq, v->domain->domain_id, v->vcpu_id);
> >> > +            continue;
> >> > +        }
> >> >
> >> >          spin_lock_irqsave(&gic.lock, flags);
> >> >          gic_set_lr(i, p, GICH_LR_PENDING);
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> Hi Stefano,
> >> >>
> >> >> No hangs with this change.
> >> >> Complete log is the following:
> >> >>
> >> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
> >> >> DRA752 ES1.0
> >> >> <ethaddr> not set. Validating first E-fuse MAC
> >> >> cpsw
> >> >> - UART enabled -
> >> >> - CPU 00000000 booting -
> >> >> - Xen starting in Hyp mode -
> >> >> - Zero BSS -
> >> >> - Setting up control registers -
> >> >> - Turning on paging -
> >> >> - Ready -
> >> >> (XEN) Checking for initrd in /chosen
> >> >> (XEN) RAM: 0000000080000000 - 000000009fffffff
> >> >> (XEN) RAM: 00000000a0000000 - 00000000bfffffff
> >> >> (XEN) RAM: 00000000c0000000 - 00000000dfffffff
> >> >> (XEN)
> >> >> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa
> >> >> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000
> >> >> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000
> >> >> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000
> >> >> (XEN)  RESVD[0]: 00000000ba300000 - 00000000bfd00000
> >> >> (XEN)  RESVD[1]: 0000000095800000 - 0000000095900000
> >> >> (XEN)  RESVD[2]: 0000000098a00000 - 0000000098b00000
> >> >> (XEN)  RESVD[3]: 0000000095f00000 - 0000000098a00000
> >> >> (XEN)  RESVD[4]: 0000000095900000 - 0000000095f00000
> >> >> (XEN)
> >> >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
> >> >> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
> >> >> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000
> >> >> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages)
> >> >> (XEN) Dom heap: 344064 pages
> >> >> (XEN) Domain heap initialised
> >> >> (XEN) Looking for UART console serial0
> >> >>  Xen 4.5-unstable
> >> >> (XEN) Xen version 4.5-unstable (atseglytskyi@)
> >> >> (arm-linux-gnueabihf-gcc (crosstool-NG
> >> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3
> >> >> 20130328 (prerelease)) debu4
> >> >> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
> >> >> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2
> >> >> (XEN) 32-bit Execution:
> >> >> (XEN)   Processor Features: 00001131:00011011
> >> >> (XEN)     Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
> >> >> (XEN)     Extensions: GenericTimer Security
> >> >> (XEN)   Debug Features: 02010555
> >> >> (XEN)   Auxiliary Features: 00000000
> >> >> (XEN)   Memory Model Features: 10201105 20000000 01240000 02102211
> >> >> (XEN)  ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
> >> >> (XEN) Platform: TI DRA7
> >> >> (XEN) /psci method must be smc, but is: "hvc"
> >> >> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c)
> >> >> (XEN) Set AuxCoreBoot0 to 0x20
> >> >> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
> >> >> (XEN) Using generic timer at 6144 KHz
> >> >> (XEN) GIC initialization:
> >> >> (XEN)         gic_dist_addr=0000000048211000
> >> >> (XEN)         gic_cpu_addr=0000000048212000
> >> >> (XEN)         gic_hyp_addr=0000000048214000
> >> >> (XEN)         gic_vcpu_addr=0000000048216000
> >> >> (XEN)         gic_maintenance_irq=25
> >> >> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
> >> >> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> >> >> (XEN) I/O virtualisation disabled
> >> >> (XEN) Allocated console ring of 16 KiB.
> >> >> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
> >> >> (XEN) Bringing up CPU1
> >> >> - CPU 00000001 booting -
> >> >> - Xen starting in Hyp mode -
> >> >> - Setting up control registers -
> >> >> - Turning on paging -
> >> >> - Ready -
> >> >> (XEN) CPU 1 booted.
> >> >> (XEN) Brought up 2 CPUs
> >> >> (XEN) *** LOADING DOMAIN 0 ***
> >> >> (XEN) Loading kernel from boot module 2
> >> >> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
> >> >> (XEN) Loading zImage from 00000000c0000040 to 00000000cfc00000-00000000cff50c48
> >> >> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8
> >> >> (XEN) Std. Loglevel: All
> >> >> (XEN) Guest Loglevel: All
> >> >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch
> >> >> input to Xen)
> >> >> (XEN) Freed 272kB init memory.
> >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
> >> >> already pending in LR0
> >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
> >> >> already pending in LR0
> >> >> [    0.000000] /cpus/cpu@0 missing clock-frequency property
> >> >> [    0.000000] /cpus/cpu@1 missing clock-frequency property
> >> >> [    0.140625] omap-gpmc omap-gpmc: failed to reserve memory
> >> >> [    0.187500] omap_l3_noc ocp.3: couldn't find resource 2
> >> >> [    0.273437] i2c i2c-1: of_i2c: invalid reg on
> >> >> /ocp/i2c@48072000/camera_ov10635
> >> >> [    0.437500] ldo3: operation not allowed
> >> >> [    0.437500] omapdss HDMI error: can't set the voltage regulator
> >> >> [    0.468750] tfc_s9700 display0: tfc_s9700_probe probe
> >> >> [    0.468750] ov1063x 1-0030: No deserializer node found
> >> >> [    0.468750] ov1063x 1-0030: No serializer node found
> >> >> [    0.468750] ov1063x 1-0030: Failed writing register 0x0103!
> >> >> [    0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30
> >> >> [    0.578125] ahci ahci.0.auto: can't get clock
> >> >> [    0.898437] ldc_module_init
> >> >> [    1.304687] Missing dual_emac_res_vlan in DT.
> >> >> [    1.304687] Using 1 as Reserved VLAN for 0 slave
> >> >> [    1.312500] Missing dual_emac_res_vlan in DT.
> >> >> [    1.320312] Using 2 as Reserved VLAN for 1 slave
> >> >> [    1.382812] Freeing init memory: 236K
> >> >> sh: write error: No such device
> >> >> Cannot identify '/dev/camera0': 2, No such file or directory
> >> >> Parsing config from /xen/images/DomUAndroid.cfg
> >> >> XSM Disabled: seclabel not supported
> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> >> dom1 access to irq 53: Function not implemented
> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> >> dom1 access to irq 71: Function not implemented
> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> >> dom1 access to irq 173: Function not implemented
> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> >> >> dom1 access to irq 174: Function not implemented
> >> >> Turning on vfb in domain 1
> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> >> still lr_pending
> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> >> still lr_pending
> >> >> Parsing config from /xen/images/DomUQNX.cfg
> >> >> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to
> >> >> inject irq=2 into d0v0, when it is still lr_pending
> >> >>
> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> >> still lr_pending
> >> >> [    4.304687] dra7-evm-sound sound.17: cpu dai node is invalid
> >> >> [    4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22
> >> >> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader
> >> >> found: Invalid kernel
> >> >> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image
> >> >> failed: No such file or directory
> >> >> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot
> >> >> (re-)build domain: -3
> >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> >> >> still lr_pending
> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> >> still lr_pending
> >> >> Turning on 'vsnd' in domain '1' (dev_id: '0')
> >> >> Turning on vkbd in domain 1
> >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> >> >> still lr_pending
> >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> >> >> still lr_pending
> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> >> >> still lr_pending
> >> >>
> >> >> Please press Enter to activate this console. (XEN) gic.c:617:d0v1
> >> >> trying to inject irq=2 into d0v0, when it is still lr_pending
> >> >>
> >> >> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi
> >> >> <andrii.tseglytskyi@globallogic.com> wrote:
> >> >> > OK got it. Give me a few mins
> >> >> >
> >> >> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini
> >> >> > <stefano.stabellini@eu.citrix.com> wrote:
> >> >> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
> >> >> >> for non-hardware irqs (desc == NULL) and keep avoiding
> >> >> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs.
> >> >> >>
> >> >> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid
> >> >> >> other potential bugs introduced later.
> >> >> >>
> >> >> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >>> What if I try the following code on top of the current master branch:
> >> >> >>>
> >> >> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> >> >> >>> index 31fb81a..6764ab7 100644
> >> >> >>> --- a/xen/arch/arm/gic-v2.c
> >> >> >>> +++ b/xen/arch/arm/gic-v2.c
> >> >> >>> @@ -36,6 +36,8 @@
> >> >> >>>  #include <asm/io.h>
> >> >> >>>  #include <asm/gic.h>
> >> >> >>>
> >> >> >>> +#define GIC_DEBUG 1
> >> >> >>> +
> >> >> >>>  /*
> >> >> >>>   * LR register definitions are GIC v2 specific.
> >> >> >>>   * Moved these definitions from header file to here
> >> >> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >> >>> index bcaded9..c03d6a6 100644
> >> >> >>> --- a/xen/arch/arm/gic.c
> >> >> >>> +++ b/xen/arch/arm/gic.c
> >> >> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
> >> >> >>>
> >> >> >>>  #define lr_all_full() (this_cpu(lr_mask) == ((1 <<
> >> >> >>> gic_hw_ops->info->nr_lrs) - 1))
> >> >> >>>
> >> >> >>> -#undef GIC_DEBUG
> >> >> >>> +#define GIC_DEBUG 1
> >> >> >>>
> >> >> >>>  static void gic_update_one_lr(struct vcpu *v, int i);
> >> >> >>>
> >> >> >>> It is equivalent to what you are proposing: my code has
> >> >> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI set, so the following line will
> >> >> >>> be executed inside the gicv2_update_lr() function:
> >> >> >>>  lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >> >> >>>
> >> >> >>> regards,
> >> >> >>> Andrii
> >> >> >>>
> >> >> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
> >> >> >>> <stefano.stabellini@eu.citrix.com> wrote:
> >> >> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >>> >> OK, I see that if GICH_V2_LR_MAINTENANCE_IRQ is always set,
> >> >> >>> >> everything works fine.
> >> >> >>> >> The following 2 patches fix xen/master for my platform.
> >> >> >>> >>
> >> >> >>> >> Stefano, could you please take a look at these changes?
> >> >> >>> >>
> >> >> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca
> >> >> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> >> >> >>> >> Date:   Tue Nov 18 14:20:42 2014 +0200
> >> >> >>> >>
> >> >> >>> >>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
> >> >> >>> >>
> >> >> >>> >>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
> >> >> >>> >>     Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> >> >> >>> >>
> >> >> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> >> >> >>> >> index 31fb81a..093ecdb 100644
> >> >> >>> >> --- a/xen/arch/arm/gic-v2.c
> >> >> >>> >> +++ b/xen/arch/arm/gic-v2.c
> >> >> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct
> >> >> >>> >> pending_irq *p,
> >> >> >>> >>                                               << GICH_V2_LR_PRIORITY_SHIFT) |
> >> >> >>> >>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) <<
> >> >> >>> >> GICH_V2_LR_VIRTUAL_SHIFT));
> >> >> >>> >>
> >> >> >>> >> -    if ( p->desc != NULL )
> >> >> >>> >> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> >> >> >>> >>      {
> >> >> >>> >> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> >> >> >>> >> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >> >> >>> >> -        else
> >> >> >>> >> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq &
> >> >> >>> >> GICH_V2_LR_PHYSICAL_MASK )
> >> >> >>> >> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
> >> >> >>> >> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> >> >> >>> >> +    }
> >> >> >>> >> +    else if ( p->desc != NULL )
> >> >> >>> >> +    {
> >> >> >>> >> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
> >> >> >>> >> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
> >> >> >>> >>      }
> >> >> >>> >>
> >> >> >>> >>      writel_gich(lr_reg, GICH_LR + lr * 4);
> >> >> >>> >
> >> >> >>> > Actually in case p->desc == NULL (the irq is not a hardware irq, it
> >> >> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need
> >> >> >>> > the maintenance interrupt, if the bug was really due to GICH_LR_HW not
> >> >> >>> > working correctly on OMAP5. These changes might only be better at
> >> >> >>> > "hiding" the real issue.
> >> >> >>> >
> >> >> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding
> >> >> >>> > maintenance interrupts doesn't work for software interrupts.
> >> >> >>> > The commit that should make them work correctly after the
> >> >> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121
> >> >> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll
> >> >> >>> > see that it is going to set a software irq as PENDING if it is already ACTIVE.
> >> >> >>> > Maybe that doesn't work correctly on OMAP5.
> >> >> >>> >
> >> >> >>> > Could you try this patch on top of
> >> >> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
> >> >> >>> > if the problem is specifically with software irqs.
> >> >> >>> >
> >> >> >>> >
> >> >> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >> >>> > index b7516c0..d8a17c9 100644
> >> >> >>> > --- a/xen/arch/arm/gic.c
> >> >> >>> > +++ b/xen/arch/arm/gic.c
> >> >> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
> >> >> >>> >  /* Maximum cpu interface per GIC */
> >> >> >>> >  #define NR_GIC_CPU_IF 8
> >> >> >>> >
> >> >> >>> > -#undef GIC_DEBUG
> >> >> >>> > +#define GIC_DEBUG 1
> >> >> >>> >
> >> >> >>> >  static void gic_update_one_lr(struct vcpu *v, int i);
> >> >> >>> >
> >> >> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
> >> >> >>> >          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
> >> >> >>> >      if ( p->desc != NULL )
> >> >> >>> >          lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
> >> >> >>> > +    else
> >> >> >>> > +        lr_val |= GICH_LR_MAINTENANCE_IRQ;
> >> >> >>> >
> >> >> >>> >      GICH[GICH_LR + lr] = lr_val;
> >> >> >>> >
> >> >> >>>
> >> >> >>>
> >> >> >>>
> >> >> >>> --
> >> >> >>>
> >> >> >>> Andrii Tseglytskyi | Embedded Dev
> >> >> >>> GlobalLogic
> >> >> >>> www.globallogic.com
> >> >> >>>
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> >
> >> >> > Andrii Tseglytskyi | Embedded Dev
> >> >> > GlobalLogic
> >> >> > www.globallogic.com
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >> Andrii Tseglytskyi | Embedded Dev
> >> >> GlobalLogic
> >> >> www.globallogic.com
> >> >>
> >>
> >>
> >>
> >> --
> >>
> >> Andrii Tseglytskyi | Embedded Dev
> >> GlobalLogic
> >> www.globallogic.com
> >>
> 
> 
> 
> -- 
> 
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
> 

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-19 11:42                                                 ` Stefano Stabellini
@ 2014-11-19 11:57                                                   ` Andrii Tseglytskyi
  2014-11-19 11:59                                                     ` Stefano Stabellini
  2014-11-19 12:13                                                   ` Ian Campbell
  1 sibling, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 11:57 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

On Wed, Nov 19, 2014 at 1:42 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> On Wed, Nov 19, 2014 at 1:12 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> Hi Stefano,
>> >>
>> >> Thank you for your support.
>> >>
>> >> You are right: with the latest change you've proposed I got continuous
>> >> prints during the platform hang:
>> >>
>> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
>> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
>> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
>> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
>> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
>> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
>> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
>> >>
>> >> Looks like the issue needs deeper debugging.
>> >
>> > Cool! You could simply print what irqs are in all LRs when they are
>> > full, for example you could call gic_dump_info. That would tell us what
>> > is taking all the LRs space we have.
>> >
>> > How many LRs are available on omap5 anyway?
>>
>> :) Already done this:
>>
>>
>> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=27 nr_lrs 4 i 4 into d0v0
>> (XEN) GICH_LRs (vcpu 0) mask=f
>> (XEN)    HW_LR[0]=1a00001f
>> (XEN)    HW_LR[1]=9a00e439
>> (XEN)    HW_LR[2]=1a000002
>> (XEN)    HW_LR[3]=9a015856
>> (XEN) Inflight irq=31 lr=0
>> (XEN) Inflight irq=57 lr=1
>> (XEN) Inflight irq=2 lr=2
>> (XEN) Inflight irq=86 lr=3
>> (XEN) Inflight irq=27 lr=255
>> (XEN) Pending irq=27
>
> 27 should be the virtual timer if I remember correctly.
>
> So it looks like there is not actually anything wrong, it is just that
> you have too many inflight irqs? That shouldn't cause problems, because
> in that case GICH_HCR_UIE should be set and you should get a maintenance
> interrupt when LRs become available (actually when "none, or only one,
> of the List register entries is marked as a valid interrupt").
>
> Maybe GICH_HCR_UIE is the one that doesn't work properly. It might be
> worth checking that you are receiving maintenance interrupts:
>
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index b7516c0..b3eaa44 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
>       * on return to guest that is going to clear the old LRs and inject
>       * new interrupts.
>       */
> +
> +    gdprintk(XENLOG_DEBUG, "maintenance interrupt\n");
>  }
>

I see this print during the hang, so the maintenance interrupt does occur.
Can I perform some kind of LR cleanup inside its handler?

>  void gic_dump_info(struct vcpu *v)
>
>
> You could also try to replace GICH_HCR_UIE with GICH_HCR_NPIE; you
> should still receive maintenance interrupts when one or more LRs
> become available.

Sorry, I didn't get that. Do you mean this change?

@@ -759,9 +760,9 @@ void gic_inject(void)


     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
-        GICH[GICH_HCR] |= GICH_HCR_UIE;
+        GICH[GICH_HCR] |= GICH_HCR_NPIE;
     else
-        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;

 }



>
>
>> >
>> > I doubt you have so much interrupt traffic to actually fill all the LRs,
>> > so I am thinking that a few LRs might not be cleared properly (that
>> > should happen on hypervisor entry, gic_update_one_lr should take care of
>> > it).
>>
>> This actually explains why it happens during domU start: SGI
>> traffic might be very heavy at that time.
>>
>> >
>> >
>> >> Regards,
>> >> Andrii
>> >>
>> >> On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
>> >> <stefano.stabellini@eu.citrix.com> wrote:
>> >> > Hello Andrii,
>> >> > we are getting closer :-)
>> >> >
>> >> > It would help if you post the output with GIC_DEBUG defined but without
>> >> > the other change that "fixes" the issue.
>> >> >
>> >> > I think the problem is probably due to software irqs.
>> >> > You are getting too many
>> >> >
>> >> > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
>> >> >
>> >> > messages. That means you are losing virtual SGIs (guest VCPU to guest
>> >> > VCPU). It would be best to investigate why, especially if you get many
>> >> > more of the same messages without the MAINTENANCE_IRQ change I
>> >> > suggested.
>> >> >
>> >> > This patch might also help us understand the problem better:
>> >> >
>> >> >
>> >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >> > index b7516c0..5eaeca2 100644
>> >> > --- a/xen/arch/arm/gic.c
>> >> > +++ b/xen/arch/arm/gic.c
>> >> > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
>> >> >      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
>> >> >      {
>> >> >          i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
>> >> > -        if ( i >= nr_lrs ) return;
>> >> > +        if ( i >= nr_lrs )
>> >> > +        {
>> >> > +            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
>> >> > +                    p->irq, v->domain->domain_id, v->vcpu_id);
>> >> > +            continue;
>> >> > +        }
>> >> >
>> >> >          spin_lock_irqsave(&gic.lock, flags);
>> >> >          gic_set_lr(i, p, GICH_LR_PENDING);
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> Hi Stefano,
>> >> >>
>> >> >> No hangs with this change.
>> >> >> Complete log is the following:
>> >> >>
>> >> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
>> >> >> DRA752 ES1.0
>> >> >> <ethaddr> not set. Validating first E-fuse MAC
>> >> >> cpsw
>> >> >> - UART enabled -
>> >> >> - CPU 00000000 booting -
>> >> >> - Xen starting in Hyp mode -
>> >> >> - Zero BSS -
>> >> >> - Setting up control registers -
>> >> >> - Turning on paging -
>> >> >> - Ready -
>> >> >> (XEN) Checking for initrd in /chosen
>> >> >> (XEN) RAM: 0000000080000000 - 000000009fffffff
>> >> >> (XEN) RAM: 00000000a0000000 - 00000000bfffffff
>> >> >> (XEN) RAM: 00000000c0000000 - 00000000dfffffff
>> >> >> (XEN)
>> >> >> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa
>> >> >> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000
>> >> >> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000
>> >> >> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000
>> >> >> (XEN)  RESVD[0]: 00000000ba300000 - 00000000bfd00000
>> >> >> (XEN)  RESVD[1]: 0000000095800000 - 0000000095900000
>> >> >> (XEN)  RESVD[2]: 0000000098a00000 - 0000000098b00000
>> >> >> (XEN)  RESVD[3]: 0000000095f00000 - 0000000098a00000
>> >> >> (XEN)  RESVD[4]: 0000000095900000 - 0000000095f00000
>> >> >> (XEN)
>> >> >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
>> >> >> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
>> >> >> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000
>> >> >> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages)
>> >> >> (XEN) Dom heap: 344064 pages
>> >> >> (XEN) Domain heap initialised
>> >> >> (XEN) Looking for UART console serial0
>> >> >>  Xen 4.5-unstable
>> >> >> (XEN) Xen version 4.5-unstable (atseglytskyi@)
>> >> >> (arm-linux-gnueabihf-gcc (crosstool-NG
>> >> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3
>> >> >> 20130328 (prerelease)) debu4
>> >> >> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
>> >> >> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2
>> >> >> (XEN) 32-bit Execution:
>> >> >> (XEN)   Processor Features: 00001131:00011011
>> >> >> (XEN)     Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
>> >> >> (XEN)     Extensions: GenericTimer Security
>> >> >> (XEN)   Debug Features: 02010555
>> >> >> (XEN)   Auxiliary Features: 00000000
>> >> >> (XEN)   Memory Model Features: 10201105 20000000 01240000 02102211
>> >> >> (XEN)  ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
>> >> >> (XEN) Platform: TI DRA7
>> >> >> (XEN) /psci method must be smc, but is: "hvc"
>> >> >> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c)
>> >> >> (XEN) Set AuxCoreBoot0 to 0x20
>> >> >> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
>> >> >> (XEN) Using generic timer at 6144 KHz
>> >> >> (XEN) GIC initialization:
>> >> >> (XEN)         gic_dist_addr=0000000048211000
>> >> >> (XEN)         gic_cpu_addr=0000000048212000
>> >> >> (XEN)         gic_hyp_addr=0000000048214000
>> >> >> (XEN)         gic_vcpu_addr=0000000048216000
>> >> >> (XEN)         gic_maintenance_irq=25
>> >> >> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
>> >> >> (XEN) Using scheduler: SMP Credit Scheduler (credit)
>> >> >> (XEN) I/O virtualisation disabled
>> >> >> (XEN) Allocated console ring of 16 KiB.
>> >> >> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
>> >> >> (XEN) Bringing up CPU1
>> >> >> - CPU 00000001 booting -
>> >> >> - Xen starting in Hyp mode -
>> >> >> - Setting up control registers -
>> >> >> - Turning on paging -
>> >> >> - Ready -
>> >> >> (XEN) CPU 1 booted.
>> >> >> (XEN) Brought up 2 CPUs
>> >> >> (XEN) *** LOADING DOMAIN 0 ***
>> >> >> (XEN) Loading kernel from boot module 2
>> >> >> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
>> >> >> (XEN) Loading zImage from 00000000c0000040 to 00000000cfc00000-00000000cff50c48
>> >> >> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8
>> >> >> (XEN) Std. Loglevel: All
>> >> >> (XEN) Guest Loglevel: All
>> >> >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch
>> >> >> input to Xen)
>> >> >> (XEN) Freed 272kB init memory.
>> >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
>> >> >> already pending in LR0
>> >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
>> >> >> already pending in LR0
>> >> >> [    0.000000] /cpus/cpu@0 missing clock-frequency property
>> >> >> [    0.000000] /cpus/cpu@1 missing clock-frequency property
>> >> >> [    0.140625] omap-gpmc omap-gpmc: failed to reserve memory
>> >> >> [    0.187500] omap_l3_noc ocp.3: couldn't find resource 2
>> >> >> [    0.273437] i2c i2c-1: of_i2c: invalid reg on
>> >> >> /ocp/i2c@48072000/camera_ov10635
>> >> >> [    0.437500] ldo3: operation not allowed
>> >> >> [    0.437500] omapdss HDMI error: can't set the voltage regulator
>> >> >> [    0.468750] tfc_s9700 display0: tfc_s9700_probe probe
>> >> >> [    0.468750] ov1063x 1-0030: No deserializer node found
>> >> >> [    0.468750] ov1063x 1-0030: No serializer node found
>> >> >> [    0.468750] ov1063x 1-0030: Failed writing register 0x0103!
>> >> >> [    0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30
>> >> >> [    0.578125] ahci ahci.0.auto: can't get clock
>> >> >> [    0.898437] ldc_module_init
>> >> >> [    1.304687] Missing dual_emac_res_vlan in DT.
>> >> >> [    1.304687] Using 1 as Reserved VLAN for 0 slave
>> >> >> [    1.312500] Missing dual_emac_res_vlan in DT.
>> >> >> [    1.320312] Using 2 as Reserved VLAN for 1 slave
>> >> >> [    1.382812] Freeing init memory: 236K
>> >> >> sh: write error: No such device
>> >> >> Cannot identify '/dev/camera0': 2, No such file or directory
>> >> >> Parsing config from /xen/images/DomUAndroid.cfg
>> >> >> XSM Disabled: seclabel not supported
>> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
>> >> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 53: Function not implemented
>> >> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
>> >> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 71: Function not implemented
>> >> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
>> >> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 173: Function not implemented
>> >> >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
>> >> >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give dom1 access to irq 174: Function not implemented
>> >> >> >> Turning on vfb in domain 1
>> >> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
>> >> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
>> >> >> >> Parsing config from /xen/images/DomUQNX.cfg
>> >> >> >> XSM Disabled: seclabel not supported
>> >> >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
>> >> >> >>
>> >> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
>> >> >> >> [    4.304687] dra7-evm-sound sound.17: cpu dai node is invalid
>> >> >> >> [    4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22
>> >> >> >> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader found: Invalid kernel
>> >> >> >> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image failed: No such file or directory
>> >> >> >> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot (re-)build domain: -3
>> >> >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
>> >> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
>> >> >> >> Turning on 'vsnd' in domain '1' (dev_id: '0')
>> >> >> >> Turning on vkbd in domain 1
>> >> >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
>> >> >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
>> >> >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is still lr_pending
>> >> >> >>
>> >> >> >> Please press Enter to activate this console.
>> >> >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
>> >> >>
>> >> >> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi
>> >> >> <andrii.tseglytskyi@globallogic.com> wrote:
>> >> >> > OK got it. Give me a few mins
>> >> >> >
>> >> >> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini
>> >> >> > <stefano.stabellini@eu.citrix.com> wrote:
>> >> >> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
>> >> >> >> for non-hardware irqs (desc == NULL) and keep avoiding
>> >> >> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs.
>> >> >> >>
>> >> >> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid
>> >> >> >> other potential bugs introduced later.
>> >> >> >>
>> >> >> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> >>> What if I try the following code on top of the current master branch:
>> >> >> >>>
>> >> >> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
>> >> >> >>> index 31fb81a..6764ab7 100644
>> >> >> >>> --- a/xen/arch/arm/gic-v2.c
>> >> >> >>> +++ b/xen/arch/arm/gic-v2.c
>> >> >> >>> @@ -36,6 +36,8 @@
>> >> >> >>>  #include <asm/io.h>
>> >> >> >>>  #include <asm/gic.h>
>> >> >> >>>
>> >> >> >>> +#define GIC_DEBUG 1
>> >> >> >>> +
>> >> >> >>>  /*
>> >> >> >>>   * LR register definitions are GIC v2 specific.
>> >> >> >>>   * Moved these definitions from header file to here
>> >> >> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >> >> >>> index bcaded9..c03d6a6 100644
>> >> >> >>> --- a/xen/arch/arm/gic.c
>> >> >> >>> +++ b/xen/arch/arm/gic.c
>> >> >> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
>> >> >> >>>
>> >> >> >>>  #define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_hw_ops->info->nr_lrs) - 1))
>> >> >> >>>
>> >> >> >>> -#undef GIC_DEBUG
>> >> >> >>> +#define GIC_DEBUG 1
>> >> >> >>>
>> >> >> >>>  static void gic_update_one_lr(struct vcpu *v, int i);
>> >> >> >>>
>> >> >> >>> It is equivalent to what you are proposing: my code has
>> >> >> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI enabled, so the following line will
>> >> >> >>> be executed inside the gicv2_update_lr() function:
>> >> >> >>>  lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> >> >> >>>
>> >> >> >>> regards,
>> >> >> >>> Andrii
>> >> >> >>>
>> >> >> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
>> >> >> >>> <stefano.stabellini@eu.citrix.com> wrote:
>> >> >> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set; with
>> >> >> >>> >> that, everything works fine.
>> >> >> >>> >> The following 2 patches fix xen/master for my platform.
>> >> >> >>> >>
>> >> >> >>> >> Stefano, could you please take a look at these changes?
>> >> >> >>> >>
>> >> >> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca
>> >> >> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>> >> >> >>> >> Date:   Tue Nov 18 14:20:42 2014 +0200
>> >> >> >>> >>
>> >> >> >>> >>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
>> >> >> >>> >>
>> >> >> >>> >>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
>> >> >> >>> >>     Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>> >> >> >>> >>
>> >> >> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
>> >> >> >>> >> index 31fb81a..093ecdb 100644
>> >> >> >>> >> --- a/xen/arch/arm/gic-v2.c
>> >> >> >>> >> +++ b/xen/arch/arm/gic-v2.c
>> >> >> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
>> >> >> >>> >>                                               << GICH_V2_LR_PRIORITY_SHIFT) |
>> >> >> >>> >>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));
>> >> >> >>> >>
>> >> >> >>> >> -    if ( p->desc != NULL )
>> >> >> >>> >> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
>> >> >> >>> >>      {
>> >> >> >>> >> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
>> >> >> >>> >> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> >> >> >>> >> -        else
>> >> >> >>> >> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
>> >> >> >>> >> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
>> >> >> >>> >> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
>> >> >> >>> >> +    }
>> >> >> >>> >> +    else if ( p->desc != NULL )
>> >> >> >>> >> +    {
>> >> >> >>> >> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
>> >> >> >>> >> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
>> >> >> >>> >>      }
>> >> >> >>> >>
>> >> >> >>> >>      writel_gich(lr_reg, GICH_LR + lr * 4);
>> >> >> >>> >
>> >> >> >>> > Actually in case p->desc == NULL (the irq is not a hardware irq, it
>> >> >> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need
>> >> >> >>> > the maintenance interrupt, if the bug was really due to GICH_LR_HW not
>> >> >> >>> > working correctly on OMAP5. These changes might only be better at
>> >> >> >>> > "hiding" the real issue.
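To make the difference between the variants concrete, here is a minimal C model of how the LR flag bits are chosen in gicv2_update_lr() on master (the "-" lines of the patch above), in the patch itself, and in Stefano's test patch further down. It is a sketch only: the bit positions follow the GICv2 list-register layout, while the macro and function names are hypothetical and not Xen's.

```c
#include <stdbool.h>
#include <stdint.h>

#define LR_HW             (1u << 31)   /* GICv2 LR: backed by a physical irq */
#define LR_EOI_MAINT      (1u << 19)   /* GICv2 LR: maintenance irq on EOI   */
#define LR_PHYSICAL_SHIFT 10
#define LR_PHYSICAL_MASK  0x3ffu

/* Physical-irq part of an LR for a hardware irq. */
static uint32_t hw_bits(unsigned int pirq)
{
    return LR_HW | ((pirq & LR_PHYSICAL_MASK) << LR_PHYSICAL_SHIFT);
}

/* master: the quirk is only consulted for hardware irqs (has_desc). */
static uint32_t flags_master(bool has_desc, unsigned int pirq, bool quirk)
{
    if (!has_desc)
        return 0;
    return quirk ? LR_EOI_MAINT : hw_bits(pirq);
}

/* Proposed patch: with the quirk, every LR requests a maintenance irq. */
static uint32_t flags_quirk_all(bool has_desc, unsigned int pirq, bool quirk)
{
    if (quirk)
        return LR_EOI_MAINT;
    return has_desc ? hw_bits(pirq) : 0;
}

/* Test patch on 394b7e58: HW for hardware irqs, EOI maintenance for
 * software irqs; that code path does not look at the quirk at all. */
static uint32_t flags_sw_maint(bool has_desc, unsigned int pirq, bool quirk)
{
    (void)quirk;
    return has_desc ? hw_bits(pirq) : LR_EOI_MAINT;
}
```

Under this model, with the quirk set, master and the proposed patch produce the same bits for hardware irqs and differ only for software irqs (has_desc false), which is consistent with the suspicion that software irqs are where the problem lies.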
>> >> >> >>> >
>> >> >> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding
>> >> >> >>> > maintenance interrupts doesn't work for software interrupts.
>> >> >> >>> > The commit that should make them work correctly after the
>> >> >> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121
>> >> >> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll
>> >> >> >>> > see that it is going to set a software irq as PENDING if it is already ACTIVE.
>> >> >> >>> > Maybe that doesn't work correctly on OMAP5.
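The "PENDING on top of ACTIVE" case corresponds to the GICv2 LR state field (bits [29:28]) holding 0b11. The hypothetical helper below, assuming that layout and not taken from commit 394b7e58, shows only the LR-level transition that OMAP5 would have to honour:

```c
#include <stdint.h>

#define LR_PENDING    (1u << 28)   /* GICv2 LR state: pending bit */
#define LR_ACTIVE     (1u << 29)   /* GICv2 LR state: active bit  */
#define LR_STATE_MASK (LR_PENDING | LR_ACTIVE)

/* A new instance of a software irq arrives while the LR still holds it
 * ACTIVE: mark it PENDING as well (state 0b11), so the guest can take it
 * again after EOIing the current instance. Other states are unchanged. */
static uint32_t set_pending_if_active(uint32_t lr)
{
    if ((lr & LR_STATE_MASK) == LR_ACTIVE)
        lr |= LR_PENDING;
    return lr;
}
```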
>> >> >> >>> >
>> >> >> >>> > Could you try this patch on top of
>> >> >> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
>> >> >> >>> > if the problem is specifically with software irqs.
>> >> >> >>> >
>> >> >> >>> >
>> >> >> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >> >> >>> > index b7516c0..d8a17c9 100644
>> >> >> >>> > --- a/xen/arch/arm/gic.c
>> >> >> >>> > +++ b/xen/arch/arm/gic.c
>> >> >> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
>> >> >> >>> >  /* Maximum cpu interface per GIC */
>> >> >> >>> >  #define NR_GIC_CPU_IF 8
>> >> >> >>> >
>> >> >> >>> > -#undef GIC_DEBUG
>> >> >> >>> > +#define GIC_DEBUG 1
>> >> >> >>> >
>> >> >> >>> >  static void gic_update_one_lr(struct vcpu *v, int i);
>> >> >> >>> >
>> >> >> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
>> >> >> >>> >          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
>> >> >> >>> >      if ( p->desc != NULL )
>> >> >> >>> >          lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
>> >> >> >>> > +    else
>> >> >> >>> > +        lr_val |= GICH_LR_MAINTENANCE_IRQ;
>> >> >> >>> >
>> >> >> >>> >      GICH[GICH_LR + lr] = lr_val;
>> >> >> >>> >
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> --
>> >> >> >>>
>> >> >> >>> Andrii Tseglytskyi | Embedded Dev
>> >> >> >>> GlobalLogic
>> >> >> >>> www.globallogic.com
>> >> >> >>>
>> >> >> >

* Re: Xen 4.5 random freeze question
  2014-11-19 11:57                                                   ` Andrii Tseglytskyi
@ 2014-11-19 11:59                                                     ` Stefano Stabellini
  2014-11-19 12:37                                                       ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-19 11:59 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> On Wed, Nov 19, 2014 at 1:42 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> On Wed, Nov 19, 2014 at 1:12 PM, Stefano Stabellini
> >> <stefano.stabellini@eu.citrix.com> wrote:
> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> Hi Stefano,
> >> >>
> >> >> Thank you for your support.
> >> >>
> >> >> You are right: with the latest change you proposed, I get continuous
> >> >> prints during the platform hang:
> >> >>
> >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> >> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0
> >> >>
> >> >> Looks like the issue needs further, deeper debugging.
> >> >
> >> > Cool! You could simply print which irqs are in the LRs when they are
> >> > full; for example, you could call gic_dump_info. That would tell us what
> >> > is taking up all the LR space we have.
> >> >
> >> > How many LRs are available on omap5 anyway?
> >>
> >> :) Already done this:
> >>
> >>
> >> (XEN) gic.c:725:d0v0 LRs full, not injecting irq=27 nr_lrs 4 i 4 into d0v0
> >> (XEN) GICH_LRs (vcpu 0) mask=f
> >> (XEN)    HW_LR[0]=1a00001f
> >> (XEN)    HW_LR[1]=9a00e439
> >> (XEN)    HW_LR[2]=1a000002
> >> (XEN)    HW_LR[3]=9a015856
> >> (XEN) Inflight irq=31 lr=0
> >> (XEN) Inflight irq=57 lr=1
> >> (XEN) Inflight irq=2 lr=2
> >> (XEN) Inflight irq=86 lr=3
> >> (XEN) Inflight irq=27 lr=255
> >> (XEN) Pending irq=27
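The raw HW_LR values can be decoded by hand. The sketch below assumes the GICv2 GICH_LR layout from the ARM GIC architecture specification (virtual ID in bits [9:0], physical ID in [19:10] when HW is set, EOI maintenance bit 19 when it is not, priority in [27:23], state in [29:28], HW in bit 31); the macro, struct and function names are local to this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define LR_VIRTUAL_MASK   0x3ffu       /* [9:0]   virtual interrupt ID         */
#define LR_PHYSICAL_SHIFT 10           /* [19:10] physical ID, when HW == 1    */
#define LR_PHYSICAL_MASK  0x3ffu
#define LR_EOI            (1u << 19)   /* maintenance irq on EOI, HW == 0 only */
#define LR_PRIORITY_SHIFT 23           /* [27:23] top 5 bits of the priority   */
#define LR_PRIORITY_MASK  0x1fu
#define LR_STATE_SHIFT    28           /* [29:28] 0 invalid, 1 pending,        */
#define LR_STATE_MASK     0x3u         /*         2 active, 3 pending+active   */
#define LR_HW             (1u << 31)

struct lr_info {
    unsigned int virq;
    bool hw;
    unsigned int pirq;      /* only meaningful when hw is true  */
    bool eoi_maint;         /* only meaningful when hw is false */
    unsigned int priority;  /* 8-bit priority, low 3 bits zero  */
    unsigned int state;
};

static struct lr_info decode_lr(uint32_t lr)
{
    struct lr_info d;

    d.virq = lr & LR_VIRTUAL_MASK;
    d.hw = (lr & LR_HW) != 0;
    d.pirq = d.hw ? (lr >> LR_PHYSICAL_SHIFT) & LR_PHYSICAL_MASK : 0;
    d.eoi_maint = !d.hw && (lr & LR_EOI) != 0;
    d.priority = ((lr >> LR_PRIORITY_SHIFT) & LR_PRIORITY_MASK) << 3;
    d.state = (lr >> LR_STATE_SHIFT) & LR_STATE_MASK;
    return d;
}
```

Decoding the four dumped values this way gives virtual irqs 31, 57, 2 and 86 (matching the Inflight list), all in the pending state with priority 0xa0. LR1 and LR3 have HW set with physical IDs 57 and 86, and none of the four has the EOI maintenance bit set.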
> >
> > 27 should be the virtual timer if I remember correctly.
> >
> > So it looks like there is not actually anything wrong, it is just that
> > you have too many inflight irqs? That shouldn't cause problems, because
> > in that case GICH_HCR_UIE should be set and you should get a maintenance
> > interrupt when LRs become available (actually when "none, or only one,
> > of the List register entries is marked as a valid interrupt").
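The two maintenance-interrupt enables discussed in this thread wait for different things. A minimal model of the corresponding GICH_MISR conditions, assuming the GICv2 bit positions (UIE is bit 1 and NPIE is bit 3 of GICH_HCR, LR state in bits [29:28]); the function name is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HCR_UIE  (1u << 1)  /* GICH_HCR.UIE: underflow interrupt enable   */
#define HCR_NPIE (1u << 3)  /* GICH_HCR.NPIE: no-pending interrupt enable */

#define LR_STATE(lr)  (((lr) >> 28) & 0x3u)  /* 0 invalid, 1 pending, 2 active, 3 both */
#define LR_ST_PENDING 0x1u

static bool maint_irq_expected(uint32_t hcr, const uint32_t *lrs, size_t nr_lrs)
{
    size_t valid = 0, pending = 0;

    for (size_t i = 0; i < nr_lrs; i++) {
        uint32_t st = LR_STATE(lrs[i]);

        if (st != 0)
            valid++;               /* any state other than invalid */
        if (st & LR_ST_PENDING)
            pending++;             /* pending or pending+active    */
    }
    if ((hcr & HCR_UIE) && valid <= 1)
        return true;               /* underflow: none or one valid LR */
    if ((hcr & HCR_NPIE) && pending == 0)
        return true;               /* no LR in the pending state      */
    return false;
}
```

Under this model, the four LRs from the dump (all pending) trigger neither condition: UIE needs at most one valid LR and NPIE needs no pending LR, so in both cases the guest has to handle some of the queued interrupts first.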
> >
> > Maybe GICH_HCR_UIE is the one that doesn't work properly. It might be
> > worth checking that you are receiving maintenance interrupts:
> >
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index b7516c0..b3eaa44 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
> >       * on return to guest that is going to clear the old LRs and inject
> >       * new interrupts.
> >       */
> > +
> > +    gdprintk(XENLOG_DEBUG, "maintenance interrupt\n");
> >  }
> >
> 
> I see this print during the hang, so the maintenance interrupt does occur.
> Can I perform some kind of LR cleanup inside its handler?

It should happen automatically because on hypervisor entry gic_clear_lrs
gets called. In fact by the time you see this message the LRs should
have already been cleared.
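A simplified model of the bookkeeping described here: a per-CPU bitmap (like Xen's lr_mask) marks LRs in use, and on hypervisor entry any LR whose state has dropped back to invalid is released. This is a sketch under stated assumptions, not Xen's gic_clear_lrs()/gic_update_one_lr(), which also deal with active and pending+active LRs and the inflight lists; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_LRS 16u                            /* sketch limit: nr_lrs <= MAX_LRS */
#define LR_STATE(lr) (((lr) >> 28) & 0x3u)     /* 0 means invalid (free)          */

struct lr_bookkeeping {
    uint32_t lrs[MAX_LRS];   /* stand-in for the GICH_LR registers      */
    unsigned int nr_lrs;     /* LRs implemented (4 on this board)       */
    uint64_t lr_mask;        /* bit i set: LR i is in use               */
};

/* Same test as the lr_all_full() macro quoted earlier in the thread. */
static bool lr_all_full(const struct lr_bookkeeping *b)
{
    return b->lr_mask == ((UINT64_C(1) << b->nr_lrs) - 1);
}

/* Release every in-use LR whose state has gone back to invalid. */
static void clear_lrs(struct lr_bookkeeping *b)
{
    for (unsigned int i = 0; i < b->nr_lrs; i++) {
        uint64_t bit = UINT64_C(1) << i;

        if ((b->lr_mask & bit) && LR_STATE(b->lrs[i]) == 0) {
            b->lrs[i] = 0;
            b->lr_mask &= ~bit;
        }
    }
}
```

If LRs were not being released along these lines, lr_all_full() would stay true and every later injection would be deferred to lr_pending.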


> >  void gic_dump_info(struct vcpu *v)
> >
> >
> > You could also try to replace GICH_HCR_UIE with GICH_HCR_NPIE: you
> > should still receive maintenance interrupts when one or more LRs
> > become available.
> 
> Sorry, I didn't get that. Do you mean this change?
> 
> @@ -759,9 +760,9 @@ void gic_inject(void)
> 
> 
>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> -        GICH[GICH_HCR] |= GICH_HCR_UIE;
> +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>      else
> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> 
>  }

Yes, exactly


> 
> >
> >
> >> >
> >> > I doubt you have enough interrupt traffic to actually fill all the LRs,
> >> > so I am thinking that a few LRs might not be cleared properly (that
> >> > should happen on hypervisor entry, gic_update_one_lr should take care of
> >> > it).
> >>
> >> This would actually explain why it happens during domU start: SGI
> >> traffic might be very heavy at that time.
> >>
> >> >
> >> >
> >> >> Regards,
> >> >> Andrii
> >> >>
> >> >> On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
> >> >> <stefano.stabellini@eu.citrix.com> wrote:
> >> >> > Hello Andrii,
> >> >> > we are getting closer :-)
> >> >> >
> >> >> > It would help if you post the output with GIC_DEBUG defined but without
> >> >> > the other change that "fixes" the issue.
> >> >> >
> >> >> > I think the problem is probably due to software irqs.
> >> >> > You are getting too many
> >> >> >
> >> >> > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> >> >> >
> >> >> > messages. That means you are losing virtual SGIs (guest VCPU to guest
> >> >> > VCPU). It would be best to investigate why, especially if you get many
> >> >> > more of the same messages without the MAINTENANCE_IRQ change I
> >> >> > suggested.
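One way to picture how SGIs get lost: if an SGI is already queued and waiting for an LR, a second injection of the same SGI has nowhere to go and is merged with the first. The hypothetical sketch below models that coalescing with a per-vcpu bitmap of the 16 SGIs; it is an assumption for illustration, not Xen's lr_pending list.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vcpu record of SGIs queued but not yet in an LR. */
struct sgi_queue {
    uint16_t queued;         /* bit n set: SGI n is waiting for a free LR */
    unsigned int coalesced;  /* injections merged into an existing entry  */
};

/* Queue SGI n (n < 16). Returns false when n is already waiting: the new
 * instance is merged into the queued one and is effectively lost. */
static bool queue_sgi(struct sgi_queue *q, unsigned int n)
{
    uint16_t bit = (uint16_t)(1u << n);

    if (q->queued & bit) {
        q->coalesced++;
        return false;
    }
    q->queued |= bit;
    return true;
}
```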
> >> >> >
> >> >> > This patch might also help in understanding the problem:
> >> >> >
> >> >> >
> >> >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >> > index b7516c0..5eaeca2 100644
> >> >> > --- a/xen/arch/arm/gic.c
> >> >> > +++ b/xen/arch/arm/gic.c
> >> >> > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
> >> >> >      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
> >> >> >      {
> >> >> >          i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
> >> >> > -        if ( i >= nr_lrs ) return;
> >> >> > +        if ( i >= nr_lrs )
> >> >> > +        {
> >> >> > +            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
> >> >> > +                    p->irq, v->domain->domain_id, v->vcpu_id);
> >> >> > +            continue;
> >> >> > +        }
> >> >> >
> >> >> >          spin_lock_irqsave(&gic.lock, flags);
> >> >> >          gic_set_lr(i, p, GICH_LR_PENDING);
> >> >> >
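The patch above turns the early return into a debug print plus continue, so every irq left in lr_pending gets reported. The sketch below models that loop in isolation, assuming a simple bitmap of busy LRs; first_free_lr() plays the role of find_first_zero_bit() and every name here is hypothetical.

```c
#include <stdint.h>

/* Index of the first clear bit below nr, or nr when all are set
 * (the same contract as find_first_zero_bit()). */
static unsigned int first_free_lr(uint64_t mask, unsigned int nr)
{
    for (unsigned int i = 0; i < nr; i++)
        if (!(mask & (UINT64_C(1) << i)))
            return i;
    return nr;
}

/*
 * Try to give each of n_queued waiting irqs a free LR, in order.
 * lr_of[k] receives the LR index for irq k, or -1 if none was free.
 * Returns how many irqs are left waiting; with the debug patch each of
 * them would print one "LRs full, not injecting ..." line.
 */
static unsigned int restore_pending(uint64_t *lr_mask, unsigned int nr_lrs,
                                    unsigned int n_queued, int *lr_of)
{
    unsigned int left = 0;

    for (unsigned int k = 0; k < n_queued; k++) {
        unsigned int i = first_free_lr(*lr_mask, nr_lrs);

        if (i >= nr_lrs) {
            lr_of[k] = -1;
            left++;
            continue;   /* keep scanning instead of returning */
        }
        *lr_mask |= UINT64_C(1) << i;
        lr_of[k] = (int)i;
    }
    return left;
}
```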
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >> Hi Stefano,
> >> >> >>
> >> >> >> No hangs with this change.
> >> >> >> The complete log is the following:
> >> >> >>
> >> >> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
> >> >> >> DRA752 ES1.0
> >> >> >> <ethaddr> not set. Validating first E-fuse MAC
> >> >> >> cpsw
> >> >> >> - UART enabled -
> >> >> >> - CPU 00000000 booting -
> >> >> >> - Xen starting in Hyp mode -
> >> >> >> - Zero BSS -
> >> >> >> - Setting up control registers -
> >> >> >> - Turning on paging -
> >> >> >> - Ready -
> >> >> >> (XEN) Checking for initrd in /chosen
> >> >> >> (XEN) RAM: 0000000080000000 - 000000009fffffff
> >> >> >> (XEN) RAM: 00000000a0000000 - 00000000bfffffff
> >> >> >> (XEN) RAM: 00000000c0000000 - 00000000dfffffff
> >> >> >> (XEN)
> >> >> >> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa
> >> >> >> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000
> >> >> >> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000
> >> >> >> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000
> >> >> >> (XEN)  RESVD[0]: 00000000ba300000 - 00000000bfd00000
> >> >> >> (XEN)  RESVD[1]: 0000000095800000 - 0000000095900000
> >> >> >> (XEN)  RESVD[2]: 0000000098a00000 - 0000000098b00000
> >> >> >> (XEN)  RESVD[3]: 0000000095f00000 - 0000000098a00000
> >> >> >> (XEN)  RESVD[4]: 0000000095900000 - 0000000095f00000
> >> >> >> (XEN)
> >> >> >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0 dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
> >> >> >> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000
> >> >> >> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages)
> >> >> >> (XEN) Dom heap: 344064 pages
> >> >> >> (XEN) Domain heap initialised
> >> >> >> (XEN) Looking for UART console serial0
> >> >> >>  Xen 4.5-unstable
> >> >> >> (XEN) Xen version 4.5-unstable (atseglytskyi@) (arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3 20130328 (prerelease)) debu4
> >> >> >> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
> >> >> >> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2
> >> >> >> (XEN) 32-bit Execution:
> >> >> >> (XEN)   Processor Features: 00001131:00011011
> >> >> >> (XEN)     Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
> >> >> >> (XEN)     Extensions: GenericTimer Security
> >> >> >> (XEN)   Debug Features: 02010555
> >> >> >> (XEN)   Auxiliary Features: 00000000
> >> >> >> (XEN)   Memory Model Features: 10201105 20000000 01240000 02102211
> >> >> >> (XEN)  ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
> >> >> >> (XEN) Platform: TI DRA7
> >> >> >> (XEN) /psci method must be smc, but is: "hvc"
> >> >> >> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c)
> >> >> >> (XEN) Set AuxCoreBoot0 to 0x20
> >> >> >> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
> >> >> >> (XEN) Using generic timer at 6144 KHz
> >> >> >> (XEN) GIC initialization:
> >> >> >> (XEN)         gic_dist_addr=0000000048211000
> >> >> >> (XEN)         gic_cpu_addr=0000000048212000
> >> >> >> (XEN)         gic_hyp_addr=0000000048214000
> >> >> >> (XEN)         gic_vcpu_addr=0000000048216000
> >> >> >> (XEN)         gic_maintenance_irq=25
> >> >> >> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
> >> >> >> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> >> >> >> (XEN) I/O virtualisation disabled
> >> >> >> (XEN) Allocated console ring of 16 KiB.
> >> >> >> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
> >> >> >> (XEN) Bringing up CPU1
> >> >> >> - CPU 00000001 booting -
> >> >> >> - Xen starting in Hyp mode -
> >> >> >> - Setting up control registers -
> >> >> >> - Turning on paging -
> >> >> >> - Ready -
> >> >> >> (XEN) CPU 1 booted.
> >> >> >> (XEN) Brought up 2 CPUs
> >> >> >> (XEN) *** LOADING DOMAIN 0 ***
> >> >> >> (XEN) Loading kernel from boot module 2
> >> >> >> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
> >> >> >> (XEN) Loading zImage from 00000000c0000040 to 00000000cfc00000-00000000cff50c48
> >> >> >> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8
> >> >> >> (XEN) Std. Loglevel: All
> >> >> >> (XEN) Guest Loglevel: All
> >> >> >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
> >> >> >> (XEN) Freed 272kB init memory.
> >> >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is already pending in LR0
> >> >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is already pending in LR0
> >> >> >> [    0.000000] /cpus/cpu@0 missing clock-frequency property
> >> >> >> [    0.000000] /cpus/cpu@1 missing clock-frequency property
> >> >> >> [    0.140625] omap-gpmc omap-gpmc: failed to reserve memory
> >> >> >> [    0.187500] omap_l3_noc ocp.3: couldn't find resource 2
> >> >> >> [    0.273437] i2c i2c-1: of_i2c: invalid reg on /ocp/i2c@48072000/camera_ov10635
> >> >> >> [    0.437500] ldo3: operation not allowed
> >> >> >> [    0.437500] omapdss HDMI error: can't set the voltage regulator
> >> >> >> [    0.468750] tfc_s9700 display0: tfc_s9700_probe probe
> >> >> >> [    0.468750] ov1063x 1-0030: No deserializer node found
> >> >> >> [    0.468750] ov1063x 1-0030: No serializer node found
> >> >> >> [    0.468750] ov1063x 1-0030: Failed writing register 0x0103!
> >> >> >> [    0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30
> >> >> >> [    0.578125] ahci ahci.0.auto: can't get clock
> >> >> >> [    0.898437] ldc_module_init
> >> >> >> [    1.304687] Missing dual_emac_res_vlan in DT.
> >> >> >> [    1.304687] Using 1 as Reserved VLAN for 0 slave
> >> >> >> [    1.312500] Missing dual_emac_res_vlan in DT.
> >> >> >> [    1.320312] Using 2 as Reserved VLAN for 1 slave

* Re: Xen 4.5 random freeze question
  2014-11-19 11:42                                                 ` Stefano Stabellini
  2014-11-19 11:57                                                   ` Andrii Tseglytskyi
@ 2014-11-19 12:13                                                   ` Ian Campbell
  2014-11-19 12:17                                                     ` Stefano Stabellini
  1 sibling, 1 reply; 66+ messages in thread
From: Ian Campbell @ 2014-11-19 12:13 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, xen-devel, Andrii Tseglytskyi

On Wed, 2014-11-19 at 11:42 +0000, Stefano Stabellini wrote:
> So it looks like there is not actually anything wrong, it's just that you
> have too many inflight irqs? It shouldn't cause problems, because in that
> case GICH_HCR_UIE should be set and you should get a maintenance
> interrupt when LRs become available (actually when "none, or only one,
> of the List register entries is marked as a valid interrupt").
> 
> Maybe GICH_HCR_UIE is the one that doesn't work properly.

How much testing did this aspect get when the no-maint-irq series
originally went in? Did you manage to find a workload which filled all
the LRs or try artificially limiting the number of LRs somehow in order
to provoke it?

I ask because my intuition is that this won't happen very much, meaning
those code paths may not be as well tested...



> It might be worth checking that you are receiving maintenance interrupts:
> 
> 
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index b7516c0..b3eaa44 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
>       * on return to guest that is going to clear the old LRs and inject
>       * new interrupts.
>       */
> +    
> +    gdprintk(XENLOG_DEBUG, "maintenance interrupt\n");
>  }
>  
>  void gic_dump_info(struct vcpu *v)
> 
>  
> You could also try to replace GICH_HCR_UIE with GICH_HCR_NPIE, you
> should still be receiving maintenance interrupts when one or more LRs
> become available.
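For reference, here is a simplified model of the two GICH_HCR enables being compared, written from the GICv2 architecture description rather than from Xen's code. UIE signals a maintenance interrupt only while at most one LR holds a valid entry. NPIE signals one while no LR is in the pending (or pending+active) state. The enum, macro and function names are hypothetical. Only the bit positions (UIE = bit 1, NPIE = bit 3) follow the GICv2 GICH_HCR layout, and the global En bit is deliberately left out of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified LR state field (GICv2 GICH_LR bits [29:28]). */
enum lr_state { LR_INVALID = 0, LR_PENDING = 1, LR_ACTIVE = 2, LR_PENDING_ACTIVE = 3 };

/* GICH_HCR enable bits, positions as in the GICv2 spec. */
#define HCR_UIE  (1u << 1)   /* underflow interrupt enable */
#define HCR_NPIE (1u << 3)   /* no pending interrupt enable */

/* Underflow: none, or only one, of the LRs holds a valid entry. */
bool lr_underflow(const enum lr_state *lrs, int nr_lrs)
{
    int valid = 0;
    for (int i = 0; i < nr_lrs; i++)
        if (lrs[i] != LR_INVALID)
            valid++;
    return valid <= 1;
}

/* No pending: no LR is in the pending or pending+active state. */
bool lr_no_pending(const enum lr_state *lrs, int nr_lrs)
{
    for (int i = 0; i < nr_lrs; i++)
        if (lrs[i] == LR_PENDING || lrs[i] == LR_PENDING_ACTIVE)
            return false;
    return true;
}

/* Would a maintenance interrupt be signalled for these HCR enables? */
bool maintenance_asserted(uint32_t hcr, const enum lr_state *lrs, int nr_lrs)
{
    assert(nr_lrs > 0);
    if ((hcr & HCR_UIE) && lr_underflow(lrs, nr_lrs))
        return true;
    if ((hcr & HCR_NPIE) && lr_no_pending(lrs, nr_lrs))
        return true;
    return false;
}
```

In this model, with three LRs left ACTIVE and one free, UIE stays quiet while NPIE fires. This is one way the two settings could behave differently when the guest is slow to EOI under heavy SGI traffic.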
> 
> 
> > >
> > > I doubt you have so much interrupt traffic to actually fill all the LRs,
> > > so I am thinking that a few LRs might not be cleared properly (that
> > > should happen on hypervisor entry, gic_update_one_lr should take care of
> > > it).
> > 
> > This actually explains why this happens during domU start: SGI
> > traffic might be very heavy at that time.
> > 
> > >
> > >
> > >> Regards,
> > >> Andrii
> > >>
> > >> On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
> > >> <stefano.stabellini@eu.citrix.com> wrote:
> > >> > Hello Andrii,
> > >> > we are getting closer :-)
> > >> >
> > >> > It would help if you post the output with GIC_DEBUG defined but without
> > >> > the other change that "fixes" the issue.
> > >> >
> > >> > I think the problem is probably due to software irqs.
> > >> > You are getting too many
> > >> >
> > >> > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> > >> >
> > >> > messages. That means you are losing virtual SGIs (guest VCPU to guest
> > >> > VCPU). It would be best to investigate why, especially if you get many
> > >> > more of the same messages without the MAINTENANCE_IRQ change I
> > >> > suggested.
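Here is a rough sketch of why each such message stands for a lost SGI. While an irq is still waiting on lr_pending, a second injection of the same irq does not create a new entry, so the two injections are merged and the guest ends up seeing only one. This is a hypothetical model: vcpu_model, inject and move_to_lr are illustrative names, not Xen functions, and only the first 32 irq numbers (SGIs/PPIs) are tracked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_VIRQS 32   /* hypothetical: only SGIs/PPIs are modelled */

struct vcpu_model {
    bool queued[NR_VIRQS];   /* irq sits on lr_pending, not yet in an LR */
    unsigned int dropped;    /* injections folded into an existing entry */
};

/* Returns true if the irq was newly queued, false if it was already waiting. */
bool inject(struct vcpu_model *v, unsigned int irq)
{
    assert(irq < NR_VIRQS);
    if (v->queued[irq]) {
        v->dropped++;
        printf("trying to inject irq=%u, when it is still lr_pending\n", irq);
        return false;
    }
    v->queued[irq] = true;
    return true;
}

/* Moving the irq into a list register frees its lr_pending slot. */
void move_to_lr(struct vcpu_model *v, unsigned int irq)
{
    assert(irq < NR_VIRQS);
    v->queued[irq] = false;
}
```

If three SGI 2s are sent before an LR frees up, the model counts two merged injections. Only once the irq reaches an LR can a new one be queued again.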
> > >> >
> > >> > This patch might also help understand the problem better:
> > >> >
> > >> >
> > >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > >> > index b7516c0..5eaeca2 100644
> > >> > --- a/xen/arch/arm/gic.c
> > >> > +++ b/xen/arch/arm/gic.c
> > >> > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
> > >> >      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
> > >> >      {
> > >> >          i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
> > >> > -        if ( i >= nr_lrs ) return;
> > >> > +        if ( i >= nr_lrs )
> > >> > +        {
> > >> > +            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
> > >> > +                    p->irq, v->domain->domain_id, v->vcpu_id);
> > >> > +            continue;
> > >> > +        }
> > >> >
> > >> >          spin_lock_irqsave(&gic.lock, flags);
> > >> >          gic_set_lr(i, p, GICH_LR_PENDING);
> > >> >
> > >> >
> > >> >
> > >> >
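The restore loop that this debug patch instruments can be modelled in a few lines. The model also shows why artificially capping the number of LRs (for example to 1) is a cheap way to exercise the "LRs full" path. This sketch rests on simplifying assumptions: a plain 64-bit mask stands in for this_cpu(lr_mask), an array of irq numbers stands in for the lr_pending list, and locking is not modelled. restore_pending and first_zero_bit are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* First clear bit below nr, or nr if all are set (same contract as find_first_zero_bit). */
static unsigned int first_zero_bit(uint64_t mask, unsigned int nr)
{
    for (unsigned int i = 0; i < nr; i++)
        if (!(mask & ((uint64_t)1 << i)))
            return i;
    return nr;
}

/*
 * Place queued irqs into free LRs; lr_of[k] gets the LR used for pending[k],
 * or -1 if none was free. Returns how many irqs were left queued.
 */
unsigned int restore_pending(uint64_t *lr_mask, unsigned int nr_lrs,
                             const unsigned int *pending, unsigned int nr_pending,
                             int lr_of[])
{
    unsigned int left = 0;
    assert(nr_lrs >= 1 && nr_lrs <= 64);
    for (unsigned int k = 0; k < nr_pending; k++) {
        unsigned int i = first_zero_bit(*lr_mask, nr_lrs);
        if (i >= nr_lrs) {
            printf("LRs full, not injecting irq=%u\n", pending[k]);
            lr_of[k] = -1;
            left++;
            continue;   /* keep walking so every skipped irq is logged */
        }
        *lr_mask |= (uint64_t)1 << i;   /* mark the LR as in use */
        lr_of[k] = (int)i;
    }
    return left;
}
```

Like the debug patch, the loop uses `continue` rather than `return` when the LRs are full, so every irq left behind gets logged. The unpatched code stops at the first full check.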
> > >> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> > >> >> Hi Stefano,
> > >> >>
> > >> >> No hangs with this change.
> > >> >> Complete log is the following:
> > >> >>
> > >> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
> > >> >> DRA752 ES1.0
> > >> >> <ethaddr> not set. Validating first E-fuse MAC
> > >> >> cpsw
> > >> >> - UART enabled -
> > >> >> - CPU 00000000 booting -
> > >> >> - Xen starting in Hyp mode -
> > >> >> - Zero BSS -
> > >> >> - Setting up control registers -
> > >> >> - Turning on paging -
> > >> >> - Ready -
> > >> >> (XEN) Checking for initrd in /chosen
> > >> >> (XEN) RAM: 0000000080000000 - 000000009fffffff
> > >> >> (XEN) RAM: 00000000a0000000 - 00000000bfffffff
> > >> >> (XEN) RAM: 00000000c0000000 - 00000000dfffffff
> > >> >> (XEN)
> > >> >> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa
> > >> >> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000
> > >> >> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000
> > >> >> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000
> > >> >> (XEN)  RESVD[0]: 00000000ba300000 - 00000000bfd00000
> > >> >> (XEN)  RESVD[1]: 0000000095800000 - 0000000095900000
> > >> >> (XEN)  RESVD[2]: 0000000098a00000 - 0000000098b00000
> > >> >> (XEN)  RESVD[3]: 0000000095f00000 - 0000000098a00000
> > >> >> (XEN)  RESVD[4]: 0000000095900000 - 0000000095f00000
> > >> >> (XEN)
> > >> >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
> > >> >> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
> > >> >> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000
> > >> >> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages)
> > >> >> (XEN) Dom heap: 344064 pages
> > >> >> (XEN) Domain heap initialised
> > >> >> (XEN) Looking for UART console serial0
> > >> >>  Xen 4.5-unstable
> > >> >> (XEN) Xen version 4.5-unstable (atseglytskyi@)
> > >> >> (arm-linux-gnueabihf-gcc (crosstool-NG
> > >> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3
> > >> >> 20130328 (prerelease)) debu4
> > >> >> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
> > >> >> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2
> > >> >> (XEN) 32-bit Execution:
> > >> >> (XEN)   Processor Features: 00001131:00011011
> > >> >> (XEN)     Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
> > >> >> (XEN)     Extensions: GenericTimer Security
> > >> >> (XEN)   Debug Features: 02010555
> > >> >> (XEN)   Auxiliary Features: 00000000
> > >> >> (XEN)   Memory Model Features: 10201105 20000000 01240000 02102211
> > >> >> (XEN)  ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
> > >> >> (XEN) Platform: TI DRA7
> > >> >> (XEN) /psci method must be smc, but is: "hvc"
> > >> >> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c)
> > >> >> (XEN) Set AuxCoreBoot0 to 0x20
> > >> >> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
> > >> >> (XEN) Using generic timer at 6144 KHz
> > >> >> (XEN) GIC initialization:
> > >> >> (XEN)         gic_dist_addr=0000000048211000
> > >> >> (XEN)         gic_cpu_addr=0000000048212000
> > >> >> (XEN)         gic_hyp_addr=0000000048214000
> > >> >> (XEN)         gic_vcpu_addr=0000000048216000
> > >> >> (XEN)         gic_maintenance_irq=25
> > >> >> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
> > >> >> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> > >> >> (XEN) I/O virtualisation disabled
> > >> >> (XEN) Allocated console ring of 16 KiB.
> > >> >> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
> > >> >> (XEN) Bringing up CPU1
> > >> >> - CPU 00000001 booting -
> > >> >> - Xen starting in Hyp mode -
> > >> >> - Setting up control registers -
> > >> >> - Turning on paging -
> > >> >> - Ready -
> > >> >> (XEN) CPU 1 booted.
> > >> >> (XEN) Brought up 2 CPUs
> > >> >> (XEN) *** LOADING DOMAIN 0 ***
> > >> >> (XEN) Loading kernel from boot module 2
> > >> >> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
> > >> >> (XEN) Loading zImage from 00000000c0000040 to 00000000cfc00000-00000000cff50c48
> > >> >> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8
> > >> >> (XEN) Std. Loglevel: All
> > >> >> (XEN) Guest Loglevel: All
> > >> >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch
> > >> >> input to Xen)
> > >> >> (XEN) Freed 272kB init memory.
> > >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
> > >> >> already pending in LR0
> > >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
> > >> >> already pending in LR0
> > >> >> [    0.000000] /cpus/cpu@0 missing clock-frequency property
> > >> >> [    0.000000] /cpus/cpu@1 missing clock-frequency property
> > >> >> [    0.140625] omap-gpmc omap-gpmc: failed to reserve memory
> > >> >> [    0.187500] omap_l3_noc ocp.3: couldn't find resource 2
> > >> >> [    0.273437] i2c i2c-1: of_i2c: invalid reg on
> > >> >> /ocp/i2c@48072000/camera_ov10635
> > >> >> [    0.437500] ldo3: operation not allowed
> > >> >> [    0.437500] omapdss HDMI error: can't set the voltage regulator
> > >> >> [    0.468750] tfc_s9700 display0: tfc_s9700_probe probe
> > >> >> [    0.468750] ov1063x 1-0030: No deserializer node found
> > >> >> [    0.468750] ov1063x 1-0030: No serializer node found
> > >> >> [    0.468750] ov1063x 1-0030: Failed writing register 0x0103!
> > >> >> [    0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30
> > >> >> [    0.578125] ahci ahci.0.auto: can't get clock
> > >> >> [    0.898437] ldc_module_init
> > >> >> [    1.304687] Missing dual_emac_res_vlan in DT.
> > >> >> [    1.304687] Using 1 as Reserved VLAN for 0 slave
> > >> >> [    1.312500] Missing dual_emac_res_vlan in DT.
> > >> >> [    1.320312] Using 2 as Reserved VLAN for 1 slave
> > >> >> [    1.382812] Freeing init memory: 236K
> > >> >> sh: write error: No such device
> > >> >> Cannot identify '/dev/camera0': 2, No such file or directory
> > >> >> Parsing config from /xen/images/DomUAndroid.cfg
> > >> >> XSM Disabled: seclabel not supported
> > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> > >> >> dom1 access to irq 53: Function not implemented
> > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> > >> >> dom1 access to irq 71: Function not implemented
> > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> > >> >> dom1 access to irq 173: Function not implemented
> > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> > >> >> dom1 access to irq 174: Function not implemented
> > >> >> Turning on vfb in domain 1
> > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> > >> >> still lr_pending
> > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> > >> >> still lr_pending
> > >> >> Parsing config from /xen/images/DomUQNX.cfg
> > >> >> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to
> > >> >> inject irq=2 into d0v0, when it is still lr_pending
> > >> >>
> > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> > >> >> still lr_pending
> > >> >> [    4.304687] dra7-evm-sound sound.17: cpu dai node is invalid
> > >> >> [    4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22
> > >> >> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader
> > >> >> found: Invalid kernel
> > >> >> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image
> > >> >> failed: No such file or directory
> > >> >> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot
> > >> >> (re-)build domain: -3
> > >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> > >> >> still lr_pending
> > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> > >> >> still lr_pending
> > >> >> Turning on 'vsnd' in domain '1' (dev_id: '0')
> > >> >> Turning on vkbd in domain 1
> > >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> > >> >> still lr_pending
> > >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> > >> >> still lr_pending
> > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> > >> >> still lr_pending
> > >> >>
> > >> >> Please press Enter to activate this console. (XEN) gic.c:617:d0v1
> > >> >> trying to inject irq=2 into d0v0, when it is still lr_pending
> > >> >>
> > >> >> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi
> > >> >> <andrii.tseglytskyi@globallogic.com> wrote:
> > >> >> > OK got it. Give me a few mins
> > >> >> >
> > >> >> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini
> > >> >> > <stefano.stabellini@eu.citrix.com> wrote:
> > >> >> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
> > >> >> >> for non-hardware irqs (desc == NULL) and keep avoiding
> > >> >> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs.
> > >> >> >>
> > >> >> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid
> > >> >> >> other potential bugs introduced later.
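To make the difference between the two proposals concrete, here is a sketch that composes a GICv2 list register value both ways. The first variant is the one described above: HW bit for hardware irqs, maintenance request only for software irqs. The second is the quirk-based patch, which always requests a maintenance interrupt and never sets HW. The bit layout follows the GICv2 GICH_LR definition: virtual ID [9:0], physical ID [19:10] (bit 19 acts as the EOI/maintenance request when HW is 0), priority [27:23], state [29:28], HW [31]. The macro and function names are hypothetical stand-ins for Xen's GICH_V2_LR_* constants and gicv2_update_lr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LR_VIRTUAL_MASK    0x3ffu
#define LR_PHYSICAL_MASK   0x3ffu
#define LR_PHYSICAL_SHIFT  10
#define LR_PRIORITY_MASK   0x1fu
#define LR_PRIORITY_SHIFT  23
#define LR_STATE_SHIFT     28
#define LR_MAINTENANCE_IRQ (1u << 19)   /* EOI request, only meaningful when HW == 0 */
#define LR_HW              (1u << 31)

enum lr_state { LR_PENDING = 1, LR_ACTIVE = 2 };

/* Common part: state, top 5 bits of the 8-bit priority, virtual irq id. */
static uint32_t lr_base(unsigned int virq, uint8_t prio, enum lr_state st)
{
    return ((uint32_t)st << LR_STATE_SHIFT) |
           (((uint32_t)(prio >> 3) & LR_PRIORITY_MASK) << LR_PRIORITY_SHIFT) |
           (virq & LR_VIRTUAL_MASK);
}

/* HW bit for hardware irqs, maintenance request only for software irqs. */
uint32_t lr_hw_or_maint(unsigned int virq, uint8_t prio, enum lr_state st,
                        bool is_hw, unsigned int pirq)
{
    uint32_t v = lr_base(virq, prio, st);
    if (is_hw)
        v |= LR_HW | ((pirq & LR_PHYSICAL_MASK) << LR_PHYSICAL_SHIFT);
    else
        v |= LR_MAINTENANCE_IRQ;
    return v;
}

/* Quirk-based variant: always request a maintenance irq, never set HW. */
uint32_t lr_always_maint(unsigned int virq, uint8_t prio, enum lr_state st)
{
    return lr_base(virq, prio, st) | LR_MAINTENANCE_IRQ;
}
```

For a hardware irq the first variant never sets bit 19, so the guest's EOI goes straight to the physical GIC. The quirk variant instead routes every EOI through a maintenance interrupt.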
> > >> >> >>
> > >> >> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> > >> >> >>> What if I try the following code on top of the current master branch:
> > >> >> >>>
> > >> >> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> > >> >> >>> index 31fb81a..6764ab7 100644
> > >> >> >>> --- a/xen/arch/arm/gic-v2.c
> > >> >> >>> +++ b/xen/arch/arm/gic-v2.c
> > >> >> >>> @@ -36,6 +36,8 @@
> > >> >> >>>  #include <asm/io.h>
> > >> >> >>>  #include <asm/gic.h>
> > >> >> >>>
> > >> >> >>> +#define GIC_DEBUG 1
> > >> >> >>> +
> > >> >> >>>  /*
> > >> >> >>>   * LR register definitions are GIC v2 specific.
> > >> >> >>>   * Moved these definitions from header file to here
> > >> >> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > >> >> >>> index bcaded9..c03d6a6 100644
> > >> >> >>> --- a/xen/arch/arm/gic.c
> > >> >> >>> +++ b/xen/arch/arm/gic.c
> > >> >> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
> > >> >> >>>
> > >> >> >>>  #define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_hw_ops->info->nr_lrs) - 1))
> > >> >> >>>
> > >> >> >>> -#undef GIC_DEBUG
> > >> >> >>> +#define GIC_DEBUG 1
> > >> >> >>>
> > >> >> >>>  static void gic_update_one_lr(struct vcpu *v, int i);
> > >> >> >>>
> > >> >> >>> It is equivalent to what you are proposing: my code sets
> > >> >> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, so the following line will be
> > >> >> >>> executed inside the gicv2_update_lr() function:
> > >> >> >>>  lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> > >> >> >>>
> > >> >> >>> regards,
> > >> >> >>> Andrii
> > >> >> >>>
> > >> >> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
> > >> >> >>> <stefano.stabellini@eu.citrix.com> wrote:
> > >> >> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> > >> >> >>> >> OK, I see that if GICH_V2_LR_MAINTENANCE_IRQ is always set,
> > >> >> >>> >> everything works fine.
> > >> >> >>> >> The following 2 patches fix xen/master for my platform.
> > >> >> >>> >>
> > >> >> >>> >> Stefano, could you please take a look at these changes?
> > >> >> >>> >>
> > >> >> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca
> > >> >> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> > >> >> >>> >> Date:   Tue Nov 18 14:20:42 2014 +0200
> > >> >> >>> >>
> > >> >> >>> >>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
> > >> >> >>> >>
> > >> >> >>> >>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
> > >> >> >>> >>     Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> > >> >> >>> >>
> > >> >> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> > >> >> >>> >> index 31fb81a..093ecdb 100644
> > >> >> >>> >> --- a/xen/arch/arm/gic-v2.c
> > >> >> >>> >> +++ b/xen/arch/arm/gic-v2.c
> > >> >> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct pending_irq *p,
> > >> >> >>> >>                                               << GICH_V2_LR_PRIORITY_SHIFT) |
> > >> >> >>> >>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));
> > >> >> >>> >>
> > >> >> >>> >> -    if ( p->desc != NULL )
> > >> >> >>> >> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> > >> >> >>> >>      {
> > >> >> >>> >> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> > >> >> >>> >> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> > >> >> >>> >> -        else
> > >> >> >>> >> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
> > >> >> >>> >> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
> > >> >> >>> >> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> > >> >> >>> >> +    }
> > >> >> >>> >> +    else if ( p->desc != NULL )
> > >> >> >>> >> +    {
> > >> >> >>> >> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
> > >> >> >>> >> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
> > >> >> >>> >>      }
> > >> >> >>> >>
> > >> >> >>> >>      writel_gich(lr_reg, GICH_LR + lr * 4);
> > >> >> >>> >
> > >> >> >>> > Actually in case p->desc == NULL (the irq is not a hardware irq, it
> > >> >> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need
> > >> >> >>> > the maintenance interrupt, if the bug was really due to GICH_LR_HW not
> > >> >> >>> > working correctly on OMAP5. These changes might only be better at
> > >> >> >>> > "hiding" the real issue.
> > >> >> >>> >
> > >> >> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding
> > >> >> >>> > maintenance interrupts doesn't work for software interrupts.
> > >> >> >>> > The commit that should make them work correctly after the
> > >> >> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121
> > >> >> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll
> > >> >> >>> > see that it is going to set a software irq as PENDING if it is already ACTIVE.
> > >> >> >>> > Maybe that doesn't work correctly on OMAP5.
> > >> >> >>> >
> > >> >> >>> > Could you try this patch on top of
> > >> >> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
> > >> >> >>> > if the problem is specifically with software irqs.
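The behaviour attributed to commit 394b7e587b05 above can be sketched as follows. This is a simplified, hypothetical model of the ACTIVE and invalid branches of gic_update_one_lr, not the commit's code. struct irq_model and update_lr are illustrative names, is_hw stands in for p->desc != NULL, and queued stands in for a re-injection that arrived while the irq was already in the LR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* GICv2 LR state bits [29:28]. */
#define LR_ST_PENDING (1u << 28)
#define LR_ST_ACTIVE  (2u << 28)

struct irq_model {
    bool is_hw;   /* stands in for p->desc != NULL */
    bool queued;  /* re-injected while already sitting in an LR */
};

/*
 * Returns the new LR value; *freed reports whether the LR can be reused.
 * ACTIVE and re-injected software irq: also mark PENDING (pending+active).
 * Neither ACTIVE nor PENDING: the guest has EOI'd, so release the LR.
 */
uint32_t update_lr(uint32_t lr, struct irq_model *p, bool *freed)
{
    assert(p != NULL && freed != NULL);
    *freed = false;
    if (lr & LR_ST_ACTIVE) {
        if (p->queued && !p->is_hw) {
            p->queued = false;
            lr |= LR_ST_PENDING;
        }
        /* A queued hardware irq is left alone in this model. */
    } else if (!(lr & LR_ST_PENDING)) {
        *freed = true;
        lr = 0;
    }
    return lr;
}
```

If OMAP5 mishandled the pending+active combination for software irqs, the merged SGIs above would never be delivered. This is the hypothesis the test patch below is meant to check.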
> > >> >> >>> >
> > >> >> >>> >
> > >> >> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > >> >> >>> > index b7516c0..d8a17c9 100644
> > >> >> >>> > --- a/xen/arch/arm/gic.c
> > >> >> >>> > +++ b/xen/arch/arm/gic.c
> > >> >> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
> > >> >> >>> >  /* Maximum cpu interface per GIC */
> > >> >> >>> >  #define NR_GIC_CPU_IF 8
> > >> >> >>> >
> > >> >> >>> > -#undef GIC_DEBUG
> > >> >> >>> > +#define GIC_DEBUG 1
> > >> >> >>> >
> > >> >> >>> >  static void gic_update_one_lr(struct vcpu *v, int i);
> > >> >> >>> >
> > >> >> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
> > >> >> >>> >          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
> > >> >> >>> >      if ( p->desc != NULL )
> > >> >> >>> >          lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
> > >> >> >>> > +    else
> > >> >> >>> > +        lr_val |= GICH_LR_MAINTENANCE_IRQ;
> > >> >> >>> >
> > >> >> >>> >      GICH[GICH_LR + lr] = lr_val;
> > >> >> >>> >

* Re: Xen 4.5 random freeze question
  2014-11-19 12:13                                                   ` Ian Campbell
@ 2014-11-19 12:17                                                     ` Stefano Stabellini
  2014-11-19 12:23                                                       ` Julien Grall
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-19 12:17 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Andrii Tseglytskyi, xen-devel, Julien Grall, Stefano Stabellini

On Wed, 19 Nov 2014, Ian Campbell wrote:
> On Wed, 2014-11-19 at 11:42 +0000, Stefano Stabellini wrote:
> > So it looks like there is not actually anything wrong, it's just that you
> > have too many inflight irqs? It shouldn't cause problems, because in that
> > case GICH_HCR_UIE should be set and you should get a maintenance
> > interrupt when LRs become available (actually when "none, or only one,
> > of the List register entries is marked as a valid interrupt").
> > 
> > Maybe GICH_HCR_UIE is the one that doesn't work properly.
> 
> How much testing did this aspect get when the no-maint-irq series
> originally went in? Did you manage to find a workload which filled all
> the LRs or try artificially limiting the number of LRs somehow in order
> to provoke it?
> 
> I ask because my intuition is that this won't happen very much, meaning
> those code paths may not be as well tested...

I did test it by artificially limiting the number of LRs to 1.
However there have been many iterations of that series and I didn't run
this test at every iteration.

 
> 
> > It might be worth checking that you are receiving maintenance interrupts:
> > 
> > 
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index b7516c0..b3eaa44 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
> >       * on return to guest that is going to clear the old LRs and inject
> >       * new interrupts.
> >       */
> > +    
> > +    gdprintk(XENLOG_DEBUG, "maintenance interrupt\n");
> >  }
> >  
> >  void gic_dump_info(struct vcpu *v)
> > 
> >  
> > You could also try to replace GICH_HCR_UIE with GICH_HCR_NPIE, you
> > should still be receiving maintenance interrupts when one or more LRs
> > become available.
> > 
> > 
> > > >
> > > > I doubt you have so much interrupt traffic to actually fill all the LRs,
> > > > so I am thinking that a few LRs might not be cleared properly (that
> > > > should happen on hypervisor entry, gic_update_one_lr should take care of
> > > > it).
> > > 
> > > This actually explains why this happens during domU start: SGI
> > > traffic might be very heavy at that time.
> > > 
> > > >
> > > >
> > > >> Regards,
> > > >> Andrii
> > > >>
> > > >> On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini
> > > >> <stefano.stabellini@eu.citrix.com> wrote:
> > > >> > Hello Andrii,
> > > >> > we are getting closer :-)
> > > >> >
> > > >> > It would help if you post the output with GIC_DEBUG defined but without
> > > >> > the other change that "fixes" the issue.
> > > >> >
> > > >> > I think the problem is probably due to software irqs.
> > > >> > You are getting too many
> > > >> >
> > > >> > gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending
> > > >> >
> > > >> > messages. That means you are losing virtual SGIs (guest VCPU to guest
> > > >> > VCPU). It would be best to investigate why, especially if you get many
> > > >> > more of the same messages without the MAINTENANCE_IRQ change I
> > > >> > suggested.
> > > >> >
> > > >> > This patch might also help understand the problem better:
> > > >> >
> > > >> >
> > > >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > > >> > index b7516c0..5eaeca2 100644
> > > >> > --- a/xen/arch/arm/gic.c
> > > >> > +++ b/xen/arch/arm/gic.c
> > > >> > @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
> > > >> >      list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
> > > >> >      {
> > > >> >          i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
> > > >> > -        if ( i >= nr_lrs ) return;
> > > >> > +        if ( i >= nr_lrs )
> > > >> > +        {
> > > >> > +            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
> > > >> > +                    p->irq, v->domain->domain_id, v->vcpu_id);
> > > >> > +            continue;
> > > >> > +        }
> > > >> >
> > > >> >          spin_lock_irqsave(&gic.lock, flags);
> > > >> >          gic_set_lr(i, p, GICH_LR_PENDING);
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> > > >> >> Hi Stefano,
> > > >> >>
> > > >> >> No hangs with this change.
> > > >> >> Complete log is the following:
> > > >> >>
> > > >> >> U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26)
> > > >> >> DRA752 ES1.0
> > > >> >> <ethaddr> not set. Validating first E-fuse MAC
> > > >> >> cpsw
> > > >> >> - UART enabled -
> > > >> >> - CPU 00000000 booting -
> > > >> >> - Xen starting in Hyp mode -
> > > >> >> - Zero BSS -
> > > >> >> - Setting up control registers -
> > > >> >> - Turning on paging -
> > > >> >> - Ready -
> > > >> >> (XEN) Checking for initrd in /chosen
> > > >> >> (XEN) RAM: 0000000080000000 - 000000009fffffff
> > > >> >> (XEN) RAM: 00000000a0000000 - 00000000bfffffff
> > > >> >> (XEN) RAM: 00000000c0000000 - 00000000dfffffff
> > > >> >> (XEN)
> > > >> >> (XEN) MODULE[1]: 00000000c2000000 - 00000000c20069aa
> > > >> >> (XEN) MODULE[2]: 00000000c0000000 - 00000000c2000000
> > > >> >> (XEN) MODULE[3]: 0000000000000000 - 0000000000000000
> > > >> >> (XEN) MODULE[4]: 00000000c3000000 - 00000000c3010000
> > > >> >> (XEN)  RESVD[0]: 00000000ba300000 - 00000000bfd00000
> > > >> >> (XEN)  RESVD[1]: 0000000095800000 - 0000000095900000
> > > >> >> (XEN)  RESVD[2]: 0000000098a00000 - 0000000098b00000
> > > >> >> (XEN)  RESVD[3]: 0000000095f00000 - 0000000098a00000
> > > >> >> (XEN)  RESVD[4]: 0000000095900000 - 0000000095f00000
> > > >> >> (XEN)
> > > >> >> (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0
> > > >> >> dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1
> > > >> >> (XEN) Placing Xen at 0x00000000dfe00000-0x00000000e0000000
> > > >> >> (XEN) Xen heap: 00000000d2000000-00000000de000000 (49152 pages)
> > > >> >> (XEN) Dom heap: 344064 pages
> > > >> >> (XEN) Domain heap initialised
> > > >> >> (XEN) Looking for UART console serial0
> > > >> >>  Xen 4.5-unstable
> > > >> >> (XEN) Xen version 4.5-unstable (atseglytskyi@)
> > > >> >> (arm-linux-gnueabihf-gcc (crosstool-NG
> > > >> >> linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3
> > > >> >> 20130328 (prerelease)) debu4
> > > >> >> (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty
> > > >> >> (XEN) Processor: 412fc0f2: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x2
> > > >> >> (XEN) 32-bit Execution:
> > > >> >> (XEN)   Processor Features: 00001131:00011011
> > > >> >> (XEN)     Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
> > > >> >> (XEN)     Extensions: GenericTimer Security
> > > >> >> (XEN)   Debug Features: 02010555
> > > >> >> (XEN)   Auxiliary Features: 00000000
> > > >> >> (XEN)   Memory Model Features: 10201105 20000000 01240000 02102211
> > > >> >> (XEN)  ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
> > > >> >> (XEN) Platform: TI DRA7
> > > >> >> (XEN) /psci method must be smc, but is: "hvc"
> > > >> >> (XEN) Set AuxCoreBoot1 to 00000000dfe0004c (0020004c)
> > > >> >> (XEN) Set AuxCoreBoot0 to 0x20
> > > >> >> (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
> > > >> >> (XEN) Using generic timer at 6144 KHz
> > > >> >> (XEN) GIC initialization:
> > > >> >> (XEN)         gic_dist_addr=0000000048211000
> > > >> >> (XEN)         gic_cpu_addr=0000000048212000
> > > >> >> (XEN)         gic_hyp_addr=0000000048214000
> > > >> >> (XEN)         gic_vcpu_addr=0000000048216000
> > > >> >> (XEN)         gic_maintenance_irq=25
> > > >> >> (XEN) GIC: 192 lines, 2 cpus, secure (IID 0000043b).
> > > >> >> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> > > >> >> (XEN) I/O virtualisation disabled
> > > >> >> (XEN) Allocated console ring of 16 KiB.
> > > >> >> (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
> > > >> >> (XEN) Bringing up CPU1
> > > >> >> - CPU 00000001 booting -
> > > >> >> - Xen starting in Hyp mode -
> > > >> >> - Setting up control registers -
> > > >> >> - Turning on paging -
> > > >> >> - Ready -
> > > >> >> (XEN) CPU 1 booted.
> > > >> >> (XEN) Brought up 2 CPUs
> > > >> >> (XEN) *** LOADING DOMAIN 0 ***
> > > >> >> (XEN) Loading kernel from boot module 2
> > > >> >> (XEN) Populate P2M 0xc8000000->0xd0000000 (1:1 mapping for dom0)
> > > >> >> (XEN) Loading zImage from 00000000c0000040 to 00000000cfc00000-00000000cff50c48
> > > >> >> (XEN) Loading dom0 DTB to 0x00000000cfa00000-0x00000000cfa05ba8
> > > >> >> (XEN) Std. Loglevel: All
> > > >> >> (XEN) Guest Loglevel: All
> > > >> >> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch
> > > >> >> input to Xen)
> > > >> >> (XEN) Freed 272kB init memory.
> > > >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
> > > >> >> already pending in LR0
> > > >> >> (XEN) gic.c:673:d0v0 trying to inject irq=2 into d0v0, when it is
> > > >> >> already pending in LR0
> > > >> >> [    0.000000] /cpus/cpu@0 missing clock-frequency property
> > > >> >> [    0.000000] /cpus/cpu@1 missing clock-frequency property
> > > >> >> [    0.140625] omap-gpmc omap-gpmc: failed to reserve memory
> > > >> >> [    0.187500] omap_l3_noc ocp.3: couldn't find resource 2
> > > >> >> [    0.273437] i2c i2c-1: of_i2c: invalid reg on
> > > >> >> /ocp/i2c@48072000/camera_ov10635
> > > >> >> [    0.437500] ldo3: operation not allowed
> > > >> >> [    0.437500] omapdss HDMI error: can't set the voltage regulator
> > > >> >> [    0.468750] tfc_s9700 display0: tfc_s9700_probe probe
> > > >> >> [    0.468750] ov1063x 1-0030: No deserializer node found
> > > >> >> [    0.468750] ov1063x 1-0030: No serializer node found
> > > >> >> [    0.468750] ov1063x 1-0030: Failed writing register 0x0103!
> > > >> >> [    0.468750] dra7xx-vip vip1-0: Waiting for I2C subdevice 30
> > > >> >> [    0.578125] ahci ahci.0.auto: can't get clock
> > > >> >> [    0.898437] ldc_module_init
> > > >> >> [    1.304687] Missing dual_emac_res_vlan in DT.
> > > >> >> [    1.304687] Using 1 as Reserved VLAN for 0 slave
> > > >> >> [    1.312500] Missing dual_emac_res_vlan in DT.
> > > >> >> [    1.320312] Using 2 as Reserved VLAN for 1 slave
> > > >> >> [    1.382812] Freeing init memory: 236K
> > > >> >> sh: write error: No such device
> > > >> >> Cannot identify '/dev/camera0': 2, No such file or directory
> > > >> >> Parsing config from /xen/images/DomUAndroid.cfg
> > > >> >> XSM Disabled: seclabel not supported
> > > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> > > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> > > >> >> dom1 access to irq 53: Function not implemented
> > > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> > > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> > > >> >> dom1 access to irq 71: Function not implemented
> > > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> > > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> > > >> >> dom1 access to irq 173: Function not implemented
> > > >> >> (XEN) do_physdev_op 16 cmd=13: not implemented yet
> > > >> >> libxl: error: libxl_create.c:1092:domcreate_launch_dm: failed give
> > > >> >> dom1 access to irq 174: Function not implemented
> > > >> >> Turning on vfb in domain 1
> > > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> > > >> >> still lr_pending
> > > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> > > >> >> still lr_pending
> > > >> >> Parsing config from /xen/images/DomUQNX.cfg
> > > >> >> XSM Disabled: seclabel not supported(XEN) gic.c:617:d0v1 trying to
> > > >> >> inject irq=2 into d0v0, when it is still lr_pending
> > > >> >>
> > > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> > > >> >> still lr_pending
> > > >> >> [    4.304687] dra7-evm-sound sound.17: cpu dai node is invalid
> > > >> >> [    4.312500] dra7-evm-sound sound.17: failed to add bluetooth dai link -22
> > > >> >> xc: error: panic: xc_dom_core.c:644: xc_dom_find_loader: no loader
> > > >> >> found: Invalid kernel
> > > >> >> libxl: error: libxl_dom.c:436:libxl__build_pv: xc_dom_parse_image
> > > >> >> failed: No such file or directory
> > > >> >> libxl: error: libxl_create.c:1030:domcreate_rebuild_done: cannot
> > > >> >> (re-)build domain: -3
> > > >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> > > >> >> still lr_pending
> > > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> > > >> >> still lr_pending
> > > >> >> Turning on 'vsnd' in domain '1' (dev_id: '0')
> > > >> >> Turning on vkbd in domain 1
> > > >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> > > >> >> still lr_pending
> > > >> >> (XEN) gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is
> > > >> >> still lr_pending
> > > >> >> (XEN) gic.c:617:d0v0 trying to inject irq=2 into d0v1, when it is
> > > >> >> still lr_pending
> > > >> >>
> > > >> >> Please press Enter to activate this console. (XEN) gic.c:617:d0v1
> > > >> >> trying to inject irq=2 into d0v0, when it is still lr_pending
> > > >> >>
> > > >> >> On Tue, Nov 18, 2014 at 6:18 PM, Andrii Tseglytskyi
> > > >> >> <andrii.tseglytskyi@globallogic.com> wrote:
> > > >> >> > OK got it. Give me a few mins
> > > >> >> >
> > > >> >> > On Tue, Nov 18, 2014 at 6:14 PM, Stefano Stabellini
> > > >> >> > <stefano.stabellini@eu.citrix.com> wrote:
> > > >> >> >> It is not the same: I would like to set GICH_V2_LR_MAINTENANCE_IRQ only
> > > >> >> >> for non-hardware irqs (desc == NULL) and keep avoiding
> > > >> >> >> GICH_V2_LR_MAINTENANCE_IRQ and setting GICH_LR_HW for hardware irqs.
> > > >> >> >>
> > > >> >> >> Also testing on 394b7e587b05d0f4a5fd6f067b38339ab5a77121 would avoid
> > > >> >> >> other potential bugs introduced later.
> > > >> >> >>
> > > >> >> >> On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> > > >> >> >>> What if I try on top of current master branch the following code:
> > > >> >> >>>
> > > >> >> >>> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> > > >> >> >>> index 31fb81a..6764ab7 100644
> > > >> >> >>> --- a/xen/arch/arm/gic-v2.c
> > > >> >> >>> +++ b/xen/arch/arm/gic-v2.c
> > > >> >> >>> @@ -36,6 +36,8 @@
> > > >> >> >>>  #include <asm/io.h>
> > > >> >> >>>  #include <asm/gic.h>
> > > >> >> >>>
> > > >> >> >>> +#define GIC_DEBUG 1
> > > >> >> >>> +
> > > >> >> >>>  /*
> > > >> >> >>>   * LR register definitions are GIC v2 specific.
> > > >> >> >>>   * Moved these definitions from header file to here
> > > >> >> >>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > > >> >> >>> index bcaded9..c03d6a6 100644
> > > >> >> >>> --- a/xen/arch/arm/gic.c
> > > >> >> >>> +++ b/xen/arch/arm/gic.c
> > > >> >> >>> @@ -41,7 +41,7 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
> > > >> >> >>>
> > > >> >> >>>  #define lr_all_full() (this_cpu(lr_mask) == ((1 <<
> > > >> >> >>> gic_hw_ops->info->nr_lrs) - 1))
> > > >> >> >>>
> > > >> >> >>> -#undef GIC_DEBUG
> > > >> >> >>> +#define GIC_DEBUG 1
> > > >> >> >>>
> > > >> >> >>>  static void gic_update_one_lr(struct vcpu *v, int i);
> > > >> >> >>>
> > > >> >> >>> It is equivalent to what you proposing - my code contains
> > > >> >> >>> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI, as result the following lone will
> > > >> >> >>> be executed:
> > > >> >> >>>  lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ; inside gicv2_update_lr() function
> > > >> >> >>>
> > > >> >> >>> regards,
> > > >> >> >>> Andrii
> > > >> >> >>>
> > > >> >> >>> On Tue, Nov 18, 2014 at 5:39 PM, Stefano Stabellini
> > > >> >> >>> <stefano.stabellini@eu.citrix.com> wrote:
> > > >> >> >>> > On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote:
> > > >> >> >>> >> OK, I see that GICH_V2_LR_MAINTENANCE_IRQ must always be set and
> > > >> >> >>> >> everything works fine
> > > >> >> >>> >> The following 2 patches fixes xen/master for my platform.
> > > >> >> >>> >>
> > > >> >> >>> >> Stefano, could you please take a look to these changes?
> > > >> >> >>> >>
> > > >> >> >>> >> commit 3628a0aa35706a8f532af865ed784536ce514eca
> > > >> >> >>> >> Author: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> > > >> >> >>> >> Date:   Tue Nov 18 14:20:42 2014 +0200
> > > >> >> >>> >>
> > > >> >> >>> >>     xen/arm: dra7: always set GICH_V2_LR_MAINTENANCE_IRQ flag
> > > >> >> >>> >>
> > > >> >> >>> >>     Change-Id: Ia380b3507a182b11592588f65fd23693d4f87434
> > > >> >> >>> >>     Signed-off-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> > > >> >> >>> >>
> > > >> >> >>> >> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> > > >> >> >>> >> index 31fb81a..093ecdb 100644
> > > >> >> >>> >> --- a/xen/arch/arm/gic-v2.c
> > > >> >> >>> >> +++ b/xen/arch/arm/gic-v2.c
> > > >> >> >>> >> @@ -396,13 +396,14 @@ static void gicv2_update_lr(int lr, const struct
> > > >> >> >>> >> pending_irq *p,
> > > >> >> >>> >>                                               << GICH_V2_LR_PRIORITY_SHIFT) |
> > > >> >> >>> >>                ((p->irq & GICH_V2_LR_VIRTUAL_MASK) <<
> > > >> >> >>> >> GICH_V2_LR_VIRTUAL_SHIFT));
> > > >> >> >>> >>
> > > >> >> >>> >> -    if ( p->desc != NULL )
> > > >> >> >>> >> +    if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> > > >> >> >>> >>      {
> > > >> >> >>> >> -        if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
> > > >> >> >>> >> -            lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> > > >> >> >>> >> -        else
> > > >> >> >>> >> -            lr_reg |= GICH_V2_LR_HW | ((p->desc->irq &
> > > >> >> >>> >> GICH_V2_LR_PHYSICAL_MASK )
> > > >> >> >>> >> -                            << GICH_V2_LR_PHYSICAL_SHIFT);
> > > >> >> >>> >> +        lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
> > > >> >> >>> >> +    }
> > > >> >> >>> >> +    else if ( p->desc != NULL )
> > > >> >> >>> >> +    {
> > > >> >> >>> >> +        lr_reg |= GICH_V2_LR_HW | ((p->desc->irq & GICH_V2_LR_PHYSICAL_MASK )
> > > >> >> >>> >> +                       << GICH_V2_LR_PHYSICAL_SHIFT);
> > > >> >> >>> >>      }
> > > >> >> >>> >>
> > > >> >> >>> >>      writel_gich(lr_reg, GICH_LR + lr * 4);
> > > >> >> >>> >
> > > >> >> >>> > Actually in case p->desc == NULL (the irq is not an hardware irq, it
> > > >> >> >>> > could be the virtual timer irq or the evtchn irq), you shouldn't need
> > > >> >> >>> > the maintenance interrupt, if the bug was really due to GICH_LR_HW not
> > > >> >> >>> > working correctly on OMAP5. This changes might only be better at
> > > >> >> >>> > "hiding" the real issue.
> > > >> >> >>> >
> > > >> >> >>> > Maybe the problem is exactly the opposite: the new scheme for avoiding
> > > >> >> >>> > maintenance interrupts doesn't work for software interrupts.
> > > >> >> >>> > The commit that should make them work correctly after the
> > > >> >> >>> > no-maintenance-irq commit is 394b7e587b05d0f4a5fd6f067b38339ab5a77121
> > > >> >> >>> > If you look at the changes to gic_update_one_lr in that commit, you'll
> > > >> >> >>> > see that is going to set a software irq as PENDING if it is already ACTIVE.
> > > >> >> >>> > Maybe that doesn't work correctly on OMAP5.
> > > >> >> >>> >
> > > >> >> >>> > Could you try this patch on top of
> > > >> >> >>> > 394b7e587b05d0f4a5fd6f067b38339ab5a77121?  It should help us understand
> > > >> >> >>> > if the problem is specifically with software irqs.
> > > >> >> >>> >
> > > >> >> >>> >
> > > >> >> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > > >> >> >>> > index b7516c0..d8a17c9 100644
> > > >> >> >>> > --- a/xen/arch/arm/gic.c
> > > >> >> >>> > +++ b/xen/arch/arm/gic.c
> > > >> >> >>> > @@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
> > > >> >> >>> >  /* Maximum cpu interface per GIC */
> > > >> >> >>> >  #define NR_GIC_CPU_IF 8
> > > >> >> >>> >
> > > >> >> >>> > -#undef GIC_DEBUG
> > > >> >> >>> > +#define GIC_DEBUG 1
> > > >> >> >>> >
> > > >> >> >>> >  static void gic_update_one_lr(struct vcpu *v, int i);
> > > >> >> >>> >
> > > >> >> >>> > @@ -563,6 +563,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
> > > >> >> >>> >          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
> > > >> >> >>> >      if ( p->desc != NULL )
> > > >> >> >>> >          lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
> > > >> >> >>> > +    else
> > > >> >> >>> > +        lr_val |= GICH_LR_MAINTENANCE_IRQ;
> > > >> >> >>> >
> > > >> >> >>> >      GICH[GICH_LR + lr] = lr_val;
> > > >> >> >>> >
> > > >> >> >>>
> > > >> >> >>>
> > > >> >> >>>
> > > >> >> >>>
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > 
> > > 
> > > 
> > > 
> 
> 
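For reference, the two list-register variants compared above can be sketched as a standalone C function. Hardware IRQs get the HW bit plus the physical ID. Software IRQs (p->desc == NULL) get the EOI maintenance bit instead. The bit positions follow the GICv2 GICH_LR layout. The macro and function names are hypothetical stand-ins rather than Xen's own, and the sketch omits details that the real gicv2_update_lr() handles, such as the platform quirk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GICv2 GICH_LR bit layout; names are local to this sketch. */
#define LR_VIRTUAL_MASK     0x3ffU
#define LR_PHYSICAL_MASK    0x3ffU
#define LR_PHYSICAL_SHIFT   10
#define LR_MAINTENANCE_IRQ  (1U << 19)  /* EOI bit, valid only when HW is clear */
#define LR_PRIORITY_MASK    0x1fU
#define LR_PRIORITY_SHIFT   23
#define LR_PENDING          (1U << 28)
#define LR_HW               (1U << 31)

/* Build a pending LR value. 'has_desc' plays the role of p->desc != NULL,
 * i.e. the virtual IRQ is routed from a physical one. */
static uint32_t build_lr(unsigned int virq, unsigned int pirq,
                         unsigned int priority, bool has_desc)
{
    uint32_t lr;

    assert(virq <= LR_VIRTUAL_MASK);
    lr = LR_PENDING |
         (((priority >> 3) & LR_PRIORITY_MASK) << LR_PRIORITY_SHIFT) |
         virq;

    if ( has_desc )
        /* Hardware IRQ: deactivating the virtual IRQ also deactivates the physical one. */
        lr |= LR_HW | ((pirq & LR_PHYSICAL_MASK) << LR_PHYSICAL_SHIFT);
    else
        /* Software IRQ (timer, event channel): request a maintenance IRQ on EOI. */
        lr |= LR_MAINTENANCE_IRQ;

    return lr;
}
```

For example, `build_lr(86, 86, 0xa0, true)` yields 0x9a015856, the value dumped as HW_LR[0] later in this thread.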


* Re: Xen 4.5 random freeze question
  2014-11-19 12:17                                                     ` Stefano Stabellini
@ 2014-11-19 12:23                                                       ` Julien Grall
  2014-11-19 12:40                                                         ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Julien Grall @ 2014-11-19 12:23 UTC (permalink / raw)
  To: Stefano Stabellini, Ian Campbell; +Cc: xen-devel, Andrii Tseglytskyi

On 11/19/2014 12:17 PM, Stefano Stabellini wrote:
> On Wed, 19 Nov 2014, Ian Campbell wrote:
>> On Wed, 2014-11-19 at 11:42 +0000, Stefano Stabellini wrote:
>>> So it looks like there is not actually anything wrong, is just that you
>>> have too much inflight irqs? It should cause problems because in that
>>> case GICH_HCR_UIE should be set and you should get a maintenance
>>> interrupt when LRs become available (actually when "none, or only one,
>>> of the List register entries is marked as a valid interrupt").
>>>
>>> Maybe GICH_HCR_UIE is the one that doesn't work properly.
>>
>> How much testing did this aspect get when the no-maint-irq series
>> originally went in? Did you manage to find a workload which filled all
>> the LRs or try artificially limiting the number of LRs somehow in order
>> to provoke it?
>>
>> I ask because my intuition is that this won't happen very much, meaning
>> those code paths may not be as well tested...
> 
> I did test it by artificially limiting the number of LRs to 1.
> However there have been many iterations of that series and I didn't run
> this test at every iteration.

Am I the only one who thinks this may not be related to this bug? All the LRs
are filled with IRQs of the same priority, so it's valid.

gic_restore_pending_irqs is called every time we return to the guest,
so the cause could be anything else.

It would be interesting to see why we are trapping all the time in Xen.
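Julien's point is that a full set of LRs is not an error in itself. A queued interrupt can only displace an LR entry of strictly lower priority. A minimal sketch of that rule follows. It is a simplified model, not Xen's gic_restore_pending_irqs(), and the function name and array representation are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model, not Xen's code. lr_prio[i] is the priority of the
 * interrupt held in LR i; as on the GIC, a lower value is more urgent.
 * Returns the LR a queued interrupt of priority 'prio' may preempt, or -1
 * when no LR holds a strictly less urgent interrupt. */
static int find_lr_to_evict(const uint8_t *lr_prio, int nr_lrs, uint8_t prio)
{
    int victim = -1;

    assert(nr_lrs > 0);
    for ( int i = 0; i < nr_lrs; i++ )
    {
        /* Equal priority never preempts; among candidates pick the least urgent. */
        if ( lr_prio[i] > prio && (victim < 0 || lr_prio[i] > lr_prio[victim]) )
            victim = i;
    }
    return victim;
}
```

Suppose four LRs all hold priority 0xa0 and the queued interrupt is also at 0xa0. The function returns -1: nothing can be evicted, and the interrupt simply waits in lr_pending.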

Regards,

-- 
Julien Grall


* Re: Xen 4.5 random freeze question
  2014-11-19 11:59                                                     ` Stefano Stabellini
@ 2014-11-19 12:37                                                       ` Andrii Tseglytskyi
  2014-11-19 14:52                                                         ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 12:37 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

Hi Stefano,

> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
> > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >      else
> > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >
> >  }
>
> Yes, exactly

I tried it; the hang still occurs with this change.
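The two GICH_HCR bits being swapped here request maintenance interrupts under different conditions. According to the GICv2 architecture specification, UIE (underflow) signals one while at most one LR holds a valid entry. NPIE signals one while no LR is in the pending state. Both also require GICH_HCR.En. The sketch below is a simplified model of the two conditions that takes raw LR values; the function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GICv2 LR state field, bits [29:28] of a GICH_LR value. */
enum lr_state { LR_INVALID = 0, LR_PENDING = 1, LR_ACTIVE = 2, LR_PENDING_ACTIVE = 3 };

static enum lr_state lr_state_of(uint32_t lr)
{
    return (enum lr_state)((lr >> 28) & 0x3);
}

/* UIE condition: at most one LR holds a valid (non-invalid) entry. */
static bool uie_condition(const uint32_t *lrs, int nr_lrs)
{
    int valid = 0;

    assert(nr_lrs > 0);
    for ( int i = 0; i < nr_lrs; i++ )
        if ( lr_state_of(lrs[i]) != LR_INVALID )
            valid++;
    return valid <= 1;
}

/* NPIE condition: no LR is in the pending (or pending+active) state. */
static bool npie_condition(const uint32_t *lrs, int nr_lrs)
{
    for ( int i = 0; i < nr_lrs; i++ )
    {
        enum lr_state s = lr_state_of(lrs[i]);

        if ( s == LR_PENDING || s == LR_PENDING_ACTIVE )
            return false;
    }
    return true;
}
```

In this model, an LR that is only active counts toward UIE's "valid" total but does not block NPIE.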

Regards,
Andrii






* Re: Xen 4.5 random freeze question
  2014-11-19 12:23                                                       ` Julien Grall
@ 2014-11-19 12:40                                                         ` Andrii Tseglytskyi
  2014-11-19 13:26                                                           ` Julien Grall
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 12:40 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, Ian Campbell, Stefano Stabellini

Hi Julien,

On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall <julien.grall@linaro.org> wrote:
> On 11/19/2014 12:17 PM, Stefano Stabellini wrote:
>> On Wed, 19 Nov 2014, Ian Campbell wrote:
>>> On Wed, 2014-11-19 at 11:42 +0000, Stefano Stabellini wrote:
>>>> So it looks like there is not actually anything wrong, is just that you
>>>> have too much inflight irqs? It should cause problems because in that
>>>> case GICH_HCR_UIE should be set and you should get a maintenance
>>>> interrupt when LRs become available (actually when "none, or only one,
>>>> of the List register entries is marked as a valid interrupt").
>>>>
>>>> Maybe GICH_HCR_UIE is the one that doesn't work properly.
>>>
>>> How much testing did this aspect get when the no-maint-irq series
>>> originally went in? Did you manage to find a workload which filled all
>>> the LRs or try artificially limiting the number of LRs somehow in order
>>> to provoke it?
>>>
>>> I ask because my intuition is that this won't happen very much, meaning
>>> those code paths may not be as well tested...
>>
>> I did test it by artificially limiting the number of LRs to 1.
>> However there have been many iterations of that series and I didn't run
>> this test at every iteration.
>
> am I the only to think this may not be related to this bug? All the LRs
> are full with IRQ of the same priority. So it's valid.
>
> As gic_restore_pending_irqs is called every time that we return to the
> guest. It could be anything else.
>
> It would be interesting to see why we are trapping all the time in Xen.
>

I can run any test if you have a specific scenario in mind.


> Regards,
>
> --
> Julien Grall





* Re: Xen 4.5 random freeze question
  2014-11-19 12:40                                                         ` Andrii Tseglytskyi
@ 2014-11-19 13:26                                                           ` Julien Grall
  2014-11-19 13:30                                                             ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Julien Grall @ 2014-11-19 13:26 UTC (permalink / raw)
  To: Andrii Tseglytskyi; +Cc: xen-devel, Ian Campbell, Stefano Stabellini

On 11/19/2014 12:40 PM, Andrii Tseglytskyi wrote:
> Hi Julien,
> 
> On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall <julien.grall@linaro.org> wrote:
>> On 11/19/2014 12:17 PM, Stefano Stabellini wrote:
>>> On Wed, 19 Nov 2014, Ian Campbell wrote:
>>>> On Wed, 2014-11-19 at 11:42 +0000, Stefano Stabellini wrote:
>>>>> So it looks like there is not actually anything wrong, is just that you
>>>>> have too much inflight irqs? It should cause problems because in that
>>>>> case GICH_HCR_UIE should be set and you should get a maintenance
>>>>> interrupt when LRs become available (actually when "none, or only one,
>>>>> of the List register entries is marked as a valid interrupt").
>>>>>
>>>>> Maybe GICH_HCR_UIE is the one that doesn't work properly.
>>>>
>>>> How much testing did this aspect get when the no-maint-irq series
>>>> originally went in? Did you manage to find a workload which filled all
>>>> the LRs or try artificially limiting the number of LRs somehow in order
>>>> to provoke it?
>>>>
>>>> I ask because my intuition is that this won't happen very much, meaning
>>>> those code paths may not be as well tested...
>>>
>>> I did test it by artificially limiting the number of LRs to 1.
>>> However there have been many iterations of that series and I didn't run
>>> this test at every iteration.
>>
>> am I the only to think this may not be related to this bug? All the LRs
>> are full with IRQ of the same priority. So it's valid.
>>
>> As gic_restore_pending_irqs is called every time that we return to the
>> guest. It could be anything else.
>>
>> It would be interesting to see why we are trapping all the time in Xen.
>>
> 
> I may perform any test if you have some specific scenario.

I have no specific scenario in mind :/.

It looks like I'm able to reproduce it on my ARM board by restricting
the number of LRs to 1.

I will investigate.

Regards,

-- 
Julien Grall


* Re: Xen 4.5 random freeze question
  2014-11-19 13:26                                                           ` Julien Grall
@ 2014-11-19 13:30                                                             ` Andrii Tseglytskyi
  2014-11-19 14:05                                                               ` Julien Grall
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 13:30 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, Ian Campbell, Stefano Stabellini

On Wed, Nov 19, 2014 at 3:26 PM, Julien Grall <julien.grall@linaro.org> wrote:
> On 11/19/2014 12:40 PM, Andrii Tseglytskyi wrote:
>> Hi Julien,
>>
>> On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall <julien.grall@linaro.org> wrote:
>>> On 11/19/2014 12:17 PM, Stefano Stabellini wrote:
>>>> On Wed, 19 Nov 2014, Ian Campbell wrote:
>>>>> On Wed, 2014-11-19 at 11:42 +0000, Stefano Stabellini wrote:
>>>>>> So it looks like there is not actually anything wrong, is just that you
>>>>>> have too much inflight irqs? It should cause problems because in that
>>>>>> case GICH_HCR_UIE should be set and you should get a maintenance
>>>>>> interrupt when LRs become available (actually when "none, or only one,
>>>>>> of the List register entries is marked as a valid interrupt").
>>>>>>
>>>>>> Maybe GICH_HCR_UIE is the one that doesn't work properly.
>>>>>
>>>>> How much testing did this aspect get when the no-maint-irq series
>>>>> originally went in? Did you manage to find a workload which filled all
>>>>> the LRs or try artificially limiting the number of LRs somehow in order
>>>>> to provoke it?
>>>>>
>>>>> I ask because my intuition is that this won't happen very much, meaning
>>>>> those code paths may not be as well tested...
>>>>
>>>> I did test it by artificially limiting the number of LRs to 1.
>>>> However there have been many iterations of that series and I didn't run
>>>> this test at every iteration.
>>>
>>> am I the only to think this may not be related to this bug? All the LRs
>>> are full with IRQ of the same priority. So it's valid.
>>>
>>> As gic_restore_pending_irqs is called every time that we return to the
>>> guest. It could be anything else.
>>>
>>> It would be interesting to see why we are trapping all the time in Xen.
>>>
>>
>> I may perform any test if you have some specific scenario.
>
> I have no specific scenario in my mind :/.
>
> It looks like I'm able to reproduce it on my ARM board by the restricted
> the number of LRs to 1.
>

Do you mean that you got a hang with the current xen/master branch?

Regards,
Andrii

> I will investigate.
>
> Regards,
>
> --
> Julien Grall





* Re: Xen 4.5 random freeze question
  2014-11-19 13:30                                                             ` Andrii Tseglytskyi
@ 2014-11-19 14:05                                                               ` Julien Grall
  0 siblings, 0 replies; 66+ messages in thread
From: Julien Grall @ 2014-11-19 14:05 UTC (permalink / raw)
  To: Andrii Tseglytskyi; +Cc: xen-devel, Ian Campbell, Stefano Stabellini

On 11/19/2014 01:30 PM, Andrii Tseglytskyi wrote:
> On Wed, Nov 19, 2014 at 3:26 PM, Julien Grall <julien.grall@linaro.org> wrote:
>> On 11/19/2014 12:40 PM, Andrii Tseglytskyi wrote:
>>> Hi Julien,
>>>
>>> On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall <julien.grall@linaro.org> wrote:
>>>> On 11/19/2014 12:17 PM, Stefano Stabellini wrote:
>>>>> On Wed, 19 Nov 2014, Ian Campbell wrote:
>>>>>> On Wed, 2014-11-19 at 11:42 +0000, Stefano Stabellini wrote:
>>>>>>> So it looks like there is not actually anything wrong, is just that you
>>>>>>> have too much inflight irqs? It should cause problems because in that
>>>>>>> case GICH_HCR_UIE should be set and you should get a maintenance
>>>>>>> interrupt when LRs become available (actually when "none, or only one,
>>>>>>> of the List register entries is marked as a valid interrupt").
>>>>>>>
>>>>>>> Maybe GICH_HCR_UIE is the one that doesn't work properly.
>>>>>>
>>>>>> How much testing did this aspect get when the no-maint-irq series
>>>>>> originally went in? Did you manage to find a workload which filled all
>>>>>> the LRs or try artificially limiting the number of LRs somehow in order
>>>>>> to provoke it?
>>>>>>
>>>>>> I ask because my intuition is that this won't happen very much, meaning
>>>>>> those code paths may not be as well tested...
>>>>>
>>>>> I did test it by artificially limiting the number of LRs to 1.
>>>>> However there have been many iterations of that series and I didn't run
>>>>> this test at every iteration.
>>>>
>>>> am I the only to think this may not be related to this bug? All the LRs
>>>> are full with IRQ of the same priority. So it's valid.
>>>>
>>>> As gic_restore_pending_irqs is called every time that we return to the
>>>> guest. It could be anything else.
>>>>
>>>> It would be interesting to see why we are trapping all the time in Xen.
>>>>
>>>
>>> I may perform any test if you have some specific scenario.
>>
>> I have no specific scenario in my mind :/.
>>
>> It looks like I'm able to reproduce it on my ARM board by the restricted
>> the number of LRs to 1.
>>
> 
> Do you mean that you got a hang with current xen/master branch ?

Yes, but I had forgotten to update another part of the code.

With the patch below, which restricts the number of LRs to 1, I'm still
able to boot, and I don't see any maintenance interrupts.

Stefano, is it valid?

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index faad1ff..c1c0f7ff 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -327,6 +327,7 @@ static void __cpuinit gicv2_hyp_init(void)
     vtr = readl_gich(GICH_VTR);
     nr_lrs  = (vtr & GICH_V2_VTR_NRLRGS) + 1;
     gicv2_info.nr_lrs = nr_lrs;
+    gicv2_info.nr_lrs = 1;
 
     writel_gich(GICH_MISR_EOI, GICH_MISR);
 }
@@ -488,6 +489,16 @@ static void gicv2_write_lr(int lr, const struct gic_lr *lr_reg)
 
 static void gicv2_hcr_status(uint32_t flag, bool_t status)
 {
+    uint32_t lr = readl_gich(GICH_LR + 0);
+
+    if ( status )
+        lr |= GICH_V2_LR_MAINTENANCE_IRQ;
+    else
+        lr &= ~GICH_V2_LR_MAINTENANCE_IRQ;
+
+    writel_gich(lr, GICH_LR + 0);
+
+#if 0
     uint32_t hcr = readl_gich(GICH_HCR);
 
     if ( status )
@@ -496,6 +507,7 @@ static void gicv2_hcr_status(uint32_t flag, bool_t status)
         hcr &= (~flag);
 
     writel_gich(hcr, GICH_HCR);
+#endif
 }
 
 static unsigned int gicv2_read_vmcr_priority(void)
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 70d10d6..c726d7a 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -599,6 +599,7 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
      * on return to guest that is going to clear the old LRs and inject
      * new interrupts.
      */
+    gdprintk(XENLOG_DEBUG, "\n");
 }
 
 void gic_dump_info(struct vcpu *v)
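The patch above forces gicv2_info.nr_lrs to 1 instead of using the count derived from GICH_VTR. The standalone sketch below shows how that count is decoded and how an "all LRs full" test behaves. The names are hypothetical, not Xen's. With nr_lrs == 1, the test is true as soon as a single interrupt occupies LR0, which makes limiting the LRs a cheap way to exercise the lr_pending overflow path. With one LR, UIE's "none or only one valid" condition is also always satisfied. That may be why the patch moves the request into LR0's EOI bit, but this is an inference and is not stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GICH_VTR.ListRegs (bits [5:0]) holds the number of implemented LRs minus one. */
#define VTR_LISTREGS_MASK 0x3fU

static unsigned int nr_lrs_from_vtr(uint32_t vtr)
{
    return (vtr & VTR_LISTREGS_MASK) + 1;
}

/* Mirrors the lr_all_full() macro quoted earlier in the thread: true when
 * every implemented LR has its bit set in the per-CPU lr_mask. The shift
 * is done on a 64-bit value so that nr_lrs == 64 does not overflow. */
static bool lr_all_full(uint64_t lr_mask, unsigned int nr_lrs)
{
    assert(nr_lrs >= 1 && nr_lrs <= 64);
    uint64_t full = (nr_lrs == 64) ? UINT64_MAX : (UINT64_C(1) << nr_lrs) - 1;
    return lr_mask == full;
}
```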


-- 
Julien Grall


* Re: Xen 4.5 random freeze question
  2014-11-19 12:37                                                       ` Andrii Tseglytskyi
@ 2014-11-19 14:52                                                         ` Stefano Stabellini
  2014-11-19 15:27                                                           ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-19 14:52 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> Hi Stefano,
> 
> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
> > >      else
> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> > >
> > >  }
> >
> > Yes, exactly
> 
> I tried, hang still occurs with this change

We need to figure out why during the hang you still have all the LRs
busy even if you are getting maintenance interrupts that should cause
them to be cleared.

Could you please call gic_dump_info(current) from maintenance_interrupt
and post the output during the hang? Remove the other gic_dump_info calls
to avoid confusion: we want to understand the status of the LRs after
clearing them upon receiving a maintenance interrupt at busy times.


* Re: Xen 4.5 random freeze question
  2014-11-19 14:52                                                         ` Stefano Stabellini
@ 2014-11-19 15:27                                                           ` Andrii Tseglytskyi
  2014-11-19 15:41                                                             ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 15:27 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

Hi Stefano,



On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> Hi Stefano,
>>
>> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
>> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>> > >      else
>> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>> > >
>> > >  }
>> >
>> > Yes, exactly
>>
>> I tried, hang still occurs with this change
>
> We need to figure out why during the hang you still have all the LRs
> busy even if you are getting maintenance interrupts that should cause
> them to be cleared.
>

I see that I have free LRs during the maintenance interrupt:

(XEN) gic.c:871:d0v0 maintenance interrupt
(XEN) GICH_LRs (vcpu 0) mask=0
(XEN)    HW_LR[0]=9a015856
(XEN)    HW_LR[1]=0
(XEN)    HW_LR[2]=0
(XEN)    HW_LR[3]=0
(XEN) Inflight irq=86 lr=0
(XEN) Inflight irq=2 lr=255
(XEN) Pending irq=2

However, once the hang occurs, maintenance interrupts are generated
continuously, and the platform keeps printing the same log until reboot.
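The one non-zero list register in this dump can be decoded by hand using the GICv2 GICH_LR layout. Below is a small decoder sketch; the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of a GICv2 GICH_LR value. */
struct lr_fields {
    unsigned int virq;     /* bits [9:0]   VirtualID */
    unsigned int pirq;     /* bits [19:10] PhysicalID, meaningful only when hw is set */
    bool eoi_maint;        /* bit 19       EOI maintenance request, only when hw is clear */
    unsigned int priority; /* bits [27:23] top 5 bits of the priority, shifted back to 8 bits */
    unsigned int state;    /* bits [29:28] 0 invalid, 1 pending, 2 active, 3 pending+active */
    bool grp1;             /* bit 30 */
    bool hw;               /* bit 31 */
};

static struct lr_fields decode_lr(uint32_t lr)
{
    struct lr_fields f;

    f.hw        = (lr >> 31) & 1;
    f.grp1      = (lr >> 30) & 1;
    f.state     = (lr >> 28) & 0x3;
    f.priority  = ((lr >> 23) & 0x1f) << 3;
    f.virq      = lr & 0x3ff;
    f.pirq      = f.hw ? (lr >> 10) & 0x3ff : 0;
    f.eoi_maint = !f.hw && ((lr >> 19) & 1);
    return f;
}
```

Applied to 0x9a015856, the decoder yields virtual and physical IRQ 86, HW bit set, state pending, and priority 0xa0. This is consistent with the "Inflight irq=86 lr=0" line.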


My diff is on top of 394b7e587b05d0f4a5fd6f067b38339ab5a77121

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index b7516c0..1e0316a 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -66,7 +66,7 @@ static DEFINE_PER_CPU(u8, gic_cpu_id);
 /* Maximum cpu interface per GIC */
 #define NR_GIC_CPU_IF 8

-#undef GIC_DEBUG
+#define GIC_DEBUG 1

 static void gic_update_one_lr(struct vcpu *v, int i);

@@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void
*dev_id, struct cpu_user_regs *r
      * on return to guest that is going to clear the old LRs and inject
      * new interrupts.
      */
+    gdprintk(XENLOG_DEBUG, "maintenance interrupt\n");
+    gic_dump_info(current);
 }

 void gic_dump_info(struct vcpu *v)


> Could you please call gic_dump_info(current) from maintenance_interrupt,
> and post the output during the hang? Remove the other gic_dump_info to
> avoid confusion, we want to understand what is the status of the LRs
> after clearing them upon receiving a maintenance interrupt at busy times.





* Re: Xen 4.5 random freeze question
  2014-11-19 15:27                                                           ` Andrii Tseglytskyi
@ 2014-11-19 15:41                                                             ` Stefano Stabellini
  2014-11-19 16:01                                                               ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-19 15:41 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> Hi Stefano,
> 
> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> Hi Stefano,
> >>
> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >> > >      else
> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >> > >
> >> > >  }
> >> >
> >> > Yes, exactly
> >>
> >> I tried, hang still occurs with this change
> >
> > We need to figure out why during the hang you still have all the LRs
> > busy even if you are getting maintenance interrupts that should cause
> > them to be cleared.
> >
> 
> I see that I have free LRs during maintenance interrupt
> 
> (XEN) gic.c:871:d0v0 maintenance interrupt
> (XEN) GICH_LRs (vcpu 0) mask=0
> (XEN)    HW_LR[0]=9a015856
> (XEN)    HW_LR[1]=0
> (XEN)    HW_LR[2]=0
> (XEN)    HW_LR[3]=0
> (XEN) Inflight irq=86 lr=0
> (XEN) Inflight irq=2 lr=255
> (XEN) Pending irq=2
> 
> But I see that after I got hang - maintenance interrupts are generated
> continuously. Platform continues printing the same log till reboot.

Exactly the same log? As in the one above you just pasted?
That is very very suspicious.

I suspect that we are not handling GICH_HCR_UIE correctly and that
something we do in Xen, maybe writing to an LR register, might
immediately trigger a new maintenance interrupt, causing an infinite loop.

Could you please try this patch? It disables GICH_HCR_UIE immediately on
hypervisor entry.


diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 4d2a92d..6ae8dc4 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
     if ( is_idle_vcpu(v) )
         return;
 
+    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -821,12 +823,8 @@ void gic_inject(void)
 
     gic_restore_pending_irqs(current);
 
-
     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
         GICH[GICH_HCR] |= GICH_HCR_UIE;
-    else
-        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
-
 }
 
 static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-19 15:41                                                             ` Stefano Stabellini
@ 2014-11-19 16:01                                                               ` Andrii Tseglytskyi
  2014-11-19 16:09                                                                 ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 16:01 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> Hi Stefano,
>>
>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> Hi Stefano,
>> >>
>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>> >> > >      else
>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>> >> > >
>> >> > >  }
>> >> >
>> >> > Yes, exactly
>> >>
>> >> I tried it, and the hang still occurs with this change
>> >
>> > We need to figure out why during the hang you still have all the LRs
>> > busy even if you are getting maintenance interrupts that should cause
>> > them to be cleared.
>> >
>>
>> I see that I have free LRs during the maintenance interrupt:
>>
>> (XEN) gic.c:871:d0v0 maintenance interrupt
>> (XEN) GICH_LRs (vcpu 0) mask=0
>> (XEN)    HW_LR[0]=9a015856
>> (XEN)    HW_LR[1]=0
>> (XEN)    HW_LR[2]=0
>> (XEN)    HW_LR[3]=0
>> (XEN) Inflight irq=86 lr=0
>> (XEN) Inflight irq=2 lr=255
>> (XEN) Pending irq=2
>>
>> But I see that once the hang occurs, maintenance interrupts are generated
>> continuously. The platform keeps printing the same log until reboot.
>
> Exactly the same log? As in the one above you just pasted?
> That is very very suspicious.

Yes, exactly the same log. It looks like this means the LRs are flushed
correctly.

>
> I am thinking that we are not handling GICH_HCR_UIE correctly and
> something we do in Xen, maybe writing to an LR register, might trigger a
> new maintenance interrupt immediately causing an infinite loop.
>

Yes, this is what I'm thinking about. Taking into account all the
collected debug info, it looks like once the LRs are overloaded with
SGIs, a maintenance interrupt occurs. It is then not handled properly
and occurs again and again, so the platform hangs inside its handler.

> Could you please try this patch? It disables GICH_HCR_UIE immediately on
> hypervisor entry.
>

Now trying.

>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 4d2a92d..6ae8dc4 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>      if ( is_idle_vcpu(v) )
>          return;
>
> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> +
>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>
>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> @@ -821,12 +823,8 @@ void gic_inject(void)
>
>      gic_restore_pending_irqs(current);
>
> -
>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> -    else
> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> -
>  }
>
>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-19 16:01                                                               ` Andrii Tseglytskyi
@ 2014-11-19 16:09                                                                 ` Andrii Tseglytskyi
  2014-11-19 16:13                                                                   ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 16:09 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@globallogic.com> wrote:
> On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> Hi Stefano,
>>>
>>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>>> <stefano.stabellini@eu.citrix.com> wrote:
>>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> Hi Stefano,
>>> >>
>>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>>> >> > >      else
>>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>>> >> > >
>>> >> > >  }
>>> >> >
>>> >> > Yes, exactly
>>> >>
>>> >> I tried it, and the hang still occurs with this change
>>> >
>>> > We need to figure out why during the hang you still have all the LRs
>>> > busy even if you are getting maintenance interrupts that should cause
>>> > them to be cleared.
>>> >
>>>
>>> I see that I have free LRs during the maintenance interrupt:
>>>
>>> (XEN) gic.c:871:d0v0 maintenance interrupt
>>> (XEN) GICH_LRs (vcpu 0) mask=0
>>> (XEN)    HW_LR[0]=9a015856
>>> (XEN)    HW_LR[1]=0
>>> (XEN)    HW_LR[2]=0
>>> (XEN)    HW_LR[3]=0
>>> (XEN) Inflight irq=86 lr=0
>>> (XEN) Inflight irq=2 lr=255
>>> (XEN) Pending irq=2
>>>
>>> But I see that once the hang occurs, maintenance interrupts are generated
>>> continuously. The platform keeps printing the same log until reboot.
>>
>> Exactly the same log? As in the one above you just pasted?
>> That is very very suspicious.
>
> Yes, exactly the same log. It looks like this means the LRs are flushed
> correctly.
>
>>
>> I am thinking that we are not handling GICH_HCR_UIE correctly and
>> something we do in Xen, maybe writing to an LR register, might trigger a
>> new maintenance interrupt immediately causing an infinite loop.
>>
>
> Yes, this is what I'm thinking about. Taking into account all the
> collected debug info, it looks like once the LRs are overloaded with
> SGIs, a maintenance interrupt occurs. It is then not handled properly
> and occurs again and again, so the platform hangs inside its handler.
>
>> Could you please try this patch? It disables GICH_HCR_UIE immediately on
>> hypervisor entry.
>>
>
> Now trying.
>
>>
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index 4d2a92d..6ae8dc4 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>>      if ( is_idle_vcpu(v) )
>>          return;
>>
>> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> +
>>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>
>>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>> @@ -821,12 +823,8 @@ void gic_inject(void)
>>
>>      gic_restore_pending_irqs(current);
>>
>> -
>>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>          GICH[GICH_HCR] |= GICH_HCR_UIE;
>> -    else
>> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> -
>>  }
>>
>>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
>

Heh, I don't see hangs with this patch :) But I also see that
maintenance interrupts don't occur (and, as a result, there is no hang).
Stefano, is this expected?




-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-19 16:09                                                                 ` Andrii Tseglytskyi
@ 2014-11-19 16:13                                                                   ` Stefano Stabellini
  2014-11-19 16:29                                                                     ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-19 16:13 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
> <andrii.tseglytskyi@globallogic.com> wrote:
> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> > <stefano.stabellini@eu.citrix.com> wrote:
> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> Hi Stefano,
> >>>
> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> >>> <stefano.stabellini@eu.citrix.com> wrote:
> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> Hi Stefano,
> >>> >>
> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >>> >> > >      else
> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >>> >> > >
> >>> >> > >  }
> >>> >> >
> >>> >> > Yes, exactly
> >>> >>
> >>> >> I tried it, and the hang still occurs with this change
> >>> >
> >>> > We need to figure out why during the hang you still have all the LRs
> >>> > busy even if you are getting maintenance interrupts that should cause
> >>> > them to be cleared.
> >>> >
> >>>
> >>> I see that I have free LRs during the maintenance interrupt:
> >>>
> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
> >>> (XEN) GICH_LRs (vcpu 0) mask=0
> >>> (XEN)    HW_LR[0]=9a015856
> >>> (XEN)    HW_LR[1]=0
> >>> (XEN)    HW_LR[2]=0
> >>> (XEN)    HW_LR[3]=0
> >>> (XEN) Inflight irq=86 lr=0
> >>> (XEN) Inflight irq=2 lr=255
> >>> (XEN) Pending irq=2
> >>>
> >>> But I see that once the hang occurs, maintenance interrupts are generated
> >>> continuously. The platform keeps printing the same log until reboot.
> >>
> >> Exactly the same log? As in the one above you just pasted?
> >> That is very very suspicious.
> >
> > Yes, exactly the same log. It looks like this means the LRs are flushed
> > correctly.
> >
> >>
> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
> >> something we do in Xen, maybe writing to an LR register, might trigger a
> >> new maintenance interrupt immediately causing an infinite loop.
> >>
> >
> > Yes, this is what I'm thinking about. Taking into account all the
> > collected debug info, it looks like once the LRs are overloaded with
> > SGIs, a maintenance interrupt occurs. It is then not handled properly
> > and occurs again and again, so the platform hangs inside its handler.
> >
> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
> >> hypervisor entry.
> >>
> >
> > Now trying.
> >
> >>
> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> index 4d2a92d..6ae8dc4 100644
> >> --- a/xen/arch/arm/gic.c
> >> +++ b/xen/arch/arm/gic.c
> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
> >>      if ( is_idle_vcpu(v) )
> >>          return;
> >>
> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> +
> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >>
> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> >> @@ -821,12 +823,8 @@ void gic_inject(void)
> >>
> >>      gic_restore_pending_irqs(current);
> >>
> >> -
> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> -    else
> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> -
> >>  }
> >>
> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
> >
> 
> Heh, I don't see hangs with this patch :) But I also see that
> maintenance interrupts don't occur (and, as a result, there is no hang).
> Stefano, is this expected?

No maintenance interrupts at all? That's strange. You should be
receiving them when LRs are full and you still have interrupts pending
to be added to them.

You could add another printk here to see if you should be receiving
them:

     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
+    {
+        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
         GICH[GICH_HCR] |= GICH_HCR_UIE;
-    else
-        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
-
+    }
 }



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-19 16:13                                                                   ` Stefano Stabellini
@ 2014-11-19 16:29                                                                     ` Andrii Tseglytskyi
  2014-11-19 16:32                                                                       ` Andrii Tseglytskyi
  2014-11-19 16:50                                                                       ` Stefano Stabellini
  0 siblings, 2 replies; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 16:29 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>> <andrii.tseglytskyi@globallogic.com> wrote:
>> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>> > <stefano.stabellini@eu.citrix.com> wrote:
>> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >>> Hi Stefano,
>> >>>
>> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>> >>> <stefano.stabellini@eu.citrix.com> wrote:
>> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >>> >> Hi Stefano,
>> >>> >>
>> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>> >>> >> > >      else
>> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>> >>> >> > >
>> >>> >> > >  }
>> >>> >> >
>> >>> >> > Yes, exactly
>> >>> >>
>> >>> >> I tried it, and the hang still occurs with this change
>> >>> >
>> >>> > We need to figure out why during the hang you still have all the LRs
>> >>> > busy even if you are getting maintenance interrupts that should cause
>> >>> > them to be cleared.
>> >>> >
>> >>>
>> >>> I see that I have free LRs during the maintenance interrupt:
>> >>>
>> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>> >>> (XEN)    HW_LR[0]=9a015856
>> >>> (XEN)    HW_LR[1]=0
>> >>> (XEN)    HW_LR[2]=0
>> >>> (XEN)    HW_LR[3]=0
>> >>> (XEN) Inflight irq=86 lr=0
>> >>> (XEN) Inflight irq=2 lr=255
>> >>> (XEN) Pending irq=2
>> >>>
>> >>> But I see that once the hang occurs, maintenance interrupts are generated
>> >>> continuously. The platform keeps printing the same log until reboot.
>> >>
>> >> Exactly the same log? As in the one above you just pasted?
>> >> That is very very suspicious.
>> >
>> > Yes, exactly the same log. It looks like this means the LRs are flushed
>> > correctly.
>> >
>> >>
>> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>> >> something we do in Xen, maybe writing to an LR register, might trigger a
>> >> new maintenance interrupt immediately causing an infinite loop.
>> >>
>> >
>> > Yes, this is what I'm thinking about. Taking into account all the
>> > collected debug info, it looks like once the LRs are overloaded with
>> > SGIs, a maintenance interrupt occurs. It is then not handled properly
>> > and occurs again and again, so the platform hangs inside its handler.
>> >
>> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
>> >> hypervisor entry.
>> >>
>> >
>> > Now trying.
>> >
>> >>
>> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >> index 4d2a92d..6ae8dc4 100644
>> >> --- a/xen/arch/arm/gic.c
>> >> +++ b/xen/arch/arm/gic.c
>> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>> >>      if ( is_idle_vcpu(v) )
>> >>          return;
>> >>
>> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> +
>> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>> >>
>> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>> >> @@ -821,12 +823,8 @@ void gic_inject(void)
>> >>
>> >>      gic_restore_pending_irqs(current);
>> >>
>> >> -
>> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> -    else
>> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> -
>> >>  }
>> >>
>> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
>> >
>>
>> Heh, I don't see hangs with this patch :) But I also see that
>> maintenance interrupts don't occur (and, as a result, there is no hang).
>> Stefano, is this expected?
>
> No maintenance interrupts at all? That's strange. You should be
> receiving them when LRs are full and you still have interrupts pending
> to be added to them.
>
> You could add another printk here to see if you should be receiving
> them:
>
>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> +    {
> +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> -    else
> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> -
> +    }
>  }
>

Requested properly:

(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt
(XEN) gic.c:756:d0v0 requesting maintenance interrupt

But it does not occur.




-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-19 16:29                                                                     ` Andrii Tseglytskyi
@ 2014-11-19 16:32                                                                       ` Andrii Tseglytskyi
  2014-11-19 16:43                                                                         ` Andrii Tseglytskyi
  2014-11-19 16:50                                                                       ` Stefano Stabellini
  1 sibling, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 16:32 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

GIC dump taken while the maintenance interrupt is being requested:

(XEN) GICH_LRs (vcpu 0) mask=f
(XEN)    HW_LR[0]=3a00001f
(XEN)    HW_LR[1]=9a015856
(XEN)    HW_LR[2]=1a00001b
(XEN)    HW_LR[3]=9a00e439
(XEN) Inflight irq=31 lr=0
(XEN) Inflight irq=86 lr=1
(XEN) Inflight irq=27 lr=2
(XEN) Inflight irq=57 lr=3
(XEN) Inflight irq=2 lr=255
(XEN) Pending irq=2

On Wed, Nov 19, 2014 at 6:29 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@globallogic.com> wrote:
> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>>> <andrii.tseglytskyi@globallogic.com> wrote:
>>> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>>> > <stefano.stabellini@eu.citrix.com> wrote:
>>> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >>> Hi Stefano,
>>> >>>
>>> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>>> >>> <stefano.stabellini@eu.citrix.com> wrote:
>>> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >>> >> Hi Stefano,
>>> >>> >>
>>> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>>> >>> >> > >      else
>>> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>>> >>> >> > >
>>> >>> >> > >  }
>>> >>> >> >
>>> >>> >> > Yes, exactly
>>> >>> >>
>>> >>> >> I tried it, and the hang still occurs with this change
>>> >>> >
>>> >>> > We need to figure out why during the hang you still have all the LRs
>>> >>> > busy even if you are getting maintenance interrupts that should cause
>>> >>> > them to be cleared.
>>> >>> >
>>> >>>
>>> >>> I see that I have free LRs during the maintenance interrupt:
>>> >>>
>>> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>>> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>>> >>> (XEN)    HW_LR[0]=9a015856
>>> >>> (XEN)    HW_LR[1]=0
>>> >>> (XEN)    HW_LR[2]=0
>>> >>> (XEN)    HW_LR[3]=0
>>> >>> (XEN) Inflight irq=86 lr=0
>>> >>> (XEN) Inflight irq=2 lr=255
>>> >>> (XEN) Pending irq=2
>>> >>>
>>> >>> But I see that once the hang occurs, maintenance interrupts are generated
>>> >>> continuously. The platform keeps printing the same log until reboot.
>>> >>
>>> >> Exactly the same log? As in the one above you just pasted?
>>> >> That is very very suspicious.
>>> >
>>> > Yes, exactly the same log. It looks like this means the LRs are flushed
>>> > correctly.
>>> >
>>> >>
>>> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>>> >> something we do in Xen, maybe writing to an LR register, might trigger a
>>> >> new maintenance interrupt immediately causing an infinite loop.
>>> >>
>>> >
>>> > Yes, this is what I'm thinking about. Taking into account all the
>>> > collected debug info, it looks like once the LRs are overloaded with
>>> > SGIs, a maintenance interrupt occurs. It is then not handled properly
>>> > and occurs again and again, so the platform hangs inside its handler.
>>> >
>>> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
>>> >> hypervisor entry.
>>> >>
>>> >
>>> > Now trying.
>>> >
>>> >>
>>> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> >> index 4d2a92d..6ae8dc4 100644
>>> >> --- a/xen/arch/arm/gic.c
>>> >> +++ b/xen/arch/arm/gic.c
>>> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>>> >>      if ( is_idle_vcpu(v) )
>>> >>          return;
>>> >>
>>> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> +
>>> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>> >>
>>> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>>> >> @@ -821,12 +823,8 @@ void gic_inject(void)
>>> >>
>>> >>      gic_restore_pending_irqs(current);
>>> >>
>>> >> -
>>> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> -    else
>>> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> -
>>> >>  }
>>> >>
>>> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
>>> >
>>>
>>> Heh, I don't see hangs with this patch :) But I also see that
>>> maintenance interrupts don't occur (and, as a result, there is no hang).
>>> Stefano, is this expected?
>>
>> No maintenance interrupts at all? That's strange. You should be
>> receiving them when LRs are full and you still have interrupts pending
>> to be added to them.
>>
>> You could add another printk here to see if you should be receiving
>> them:
>>
>>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> +    {
>> +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
>>          GICH[GICH_HCR] |= GICH_HCR_UIE;
>> -    else
>> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> -
>> +    }
>>  }
>>
>
> Requested properly:
>
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>
> But it does not occur.



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-19 16:32                                                                       ` Andrii Tseglytskyi
@ 2014-11-19 16:43                                                                         ` Andrii Tseglytskyi
  2014-11-19 16:52                                                                           ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 16:43 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

BTW, shouldn't the GICH_LR_MAINTENANCE_IRQ flag be set after the
maintenance interrupt is requested?

On Wed, Nov 19, 2014 at 6:32 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@globallogic.com> wrote:
> GIC dump taken while the maintenance interrupt is being requested:
>
> (XEN) GICH_LRs (vcpu 0) mask=f
> (XEN)    HW_LR[0]=3a00001f
> (XEN)    HW_LR[1]=9a015856
> (XEN)    HW_LR[2]=1a00001b
> (XEN)    HW_LR[3]=9a00e439
> (XEN) Inflight irq=31 lr=0
> (XEN) Inflight irq=86 lr=1
> (XEN) Inflight irq=27 lr=2
> (XEN) Inflight irq=57 lr=3
> (XEN) Inflight irq=2 lr=255
> (XEN) Pending irq=2
>
> On Wed, Nov 19, 2014 at 6:29 PM, Andrii Tseglytskyi
> <andrii.tseglytskyi@globallogic.com> wrote:
>> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>>> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>>>> <andrii.tseglytskyi@globallogic.com> wrote:
>>>> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>>>> > <stefano.stabellini@eu.citrix.com> wrote:
>>>> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>>> >>> Hi Stefano,
>>>> >>>
>>>> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>>>> >>> <stefano.stabellini@eu.citrix.com> wrote:
>>>> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>>> >>> >> Hi Stefano,
>>>> >>> >>
>>>> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>>> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
>>>> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>>>> >>> >> > >      else
>>>> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>>> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>>>> >>> >> > >
>>>> >>> >> > >  }
>>>> >>> >> >
>>>> >>> >> > Yes, exactly
>>>> >>> >>
>>>> >>> >> I tried it, and the hang still occurs with this change
>>>> >>> >
>>>> >>> > We need to figure out why during the hang you still have all the LRs
>>>> >>> > busy even if you are getting maintenance interrupts that should cause
>>>> >>> > them to be cleared.
>>>> >>> >
>>>> >>>
>>>> >>> I see that I have free LRs during the maintenance interrupt:
>>>> >>>
>>>> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>>>> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>>>> >>> (XEN)    HW_LR[0]=9a015856
>>>> >>> (XEN)    HW_LR[1]=0
>>>> >>> (XEN)    HW_LR[2]=0
>>>> >>> (XEN)    HW_LR[3]=0
>>>> >>> (XEN) Inflight irq=86 lr=0
>>>> >>> (XEN) Inflight irq=2 lr=255
>>>> >>> (XEN) Pending irq=2
>>>> >>>
>>>> >>> But I see that once the hang occurs, maintenance interrupts are generated
>>>> >>> continuously. The platform keeps printing the same log until reboot.
>>>> >>
>>>> >> Exactly the same log? As in the one above you just pasted?
>>>> >> That is very very suspicious.
>>>> >
>>>> > Yes, exactly the same log. It looks like this means the LRs are flushed
>>>> > correctly.
>>>> >
>>>> >>
>>>> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>>>> >> something we do in Xen, maybe writing to an LR register, might trigger a
>>>> >> new maintenance interrupt immediately causing an infinite loop.
>>>> >>
>>>> >
>>>> > Yes, this is what I'm thinking about. Taking into account all the
>>>> > collected debug info, it looks like once the LRs are overloaded with
>>>> > SGIs, a maintenance interrupt occurs. It is then not handled properly
>>>> > and occurs again and again, so the platform hangs inside its handler.
>>>> >
>>>> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
>>>> >> hypervisor entry.
>>>> >>
>>>> >
>>>> > Now trying.
>>>> >
>>>> >>
>>>> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>>> >> index 4d2a92d..6ae8dc4 100644
>>>> >> --- a/xen/arch/arm/gic.c
>>>> >> +++ b/xen/arch/arm/gic.c
>>>> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>>>> >>      if ( is_idle_vcpu(v) )
>>>> >>          return;
>>>> >>
>>>> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>>> >> +
>>>> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>>> >>
>>>> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>>>> >> @@ -821,12 +823,8 @@ void gic_inject(void)
>>>> >>
>>>> >>      gic_restore_pending_irqs(current);
>>>> >>
>>>> >> -
>>>> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>>> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
>>>> >> -    else
>>>> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>>> >> -
>>>> >>  }
>>>> >>
>>>> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
>>>> >
>>>>
>>>> Heh, I don't see hangs with this patch :) But I also see that
>>>> maintenance interrupts don't occur (and, as a result, there is no hang).
>>>> Stefano, is this expected?
>>>
>>> No maintenance interrupts at all? That's strange. You should be
>>> receiving them when LRs are full and you still have interrupts pending
>>> to be added to them.
>>>
>>> You could add another printk here to see if you should be receiving
>>> them:
>>>
>>>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> +    {
>>> +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
>>>          GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> -    else
>>> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> -
>>> +    }
>>>  }
>>>
>>
>> Requested properly:
>>
>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>
>> But it does not occur.



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


* Re: Xen 4.5 random freeze question
  2014-11-19 16:29                                                                     ` Andrii Tseglytskyi
  2014-11-19 16:32                                                                       ` Andrii Tseglytskyi
@ 2014-11-19 16:50                                                                       ` Stefano Stabellini
  2014-11-19 17:03                                                                         ` Andrii Tseglytskyi
  1 sibling, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-19 16:50 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
> >> <andrii.tseglytskyi@globallogic.com> wrote:
> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> >> > <stefano.stabellini@eu.citrix.com> wrote:
> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >>> Hi Stefano,
> >> >>>
> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> >> >>> <stefano.stabellini@eu.citrix.com> wrote:
> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >>> >> Hi Stefano,
> >> >>> >>
> >> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >> >>> >> > >      else
> >> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >> >>> >> > >
> >> >>> >> > >  }
> >> >>> >> >
> >> >>> >> > Yes, exactly
> >> >>> >>
> >> >>> >> I tried, hang still occurs with this change
> >> >>> >
> >> >>> > We need to figure out why during the hang you still have all the LRs
> >> >>> > busy even if you are getting maintenance interrupts that should cause
> >> >>> > them to be cleared.
> >> >>> >
> >> >>>
> >> >>> I see that I have free LRs during maintenance interrupt
> >> >>>
> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
> >> >>> (XEN)    HW_LR[0]=9a015856
> >> >>> (XEN)    HW_LR[1]=0
> >> >>> (XEN)    HW_LR[2]=0
> >> >>> (XEN)    HW_LR[3]=0
> >> >>> (XEN) Inflight irq=86 lr=0
> >> >>> (XEN) Inflight irq=2 lr=255
> >> >>> (XEN) Pending irq=2
> >> >>>
> >> >>> But I see that after the hang occurs, maintenance interrupts are generated
> >> >>> continuously. The platform keeps printing the same log until reboot.
> >> >>
> >> >> Exactly the same log? As in the one above you just pasted?
> >> >> That is very very suspicious.
> >> >
> >> > Yes, exactly the same log. And it looks like this means that the LRs are flushed
> >> > correctly.
> >> >
> >> >>
> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
> >> >> something we do in Xen, maybe writing to an LR register, might trigger a
> >> >> new maintenance interrupt immediately causing an infinite loop.
> >> >>
> >> >
> >> > Yes, this is what I'm thinking about. Taking into account all the collected
> >> > debug info, it looks like once the LRs are overloaded with SGIs, a
> >> > maintenance interrupt occurs.
> >> > Then it is not handled properly and occurs again and again, so the
> >> > platform hangs inside its handler.
> >> >
> >> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
> >> >> hypervisor entry.
> >> >>
> >> >
> >> > Now trying.
> >> >
> >> >>
> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >> index 4d2a92d..6ae8dc4 100644
> >> >> --- a/xen/arch/arm/gic.c
> >> >> +++ b/xen/arch/arm/gic.c
> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
> >> >>      if ( is_idle_vcpu(v) )
> >> >>          return;
> >> >>
> >> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> +
> >> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >> >>
> >> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
> >> >>
> >> >>      gic_restore_pending_irqs(current);
> >> >>
> >> >> -
> >> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >> -    else
> >> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> -
> >> >>  }
> >> >>
> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
> >> >
> >>
> >> Heh - I don't see hangs with this patch :) But I also see that the
> >> maintenance interrupt doesn't occur (and, as a result, no hang).
> >> Stefano - is this expected?
> >
> > No maintenance interrupts at all? That's strange. You should be
> > receiving them when LRs are full and you still have interrupts pending
> > to be added to them.
> >
> > You could add another printk here to see if you should be receiving
> > them:
> >
> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> > +    {
> > +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
> >          GICH[GICH_HCR] |= GICH_HCR_UIE;
> > -    else
> > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> > -
> > +    }
> >  }
> >
> 
> Requested properly:
> 
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> 
> But it does not occur.

OK, let's see what's going on then by printing the irq number of the
maintenance interrupt:

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 4d2a92d..fed3167 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -55,6 +55,7 @@ static struct {
 static DEFINE_PER_CPU(uint64_t, lr_mask);
 
 static uint8_t nr_lrs;
+static bool uie_on;
 #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1))
 
 /* The GIC mapping of CPU interfaces does not necessarily match the
@@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v)
 {
     int i = 0;
     unsigned long flags;
+    unsigned long hcr;
 
     /* The idle domain has no LRs to be cleared. Since gic_restore_state
      * doesn't write any LR registers for the idle domain they could be
@@ -701,6 +703,13 @@ void gic_clear_lrs(struct vcpu *v)
     if ( is_idle_vcpu(v) )
         return;
 
+    hcr = GICH[GICH_HCR];
+    if ( hcr & GICH_HCR_UIE )
+    {
+        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+        uie_on = 1;
+    }
+
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -865,6 +873,11 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
         intack = GICC[GICC_IAR];
         irq = intack & GICC_IA_IRQ;
 
+        if ( uie_on )
+        {
+            uie_on = 0;
+            printk("received maintenance interrupt irq=%d\n", irq);
+        }
         if ( likely(irq >= 16 && irq < 1021) )
         {
             local_irq_enable();
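
The instrumentation in this diff follows a one-shot pattern: on hypervisor entry, note that UIE was set and clear it, then report the next interrupt ID read from GICC_IAR. Below is a minimal standalone C model of that pattern, not Xen code. One difference is deliberate. The diff keeps `uie_on` as a single global, while the sketch keeps one flag per physical CPU, so that on an SMP host the reported IRQ comes from the CPU that actually cleared UIE. The array size `MODEL_NR_CPUS` and the helper names are hypothetical. The UIE bit position (bit 1 of GICH_HCR) is taken from the GICv2 architecture specification and should be checked against Xen's own `GICH_HCR_UIE`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model constants, not taken from Xen headers. */
#define MODEL_NR_CPUS 2u
#define MODEL_HCR_UIE (1u << 1)   /* GICv2 GICH_HCR.UIE (bit 1) */

/* One "UIE was on" flag per physical CPU instead of one global flag. */
static bool uie_was_on[MODEL_NR_CPUS];

/* Entry path: if UIE is set in hcr, remember it for this CPU and return
 * the HCR value with UIE cleared; otherwise return hcr unchanged. */
static uint32_t note_and_clear_uie(unsigned int cpu, uint32_t hcr)
{
    assert(cpu < MODEL_NR_CPUS);
    if (hcr & MODEL_HCR_UIE) {
        uie_was_on[cpu] = true;
        hcr &= ~MODEL_HCR_UIE;
    }
    return hcr;
}

/* Acknowledge path: returns true exactly once after UIE was cleared on
 * this CPU, so only the first interrupt ID afterwards gets reported. */
static bool first_irq_after_uie(unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    bool armed = uie_was_on[cpu];
    uie_was_on[cpu] = false;
    return armed;
}
```

In gic_interrupt, the model's `first_irq_after_uie(cpu)` would sit where the diff tests `uie_on`, right after reading GICC_IAR.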


* Re: Xen 4.5 random freeze question
  2014-11-19 16:43                                                                         ` Andrii Tseglytskyi
@ 2014-11-19 16:52                                                                           ` Stefano Stabellini
  0 siblings, 0 replies; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-19 16:52 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

No, that's for requesting a maintenance interrupt for a specific irq
when it is EOI'ed by the guest.

In our case we are requesting maintenance interrupts via UIE: a single
global maintenance interrupt when most LRs become free.
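
Here is a minimal C sketch that decodes a GICv2 list register and evaluates the UIE condition, to make the distinction concrete. Two things come from the GICv2 architecture specification: the field layout, and the rule that UIE asserts the maintenance interrupt while no more than one LR holds a valid interrupt. The struct and function names are hypothetical, and the sketch is a model, not Xen's implementation. The per-LR "maintenance interrupt on EOI" request described above corresponds to the specification's EOI bit (bit 19), which is only defined that way when the HW bit is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GICv2 GICH_LR fields (hypothetical struct; layout per the GICv2 spec). */
struct lr_fields {
    unsigned int virtual_id;   /* [9:0]   ID presented to the guest          */
    unsigned int physical_id;  /* [19:10] physical ID, only when hw == 1     */
    bool eoi_maint;            /* [19]    EOI maintenance, only when hw == 0 */
    unsigned int priority;     /* [27:23] upper five bits of the priority    */
    unsigned int state;        /* [29:28] 0 invalid, 1 pending, 2 active, 3 both */
    bool grp1;                 /* [30]                                        */
    bool hw;                   /* [31]    backed by a physical interrupt      */
};

static struct lr_fields decode_lr(uint32_t lr)
{
    struct lr_fields f;
    f.virtual_id  = lr & 0x3ffu;
    f.hw          = (lr >> 31) & 1u;
    f.grp1        = (lr >> 30) & 1u;
    f.state       = (lr >> 28) & 3u;
    f.priority    = (lr >> 23) & 0x1fu;
    f.physical_id = f.hw ? ((lr >> 10) & 0x3ffu) : 0u;
    f.eoi_maint   = !f.hw && ((lr >> 19) & 1u);
    return f;
}

/* With GICH_HCR.UIE set, the maintenance interrupt is asserted while
 * none or only one of the list registers holds a valid interrupt. */
static bool uie_condition(const uint32_t *lrs, unsigned int nr_lrs)
{
    unsigned int valid = 0;
    for (unsigned int i = 0; i < nr_lrs; i++)
        if (decode_lr(lrs[i]).state != 0)
            valid++;
    return valid <= 1;
}
```

Applied to the mask=f dump quoted below, all four LRs decode to valid interrupts with virtual IDs 31, 86, 27 and 57, matching the Inflight lines. The UIE condition is therefore false, and no maintenance interrupt is expected yet. In the mask=0 dump, only HW_LR[0] is valid: 0x9a015856, a pending hardware interrupt with virtual and physical ID 86. So the condition holds. On GICv2 the maintenance interrupt is level-sensitive, so it would keep being signalled for as long as UIE stays set, which fits the repeating log described in the quoted thread.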

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> BTW - shouldn't the GICH_LR_MAINTENANCE_IRQ flag be set after
> requesting the maintenance interrupt?
> 
> On Wed, Nov 19, 2014 at 6:32 PM, Andrii Tseglytskyi
> <andrii.tseglytskyi@globallogic.com> wrote:
> > GIC dump while requesting the interrupt:
> >
> > (XEN) GICH_LRs (vcpu 0) mask=f
> > (XEN)    HW_LR[0]=3a00001f
> > (XEN)    HW_LR[1]=9a015856
> > (XEN)    HW_LR[2]=1a00001b
> > (XEN)    HW_LR[3]=9a00e439
> > (XEN) Inflight irq=31 lr=0
> > (XEN) Inflight irq=86 lr=1
> > (XEN) Inflight irq=27 lr=2
> > (XEN) Inflight irq=57 lr=3
> > (XEN) Inflight irq=2 lr=255
> > (XEN) Pending irq=2
> >
> > On Wed, Nov 19, 2014 at 6:29 PM, Andrii Tseglytskyi
> > <andrii.tseglytskyi@globallogic.com> wrote:
> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
> >> <stefano.stabellini@eu.citrix.com> wrote:
> >>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>>> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
> >>>> <andrii.tseglytskyi@globallogic.com> wrote:
> >>>> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> >>>> > <stefano.stabellini@eu.citrix.com> wrote:
> >>>> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>>> >>> Hi Stefano,
> >>>> >>>
> >>>> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> >>>> >>> <stefano.stabellini@eu.citrix.com> wrote:
> >>>> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>>> >>> >> Hi Stefano,
> >>>> >>> >>
> >>>> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >>>> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
> >>>> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >>>> >>> >> > >      else
> >>>> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>>> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >>>> >>> >> > >
> >>>> >>> >> > >  }
> >>>> >>> >> >
> >>>> >>> >> > Yes, exactly
> >>>> >>> >>
> >>>> >>> >> I tried, hang still occurs with this change
> >>>> >>> >
> >>>> >>> > We need to figure out why during the hang you still have all the LRs
> >>>> >>> > busy even if you are getting maintenance interrupts that should cause
> >>>> >>> > them to be cleared.
> >>>> >>> >
> >>>> >>>
> >>>> >>> I see that I have free LRs during maintenance interrupt
> >>>> >>>
> >>>> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
> >>>> >>> (XEN) GICH_LRs (vcpu 0) mask=0
> >>>> >>> (XEN)    HW_LR[0]=9a015856
> >>>> >>> (XEN)    HW_LR[1]=0
> >>>> >>> (XEN)    HW_LR[2]=0
> >>>> >>> (XEN)    HW_LR[3]=0
> >>>> >>> (XEN) Inflight irq=86 lr=0
> >>>> >>> (XEN) Inflight irq=2 lr=255
> >>>> >>> (XEN) Pending irq=2
> >>>> >>>
> >>>> >>> But I see that after the hang occurs, maintenance interrupts are generated
> >>>> >>> continuously. The platform keeps printing the same log until reboot.
> >>>> >>
> >>>> >> Exactly the same log? As in the one above you just pasted?
> >>>> >> That is very very suspicious.
> >>>> >
> >>>> > Yes, exactly the same log. And it looks like this means that the LRs are flushed
> >>>> > correctly.
> >>>> >
> >>>> >>
> >>>> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
> >>>> >> something we do in Xen, maybe writing to an LR register, might trigger a
> >>>> >> new maintenance interrupt immediately causing an infinite loop.
> >>>> >>
> >>>> >
> >>>> > Yes, this is what I'm thinking about. Taking into account all the collected
> >>>> > debug info, it looks like once the LRs are overloaded with SGIs, a
> >>>> > maintenance interrupt occurs.
> >>>> > Then it is not handled properly and occurs again and again, so the
> >>>> > platform hangs inside its handler.
> >>>> >
> >>>> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
> >>>> >> hypervisor entry.
> >>>> >>
> >>>> >
> >>>> > Now trying.
> >>>> >
> >>>> >>
> >>>> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >>>> >> index 4d2a92d..6ae8dc4 100644
> >>>> >> --- a/xen/arch/arm/gic.c
> >>>> >> +++ b/xen/arch/arm/gic.c
> >>>> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
> >>>> >>      if ( is_idle_vcpu(v) )
> >>>> >>          return;
> >>>> >>
> >>>> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>>> >> +
> >>>> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >>>> >>
> >>>> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> >>>> >> @@ -821,12 +823,8 @@ void gic_inject(void)
> >>>> >>
> >>>> >>      gic_restore_pending_irqs(current);
> >>>> >>
> >>>> >> -
> >>>> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >>>> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >>>> >> -    else
> >>>> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>>> >> -
> >>>> >>  }
> >>>> >>
> >>>> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
> >>>> >
> >>>>
> >>>> Heh - I don't see hangs with this patch :) But I also see that the
> >>>> maintenance interrupt doesn't occur (and, as a result, no hang).
> >>>> Stefano - is this expected?
> >>>
> >>> No maintenance interrupts at all? That's strange. You should be
> >>> receiving them when LRs are full and you still have interrupts pending
> >>> to be added to them.
> >>>
> >>> You could add another printk here to see if you should be receiving
> >>> them:
> >>>
> >>>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >>> +    {
> >>> +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
> >>>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >>> -    else
> >>> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> -
> >>> +    }
> >>>  }
> >>>
> >>
> >> Requested properly:
> >>
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>
> >> But it does not occur.


* Re: Xen 4.5 random freeze question
  2014-11-19 16:50                                                                       ` Stefano Stabellini
@ 2014-11-19 17:03                                                                         ` Andrii Tseglytskyi
  2014-11-19 17:07                                                                           ` Stefano Stabellini
  2014-11-19 17:11                                                                           ` Andrii Tseglytskyi
  0 siblings, 2 replies; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 17:03 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

I got this strange log:

(XEN) received maintenance interrupt irq=1023

And with this change the platform does not hang:
+    hcr = GICH[GICH_HCR];
+    if ( hcr & GICH_HCR_UIE )
+    {
+        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+        uie_on = 1;
+    }
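
Interrupt ID 1023 is the value the GICv2 CPU interface returns from GICC_IAR when it has no pending interrupt to hand out (a spurious acknowledge). IDs 1020 to 1022 are reserved, and 1023 falls outside the `irq >= 16 && irq < 1021` range that gic_interrupt dispatches. One plausible reading of this log rests on an assumption: that gic_clear_lrs runs on hypervisor entry before gic_interrupt reads GICC_IAR, as the earlier "disable GICH_HCR_UIE immediately on hypervisor entry" patch description implies. If so, clearing UIE withdraws the maintenance interrupt before it is acknowledged, and the IAR read then reports spurious. Below is a minimal C model of classifying an acknowledged ID. The enum and helper names are hypothetical, and the ranges follow the GICv2 specification.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_IAR_INTID_MASK 0x3ffu   /* GICC_IAR[9:0]: interrupt ID */

enum irq_class { IRQ_SGI, IRQ_PPI, IRQ_SPI, IRQ_RESERVED, IRQ_SPURIOUS };

static unsigned int iar_intid(uint32_t iar)
{
    return iar & MODEL_IAR_INTID_MASK;
}

/* For SGIs, GICC_IAR[12:10] holds the ID of the CPU that sent it. */
static unsigned int iar_source_cpu(uint32_t iar)
{
    return (iar >> 10) & 0x7u;
}

static enum irq_class classify_intid(unsigned int id)
{
    assert(id <= 1023u);
    if (id < 16u)
        return IRQ_SGI;       /* software-generated, e.g. irq 2 in the Pending lines */
    if (id < 32u)
        return IRQ_PPI;       /* per-CPU; the GICv2 maintenance interrupt is one of these */
    if (id < 1020u)
        return IRQ_SPI;       /* shared peripheral interrupts */
    if (id == 1023u)
        return IRQ_SPURIOUS;  /* nothing pending at acknowledge time */
    return IRQ_RESERVED;      /* 1020-1022 */
}
```

For example, an IAR value of 0x402 decodes to SGI 2 sent by CPU 1, while 0x3ff decodes to the spurious ID 1023 seen in the log.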




-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


* Re: Xen 4.5 random freeze question
  2014-11-19 17:03                                                                         ` Andrii Tseglytskyi
@ 2014-11-19 17:07                                                                           ` Stefano Stabellini
  2014-11-19 17:37                                                                             ` Andrii Tseglytskyi
  2014-11-19 17:11                                                                           ` Andrii Tseglytskyi
  1 sibling, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-19 17:07 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

I think that's OK: it looks like on your board, for some reason, you get
irq 1023 (the spurious interrupt ID) instead of your normal maintenance
interrupt when UIE is set.

But everything should work anyway without issues.

This is the same patch as before, but on top of the latest xen-unstable
tree. Please confirm whether it works.

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 70d10d6..df140b9 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
     if ( is_idle_vcpu(v) )
         return;
 
+    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
+
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -527,8 +529,6 @@ void gic_inject(void)
 
     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
         gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
-    else
-        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
 }
 
 static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
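
The net effect of this patch is a change in where UIE is cleared. It is dropped on every hypervisor entry (in gic_clear_lrs) instead of in gic_inject's else branch. On exit, gic_inject re-arms it only while interrupts are still queued on lr_pending and every LR is in use. The standalone C model of that rule below is a sketch, not Xen code. `struct vcpu_model`, the `nr_pending` counter standing in for the lr_pending list, and the helper names are hypothetical. The model also builds the all-LRs-used mask with a 64-bit constant, whereas the `lr_all_full()` macro quoted earlier computes `1 << nr_lrs` on an int. That is fine for the four LRs on this board, but would overflow for a GICv2 implementation exposing many more list registers (the architectural maximum is 64).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_HCR_UIE (1u << 1)   /* GICv2 GICH_HCR.UIE (bit 1) */

/* Hypothetical per-vCPU model state. */
struct vcpu_model {
    uint64_t lr_mask;         /* bit i set: LR i is in use (like Xen's lr_mask) */
    unsigned int nr_lrs;      /* number of implemented list registers */
    unsigned int nr_pending;  /* stand-in for entries left on lr_pending */
    uint32_t hcr;             /* modelled GICH_HCR value */
};

/* All implemented LRs in use; the mask is built in 64 bits so that
 * nr_lrs == 64 does not shift past the width of the type. */
static bool lr_all_full(const struct vcpu_model *v)
{
    assert(v->nr_lrs >= 1u && v->nr_lrs <= 64u);
    uint64_t all = (v->nr_lrs == 64u) ? UINT64_MAX
                                      : ((UINT64_C(1) << v->nr_lrs) - 1u);
    return v->lr_mask == all;
}

/* Hypervisor entry (gic_clear_lrs in the patch): always drop UIE first. */
static void model_entry(struct vcpu_model *v)
{
    v->hcr &= ~MODEL_HCR_UIE;
}

/* Return to guest (gic_inject in the patch): re-arm UIE only when
 * interrupts are still waiting and no LR is free for them. */
static void model_exit(struct vcpu_model *v)
{
    if (v->nr_pending > 0u && lr_all_full(v))
        v->hcr |= MODEL_HCR_UIE;
}
```

Take lr_mask 0xf, four LRs and one queued interrupt: model_exit sets UIE, and the next model_entry clears it again before anything else runs. That is the property the patch is after, since Xen no longer keeps UIE armed while it manipulates the LRs itself.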

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> I got this strange log:
> 
> (XEN) received maintenance interrupt irq=1023
> 
> And platform does not hang due to this:
> +    hcr = GICH[GICH_HCR];
> +    if ( hcr & GICH_HCR_UIE )
> +    {
> +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> +        uie_on = 1;
> +    }
> 
> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
> >> <stefano.stabellini@eu.citrix.com> wrote:
> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
> >> >> <andrii.tseglytskyi@globallogic.com> wrote:
> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> >> >> > <stefano.stabellini@eu.citrix.com> wrote:
> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >>> Hi Stefano,
> >> >> >>>
> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> >> >> >>> <stefano.stabellini@eu.citrix.com> wrote:
> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >>> >> Hi Stefano,
> >> >> >>> >>
> >> >> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> >> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >> >> >>> >> > >      else
> >> >> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >> >> >>> >> > >
> >> >> >>> >> > >  }
> >> >> >>> >> >
> >> >> >>> >> > Yes, exactly
> >> >> >>> >>
> >> >> >>> >> I tried, hang still occurs with this change
> >> >> >>> >
> >> >> >>> > We need to figure out why during the hang you still have all the LRs
> >> >> >>> > busy even if you are getting maintenance interrupts that should cause
> >> >> >>> > them to be cleared.
> >> >> >>> >
> >> >> >>>
> >> >> >>> I see that I have free LRs during maintenance interrupt
> >> >> >>>
> >> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
> >> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
> >> >> >>> (XEN)    HW_LR[0]=9a015856
> >> >> >>> (XEN)    HW_LR[1]=0
> >> >> >>> (XEN)    HW_LR[2]=0
> >> >> >>> (XEN)    HW_LR[3]=0
> >> >> >>> (XEN) Inflight irq=86 lr=0
> >> >> >>> (XEN) Inflight irq=2 lr=255
> >> >> >>> (XEN) Pending irq=2
> >> >> >>>
> >> >> >>> But I see that after I got hang - maintenance interrupts are generated
> >> >> >>> continuously. Platform continues printing the same log till reboot.
> >> >> >>
> >> >> >> Exactly the same log? As in the one above you just pasted?
> >> >> >> That is very very suspicious.
> >> >> >
> >> >> > Yes exactly the same log. And looks like it means that LRs are flushed
> >> >> > correctly.
> >> >> >
> >> >> >>
> >> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
> >> >> >> something we do in Xen, maybe writing to an LR register, might trigger a
> >> >> >> new maintenance interrupt immediately causing an infinite loop.
> >> >> >>
> >> >> >
> >> >> > Yes, this is what I'm thinking about. Taking in account all collected
> >> >> > debug info it looks like once LRs are overloaded with SGIs -
> >> >> > maintenance interrupt occurs.
> >> >> > And then it is not handled properly, and occurs again and again - so
> >> >> > platform hangs inside its handler.
> >> >> >
> >> >> >> Could you please try this patch? It disable GICH_HCR_UIE immediately on
> >> >> >> hypervisor entry.
> >> >> >>
> >> >> >
> >> >> > Now trying.
> >> >> >
> >> >> >>
> >> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >> >> index 4d2a92d..6ae8dc4 100644
> >> >> >> --- a/xen/arch/arm/gic.c
> >> >> >> +++ b/xen/arch/arm/gic.c
> >> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
> >> >> >>      if ( is_idle_vcpu(v) )
> >> >> >>          return;
> >> >> >>
> >> >> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> >> +
> >> >> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >> >> >>
> >> >> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> >> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
> >> >> >>
> >> >> >>      gic_restore_pending_irqs(current);
> >> >> >>
> >> >> >> -
> >> >> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> >> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >> >> -    else
> >> >> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> >> -
> >> >> >>  }
> >> >> >>
> >> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
> >> >> >
> >> >>
> >> >> Heh - I don't see hangs with this patch :) But also I see that
> >> >> maintenance interrupt doesn't occur (and no hang as result)
> >> >> Stefano - is this expected?
> >> >
> >> > No maintenance interrupts at all? That's strange. You should be
> >> > receiving them when LRs are full and you still have interrupts pending
> >> > to be added to them.
> >> >
> >> > You could add another printk here to see if you should be receiving
> >> > them:
> >> >
> >> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> > +    {
> >> > +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
> >> >          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> > -    else
> >> > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> > -
> >> > +    }
> >> >  }
> >> >
> >>
> >> Requested properly:
> >>
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>
> >> But does not occur
> >
> > OK, let's see what's going on then by printing the irq number of the
> > maintenance interrupt:
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 4d2a92d..fed3167 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -55,6 +55,7 @@ static struct {
> >  static DEFINE_PER_CPU(uint64_t, lr_mask);
> >
> >  static uint8_t nr_lrs;
> > +static bool uie_on;
> >  #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1))
> >
> >  /* The GIC mapping of CPU interfaces does not necessarily match the
> > @@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v)
> >  {
> >      int i = 0;
> >      unsigned long flags;
> > +    unsigned long hcr;
> >
> >      /* The idle domain has no LRs to be cleared. Since gic_restore_state
> >       * doesn't write any LR registers for the idle domain they could be
> > @@ -701,6 +703,13 @@ void gic_clear_lrs(struct vcpu *v)
> >      if ( is_idle_vcpu(v) )
> >          return;
> >
> > +    hcr = GICH[GICH_HCR];
> > +    if ( hcr & GICH_HCR_UIE )
> > +    {
> > +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> > +        uie_on = 1;
> > +    }
> > +
> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >
> >      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> > @@ -865,6 +873,11 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
> >          intack = GICC[GICC_IAR];
> >          irq = intack & GICC_IA_IRQ;
> >
> > +        if ( uie_on )
> > +        {
> > +            uie_on = 0;
> > +            printk("received maintenance interrupt irq=%d\n", irq);
> > +        }
> >          if ( likely(irq >= 16 && irq < 1021) )
> >          {
> >              local_irq_enable();

* Re: Xen 4.5 random freeze question
  2014-11-19 17:03                                                                         ` Andrii Tseglytskyi
  2014-11-19 17:07                                                                           ` Stefano Stabellini
@ 2014-11-19 17:11                                                                           ` Andrii Tseglytskyi
  2014-11-19 17:14                                                                             ` Stefano Stabellini
  1 sibling, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 17:11 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

Does the number 1023 mean that the maintenance interrupt is global?

On Wed, Nov 19, 2014 at 7:03 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@globallogic.com> wrote:
> I got this strange log:
>
> (XEN) received maintenance interrupt irq=1023
>
> And the platform does not hang thanks to this change:
> +    hcr = GICH[GICH_HCR];
> +    if ( hcr & GICH_HCR_UIE )
> +    {
> +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> +        uie_on = 1;
> +    }
>
> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
>>> <stefano.stabellini@eu.citrix.com> wrote:
>>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>>> >> <andrii.tseglytskyi@globallogic.com> wrote:
>>> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>>> >> > <stefano.stabellini@eu.citrix.com> wrote:
>>> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> >>> Hi Stefano,
>>> >> >>>
>>> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>>> >> >>> <stefano.stabellini@eu.citrix.com> wrote:
>>> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> >>> >> Hi Stefano,
>>> >> >>> >>
>>> >> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>>> >> >>> >> > >      else
>>> >> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>>> >> >>> >> > >
>>> >> >>> >> > >  }
>>> >> >>> >> >
>>> >> >>> >> > Yes, exactly
>>> >> >>> >>
>>> >> >>> >> I tried, hang still occurs with this change
>>> >> >>> >
>>> >> >>> > We need to figure out why during the hang you still have all the LRs
>>> >> >>> > busy even if you are getting maintenance interrupts that should cause
>>> >> >>> > them to be cleared.
>>> >> >>> >
>>> >> >>>
>>> >> >>> I see that I have free LRs during the maintenance interrupt
>>> >> >>>
>>> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>>> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>>> >> >>> (XEN)    HW_LR[0]=9a015856
>>> >> >>> (XEN)    HW_LR[1]=0
>>> >> >>> (XEN)    HW_LR[2]=0
>>> >> >>> (XEN)    HW_LR[3]=0
>>> >> >>> (XEN) Inflight irq=86 lr=0
>>> >> >>> (XEN) Inflight irq=2 lr=255
>>> >> >>> (XEN) Pending irq=2
>>> >> >>>
>>> >> >>> But I see that once the hang occurs, maintenance interrupts are generated
>>> >> >>> continuously. The platform keeps printing the same log until reboot.
>>> >> >>
>>> >> >> Exactly the same log? As in the one above you just pasted?
>>> >> >> That is very very suspicious.
>>> >> >
>>> >> > Yes, exactly the same log. And it looks like this means that the LRs are
>>> >> > flushed correctly.
>>> >> >
>>> >> >>
>>> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>>> >> >> something we do in Xen, maybe writing to an LR register, might trigger a
>>> >> >> new maintenance interrupt immediately causing an infinite loop.
>>> >> >>
>>> >> >
>>> >> > Yes, this is what I'm thinking too. Taking into account all the collected
>>> >> > debug info, it looks like once the LRs are overloaded with SGIs, a
>>> >> > maintenance interrupt occurs.
>>> >> > Then it is not handled properly and occurs again and again, so the
>>> >> > platform hangs inside its handler.
>>> >> >
>>> >> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
>>> >> >> hypervisor entry.
>>> >> >>
>>> >> >
>>> >> > Now trying.
>>> >> >
>>> >> >>
>>> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> >> >> index 4d2a92d..6ae8dc4 100644
>>> >> >> --- a/xen/arch/arm/gic.c
>>> >> >> +++ b/xen/arch/arm/gic.c
>>> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>>> >> >>      if ( is_idle_vcpu(v) )
>>> >> >>          return;
>>> >> >>
>>> >> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> >> +
>>> >> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>> >> >>
>>> >> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>>> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
>>> >> >>
>>> >> >>      gic_restore_pending_irqs(current);
>>> >> >>
>>> >> >> -
>>> >> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> >> -    else
>>> >> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> >> -
>>> >> >>  }
>>> >> >>
>>> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
>>> >> >
>>> >>
>>> >> Heh, I don't see hangs with this patch :) But I also see that the
>>> >> maintenance interrupt doesn't occur (and no hang as a result).
>>> >> Stefano, is this expected?
>>> >
>>> > No maintenance interrupts at all? That's strange. You should be
>>> > receiving them when LRs are full and you still have interrupts pending
>>> > to be added to them.
>>> >
>>> > You could add another printk here to see if you should be receiving
>>> > them:
>>> >
>>> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> > +    {
>>> > +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
>>> >          GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> > -    else
>>> > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> > -
>>> > +    }
>>> >  }
>>> >
>>>
>>> Requested properly:
>>>
>>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>>
>>> But it does not occur
>>
>> OK, let's see what's going on then by printing the irq number of the
>> maintenance interrupt:
>>
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index 4d2a92d..fed3167 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -55,6 +55,7 @@ static struct {
>>  static DEFINE_PER_CPU(uint64_t, lr_mask);
>>
>>  static uint8_t nr_lrs;
>> +static bool uie_on;
>>  #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1))
>>
>>  /* The GIC mapping of CPU interfaces does not necessarily match the
>> @@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v)
>>  {
>>      int i = 0;
>>      unsigned long flags;
>> +    unsigned long hcr;
>>
>>      /* The idle domain has no LRs to be cleared. Since gic_restore_state
>>       * doesn't write any LR registers for the idle domain they could be
>> @@ -701,6 +703,13 @@ void gic_clear_lrs(struct vcpu *v)
>>      if ( is_idle_vcpu(v) )
>>          return;
>>
>> +    hcr = GICH[GICH_HCR];
>> +    if ( hcr & GICH_HCR_UIE )
>> +    {
>> +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> +        uie_on = 1;
>> +    }
>> +
>>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>
>>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>> @@ -865,6 +873,11 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
>>          intack = GICC[GICC_IAR];
>>          irq = intack & GICC_IA_IRQ;
>>
>> +        if ( uie_on )
>> +        {
>> +            uie_on = 0;
>> +            printk("received maintenance interrupt irq=%d\n", irq);
>> +        }
>>          if ( likely(irq >= 16 && irq < 1021) )
>>          {
>>              local_irq_enable();

* Re: Xen 4.5 random freeze question
  2014-11-19 17:11                                                                           ` Andrii Tseglytskyi
@ 2014-11-19 17:14                                                                             ` Stefano Stabellini
  0 siblings, 0 replies; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-19 17:14 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

No, it just means "spurious interrupt".

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> Does the number 1023 mean that the maintenance interrupt is global?

* Re: Xen 4.5 random freeze question
  2014-11-19 17:07                                                                           ` Stefano Stabellini
@ 2014-11-19 17:37                                                                             ` Andrii Tseglytskyi
  2014-11-19 17:42                                                                               ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 17:37 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

Hi Stefano,

On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> I think that's OK: it looks like, on your board, for some reason
> when UIE is set you get irq 1023 (spurious interrupt) instead of your
> normal maintenance interrupt.

OK, but I think this should be investigated too. What do you think?

>
> But everything should work anyway without issues.
>
> This is the same patch as before, but on top of the latest xen-unstable
> tree. Please confirm if it works.
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 70d10d6..df140b9 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
>      if ( is_idle_vcpu(v) )
>          return;
>
> +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> +
>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>
>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> @@ -527,8 +529,6 @@ void gic_inject(void)
>
>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>          gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
> -    else
> -        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
>  }
>

I confirm, it works fine. Will this be the final fix?

Regards,
Andrii

>  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
>

* Re: Xen 4.5 random freeze question
  2014-11-19 17:37                                                                             ` Andrii Tseglytskyi
@ 2014-11-19 17:42                                                                               ` Stefano Stabellini
  2014-11-19 17:47                                                                                 ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-19 17:42 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> Hi Stefano,
> 
> On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > I think that's OK: it looks like, on your board, for some reason
> > when UIE is set you get irq 1023 (spurious interrupt) instead of your
> > normal maintenance interrupt.
> 
> OK, but I think this should be investigated too. What do you think?

I think it is harmless: my guess is that if we clear UIE before reading
GICC_IAR, GICC_IAR returns the spurious interrupt ID (1023) instead of the
maintenance interrupt. But it doesn't really matter to us.

> >
> > But everything should work anyway without issues.
> >
> > This is the same patch as before, but on top of the latest xen-unstable
> > tree. Please confirm if it works.
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 70d10d6..df140b9 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
> >      if ( is_idle_vcpu(v) )
> >          return;
> >
> > +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> > +
> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >
> >      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> > @@ -527,8 +529,6 @@ void gic_inject(void)
> >
> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >          gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
> > -    else
> > -        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> >  }
> >
> 
> I confirm, it works fine. Will this be the final fix?

Yep :-)
Many thanks for your help on this!


> Regards,
> Andrii
> 
> >  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
> >
> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> I got this strange log:
> >>
> >> (XEN) received maintenance interrupt irq=1023
> >>
> >> And the platform does not hang thanks to this change:
> >> +    hcr = GICH[GICH_HCR];
> >> +    if ( hcr & GICH_HCR_UIE )
> >> +    {
> >> +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> +        uie_on = 1;
> >> +    }
> >>
> >> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
> >> <stefano.stabellini@eu.citrix.com> wrote:
> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
> >> >> <stefano.stabellini@eu.citrix.com> wrote:
> >> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
> >> >> >> <andrii.tseglytskyi@globallogic.com> wrote:
> >> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> >> >> >> > <stefano.stabellini@eu.citrix.com> wrote:
> >> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >> >>> Hi Stefano,
> >> >> >> >>>
> >> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> >> >> >> >>> <stefano.stabellini@eu.citrix.com> wrote:
> >> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >> >>> >> Hi Stefano,
> >> >> >> >>> >>
> >> >> >> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> >> >> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >> >> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >> >> >> >>> >> > >      else
> >> >> >> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> >> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >> >> >> >>> >> > >
> >> >> >> >>> >> > >  }
> >> >> >> >>> >> >
> >> >> >> >>> >> > Yes, exactly
> >> >> >> >>> >>
> >> >> >> >>> >> I tried, hang still occurs with this change
> >> >> >> >>> >
> >> >> >> >>> > We need to figure out why during the hang you still have all the LRs
> >> >> >> >>> > busy even if you are getting maintenance interrupts that should cause
> >> >> >> >>> > them to be cleared.
> >> >> >> >>> >
> >> >> >> >>>
> >> >> >> >>> I see that I have free LRs during the maintenance interrupt
> >> >> >> >>>
> >> >> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
> >> >> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
> >> >> >> >>> (XEN)    HW_LR[0]=9a015856
> >> >> >> >>> (XEN)    HW_LR[1]=0
> >> >> >> >>> (XEN)    HW_LR[2]=0
> >> >> >> >>> (XEN)    HW_LR[3]=0
> >> >> >> >>> (XEN) Inflight irq=86 lr=0
> >> >> >> >>> (XEN) Inflight irq=2 lr=255
> >> >> >> >>> (XEN) Pending irq=2
> >> >> >> >>>
> >> >> >> >>> But I see that once the hang occurs, maintenance interrupts are generated
> >> >> >> >>> continuously. The platform keeps printing the same log until reboot.
> >> >> >> >>
> >> >> >> >> Exactly the same log? As in the one above you just pasted?
> >> >> >> >> That is very very suspicious.
> >> >> >> >
> >> >> >> > Yes, exactly the same log. It looks like this means the LRs are flushed
> >> >> >> > correctly.
> >> >> >> >
> >> >> >> >>
> >> >> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
> >> >> >> >> something we do in Xen, maybe writing to an LR register, might trigger a
> >> >> >> >> new maintenance interrupt immediately causing an infinite loop.
> >> >> >> >>
> >> >> >> >
> >> >> >> > Yes, this is what I'm thinking about too. Taking into account all the
> >> >> >> > collected debug info, it looks like once the LRs are overloaded with
> >> >> >> > SGIs, a maintenance interrupt occurs.
> >> >> >> > It is then not handled properly and occurs again and again, so the
> >> >> >> > platform hangs inside its handler.
> >> >> >> >
> >> >> >> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
> >> >> >> >> hypervisor entry.
> >> >> >> >>
> >> >> >> >
> >> >> >> > Now trying.
> >> >> >> >
> >> >> >> >>
> >> >> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >> >> >> index 4d2a92d..6ae8dc4 100644
> >> >> >> >> --- a/xen/arch/arm/gic.c
> >> >> >> >> +++ b/xen/arch/arm/gic.c
> >> >> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
> >> >> >> >>      if ( is_idle_vcpu(v) )
> >> >> >> >>          return;
> >> >> >> >>
> >> >> >> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> >> >> +
> >> >> >> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >> >> >> >>
> >> >> >> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> >> >> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
> >> >> >> >>
> >> >> >> >>      gic_restore_pending_irqs(current);
> >> >> >> >>
> >> >> >> >> -
> >> >> >> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> >> >> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >> >> >> -    else
> >> >> >> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> >> >> -
> >> >> >> >>  }
> >> >> >> >>
> >> >> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
> >> >> >> >
> >> >> >>
> >> >> >> Heh, I don't see hangs with this patch :) But I also see that the
> >> >> >> maintenance interrupt doesn't occur (and, as a result, there is no hang).
> >> >> >> Stefano, is this expected?
> >> >> >
> >> >> > No maintenance interrupts at all? That's strange. You should be
> >> >> > receiving them when LRs are full and you still have interrupts pending
> >> >> > to be added to them.
> >> >> >
> >> >> > You could add another printk here to see if you should be receiving
> >> >> > them:
> >> >> >
> >> >> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> >> > +    {
> >> >> > +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
> >> >> >          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >> > -    else
> >> >> > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> > -
> >> >> > +    }
> >> >> >  }
> >> >> >
> >> >>
> >> >> Requested properly:
> >> >>
> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> >>
> >> >> But it does not occur.
> >> >
> >> > OK, let's see what's going on then by printing the irq number of the
> >> > maintenance interrupt:
> >> >
> >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> > index 4d2a92d..fed3167 100644
> >> > --- a/xen/arch/arm/gic.c
> >> > +++ b/xen/arch/arm/gic.c
> >> > @@ -55,6 +55,7 @@ static struct {
> >> >  static DEFINE_PER_CPU(uint64_t, lr_mask);
> >> >
> >> >  static uint8_t nr_lrs;
> >> > +static bool uie_on;
> >> >  #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1))
> >> >
> >> >  /* The GIC mapping of CPU interfaces does not necessarily match the
> >> > @@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v)
> >> >  {
> >> >      int i = 0;
> >> >      unsigned long flags;
> >> > +    unsigned long hcr;
> >> >
> >> >      /* The idle domain has no LRs to be cleared. Since gic_restore_state
> >> >       * doesn't write any LR registers for the idle domain they could be
> >> > @@ -701,6 +703,13 @@ void gic_clear_lrs(struct vcpu *v)
> >> >      if ( is_idle_vcpu(v) )
> >> >          return;
> >> >
> >> > +    hcr = GICH[GICH_HCR];
> >> > +    if ( hcr & GICH_HCR_UIE )
> >> > +    {
> >> > +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> > +        uie_on = 1;
> >> > +    }
> >> > +
> >> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >> >
> >> >      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> >> > @@ -865,6 +873,11 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
> >> >          intack = GICC[GICC_IAR];
> >> >          irq = intack & GICC_IA_IRQ;
> >> >
> >> > +        if ( uie_on )
> >> > +        {
> >> > +            uie_on = 0;
> >> > +            printk("received maintenance interrupt irq=%d\n", irq);
> >> > +        }
> >> >          if ( likely(irq >= 16 && irq < 1021) )
> >> >          {
> >> >              local_irq_enable();
> >>
> >>
> >>
> >> --
> >>
> >> Andrii Tseglytskyi | Embedded Dev
> >> GlobalLogic
> >> www.globallogic.com
> >>
> 
> 
> 
> -- 
> 
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
> 

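For reference, the `HW_LR[0]=9a015856` value in the maintenance-interrupt dump quoted above can be decoded by hand. The following is a minimal sketch, assuming the GICv2 GICH_LR layout from the ARM GIC architecture specification (VirtualID in bits [9:0], PhysicalID in [19:10] when HW=1, Priority in [27:23], State in [29:28], Grp1 in bit 30, HW in bit 31). The helper names are hypothetical; they are not the macros Xen itself uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers decoding a GICv2 GICH_LR<n> value.
 * Bit layout assumed from the GICv2 architecture specification. */
static unsigned lr_virtual_id(uint32_t lr)  { return lr & 0x3ffu; }               /* [9:0]   */
static unsigned lr_physical_id(uint32_t lr) { return (lr >> 10) & 0x3ffu; }       /* [19:10], meaningful when HW=1 */
static unsigned lr_priority(uint32_t lr)    { return ((lr >> 23) & 0x1fu) << 3; } /* [27:23] hold priority[7:3] */
static unsigned lr_state(uint32_t lr)       { return (lr >> 28) & 0x3u; }         /* [29:28] */
static unsigned lr_grp1(uint32_t lr)        { return (lr >> 30) & 0x1u; }         /* [30]    */
static unsigned lr_hw(uint32_t lr)          { return (lr >> 31) & 0x1u; }         /* [31]    */

/* Human-readable name for the 2-bit State field. */
static const char *lr_state_name(unsigned state)
{
    static const char *const names[] = {
        "invalid", "pending", "active", "pending+active"
    };

    assert(state < 4);
    return names[state];
}
```

Applied to 0x9a015856, these helpers give virtual ID 86, physical ID 86, priority 0xa0, state 1 ("pending"), Grp1 0 and HW 1. That agrees with the `Inflight irq=86 lr=0` line: LR0 holds a pending virtual IRQ 86 linked to physical IRQ 86.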

* Re: Xen 4.5 random freeze question
  2014-11-19 17:42                                                                               ` Stefano Stabellini
@ 2014-11-19 17:47                                                                                 ` Andrii Tseglytskyi
  2014-11-19 18:06                                                                                   ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 17:47 UTC
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

On Wed, Nov 19, 2014 at 7:42 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> Hi Stefano,
>>
>> On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>> > I think that's OK: it looks like on your board, for some reason,
>> > when UIE is set you get irq 1023 (spurious interrupt) instead of your
>> > normal maintenance interrupt.
>>
>> OK, but I think this should be investigated too. What do you think?
>
> I think it is harmless: my guess is that if we clear UIE before reading
> GICC_IAR, GICC_IAR returns a spurious interrupt ID instead of the
> maintenance interrupt. But it doesn't really matter to us.

OK. I think catching this would be a good exercise for someone )) But
it is out of scope for this issue.

>
>> >
>> > But everything should work anyway without issues.
>> >
>> > This is the same patch as before but on top of the latest xen-unstable
>> > tree. Please confirm if it works.
>> >
>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> > index 70d10d6..df140b9 100644
>> > --- a/xen/arch/arm/gic.c
>> > +++ b/xen/arch/arm/gic.c
>> > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
>> >      if ( is_idle_vcpu(v) )
>> >          return;
>> >
>> > +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
>> > +
>> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>> >
>> >      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>> > @@ -527,8 +529,6 @@ void gic_inject(void)
>> >
>> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> >          gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
>> > -    else
>> > -        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
>> >  }
>> >
>>
>> I confirm, it works fine. Will this be the final fix?
>
> Yep :-)
> Many thanks for your help on this!

Thank you, Stefano. This issue was really critical for us :)

Regards,
Andrii

>
>
>> Regards,
>> Andrii
>>
>> >  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
>> >
>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> I got this strange log:
>> >>
>> >> (XEN) received maintenance interrupt irq=1023
>> >>
>> >> And the platform does not hang thanks to this change:
>> >> +    hcr = GICH[GICH_HCR];
>> >> +    if ( hcr & GICH_HCR_UIE )
>> >> +    {
>> >> +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> +        uie_on = 1;
>> >> +    }
>> >>
>> >> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
>> >> <stefano.stabellini@eu.citrix.com> wrote:
>> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
>> >> >> <stefano.stabellini@eu.citrix.com> wrote:
>> >> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>> >> >> >> <andrii.tseglytskyi@globallogic.com> wrote:
>> >> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>> >> >> >> > <stefano.stabellini@eu.citrix.com> wrote:
>> >> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> >> >>> Hi Stefano,
>> >> >> >> >>>
>> >> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>> >> >> >> >>> <stefano.stabellini@eu.citrix.com> wrote:
>> >> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> >> >> >> >>> >> Hi Stefano,
>> >> >> >> >>> >>
>> >> >> >> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> >> >> >> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> >> >> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>> >> >> >> >>> >> > >      else
>> >> >> >> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> >> >> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>> >> >> >> >>> >> > >
>> >> >> >> >>> >> > >  }
>> >> >> >> >>> >> >
>> >> >> >> >>> >> > Yes, exactly
>> >> >> >> >>> >>
>> >> >> >> >>> >> I tried; the hang still occurs with this change.
>> >> >> >> >>> >
>> >> >> >> >>> > We need to figure out why during the hang you still have all the LRs
>> >> >> >> >>> > busy even if you are getting maintenance interrupts that should cause
>> >> >> >> >>> > them to be cleared.
>> >> >> >> >>> >
>> >> >> >> >>>
>> >> >> >> >>> I see that I have free LRs during the maintenance interrupt:
>> >> >> >> >>>
>> >> >> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>> >> >> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>> >> >> >> >>> (XEN)    HW_LR[0]=9a015856
>> >> >> >> >>> (XEN)    HW_LR[1]=0
>> >> >> >> >>> (XEN)    HW_LR[2]=0
>> >> >> >> >>> (XEN)    HW_LR[3]=0
>> >> >> >> >>> (XEN) Inflight irq=86 lr=0
>> >> >> >> >>> (XEN) Inflight irq=2 lr=255
>> >> >> >> >>> (XEN) Pending irq=2
>> >> >> >> >>>
>> >> >> >> >>> But I see that once the hang occurs, maintenance interrupts are generated
>> >> >> >> >>> continuously. The platform keeps printing the same log until reboot.
>> >> >> >> >>
>> >> >> >> >> Exactly the same log? As in the one above you just pasted?
>> >> >> >> >> That is very very suspicious.
>> >> >> >> >
>> >> >> >> > Yes, exactly the same log. It looks like this means the LRs are flushed
>> >> >> >> > correctly.
>> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>> >> >> >> >> something we do in Xen, maybe writing to an LR register, might trigger a
>> >> >> >> >> new maintenance interrupt immediately causing an infinite loop.
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> > Yes, this is what I'm thinking about too. Taking into account all the
>> >> >> >> > collected debug info, it looks like once the LRs are overloaded with
>> >> >> >> > SGIs, a maintenance interrupt occurs.
>> >> >> >> > It is then not handled properly and occurs again and again, so the
>> >> >> >> > platform hangs inside its handler.
>> >> >> >> >
>> >> >> >> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
>> >> >> >> >> hypervisor entry.
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> > Now trying.
>> >> >> >> >
>> >> >> >> >>
>> >> >> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >> >> >> >> index 4d2a92d..6ae8dc4 100644
>> >> >> >> >> --- a/xen/arch/arm/gic.c
>> >> >> >> >> +++ b/xen/arch/arm/gic.c
>> >> >> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>> >> >> >> >>      if ( is_idle_vcpu(v) )
>> >> >> >> >>          return;
>> >> >> >> >>
>> >> >> >> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> >> >> >> +
>> >> >> >> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>> >> >> >> >>
>> >> >> >> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>> >> >> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
>> >> >> >> >>
>> >> >> >> >>      gic_restore_pending_irqs(current);
>> >> >> >> >>
>> >> >> >> >> -
>> >> >> >> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> >> >> >> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> >> >> >> -    else
>> >> >> >> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> >> >> >> -
>> >> >> >> >>  }
>> >> >> >> >>
>> >> >> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
>> >> >> >> >
>> >> >> >>
>> >> >> >> Heh, I don't see hangs with this patch :) But I also see that the
>> >> >> >> maintenance interrupt doesn't occur (and, as a result, there is no hang).
>> >> >> >> Stefano, is this expected?
>> >> >> >
>> >> >> > No maintenance interrupts at all? That's strange. You should be
>> >> >> > receiving them when LRs are full and you still have interrupts pending
>> >> >> > to be added to them.
>> >> >> >
>> >> >> > You could add another printk here to see if you should be receiving
>> >> >> > them:
>> >> >> >
>> >> >> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>> >> >> > +    {
>> >> >> > +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
>> >> >> >          GICH[GICH_HCR] |= GICH_HCR_UIE;
>> >> >> > -    else
>> >> >> > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> >> > -
>> >> >> > +    }
>> >> >> >  }
>> >> >> >
>> >> >>
>> >> >> Requested properly:
>> >> >>
>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>> >> >>
>> >> >> But it does not occur.
>> >> >
>> >> > OK, let's see what's going on then by printing the irq number of the
>> >> > maintenance interrupt:
>> >> >
>> >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> >> > index 4d2a92d..fed3167 100644
>> >> > --- a/xen/arch/arm/gic.c
>> >> > +++ b/xen/arch/arm/gic.c
>> >> > @@ -55,6 +55,7 @@ static struct {
>> >> >  static DEFINE_PER_CPU(uint64_t, lr_mask);
>> >> >
>> >> >  static uint8_t nr_lrs;
>> >> > +static bool uie_on;
>> >> >  #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1))
>> >> >
>> >> >  /* The GIC mapping of CPU interfaces does not necessarily match the
>> >> > @@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v)
>> >> >  {
>> >> >      int i = 0;
>> >> >      unsigned long flags;
>> >> > +    unsigned long hcr;
>> >> >
>> >> >      /* The idle domain has no LRs to be cleared. Since gic_restore_state
>> >> >       * doesn't write any LR registers for the idle domain they could be
>> >> > @@ -701,6 +703,13 @@ void gic_clear_lrs(struct vcpu *v)
>> >> >      if ( is_idle_vcpu(v) )
>> >> >          return;
>> >> >
>> >> > +    hcr = GICH[GICH_HCR];
>> >> > +    if ( hcr & GICH_HCR_UIE )
>> >> > +    {
>> >> > +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>> >> > +        uie_on = 1;
>> >> > +    }
>> >> > +
>> >> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>> >> >
>> >> >      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>> >> > @@ -865,6 +873,11 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
>> >> >          intack = GICC[GICC_IAR];
>> >> >          irq = intack & GICC_IA_IRQ;
>> >> >
>> >> > +        if ( uie_on )
>> >> > +        {
>> >> > +            uie_on = 0;
>> >> > +            printk("received maintenance interrupt irq=%d\n", irq);
>> >> > +        }
>> >> >          if ( likely(irq >= 16 && irq < 1021) )
>> >> >          {
>> >> >              local_irq_enable();
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Andrii Tseglytskyi | Embedded Dev
>> >> GlobalLogic
>> >> www.globallogic.com
>> >>
>>
>>
>>
>> --
>>
>> Andrii Tseglytskyi | Embedded Dev
>> GlobalLogic
>> www.globallogic.com
>>



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


* Re: Xen 4.5 random freeze question
  2014-11-19 17:47                                                                                 ` Andrii Tseglytskyi
@ 2014-11-19 18:06                                                                                   ` Andrii Tseglytskyi
  2014-11-19 18:14                                                                                     ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 18:06 UTC
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel

The only remaining ambiguity: the maintenance interrupt handler is not
called. It was requested for a specific IRQ number, retrieved from the
device tree, but when we set GICH_HCR_UIE we get the maintenance
interrupt with the spurious number 1023.
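A minimal sketch of how the value read from GICC_IAR can be classified, assuming the GICv2 interrupt ID ranges (SGIs 0-15, PPIs 16-31, SPIs 32-1019, special IDs 1020-1023, where 1023 means no pending interrupt). `classify_iar` and `enum gic_id_class` are hypothetical names, not Xen code.

```c
#include <assert.h>
#include <stdint.h>

/* GICv2 interrupt ID classes, as reported in GICC_IAR[9:0]. */
enum gic_id_class {
    GIC_ID_SGI,       /* 0-15: software generated interrupts     */
    GIC_ID_PPI,       /* 16-31: private peripheral interrupts     */
    GIC_ID_SPI,       /* 32-1019: shared peripheral interrupts    */
    GIC_ID_SPECIAL,   /* 1020-1022: reserved or special-purpose   */
    GIC_ID_SPURIOUS,  /* 1023: nothing pending for this CPU       */
};

/* Hypothetical helper: classify a raw GICC_IAR value. */
static enum gic_id_class classify_iar(uint32_t iar)
{
    uint32_t id = iar & 0x3ffu;   /* interrupt ID field; bits [12:10] carry the SGI source CPU */

    if ( id < 16 )
        return GIC_ID_SGI;
    if ( id < 32 )
        return GIC_ID_PPI;
    if ( id < 1020 )
        return GIC_ID_SPI;
    if ( id == 1023 )
        return GIC_ID_SPURIOUS;
    return GIC_ID_SPECIAL;
}
```

An acknowledge that returns 1023 is classified as spurious. It also falls outside the `irq >= 16 && irq < 1021` range tested in the quoted gic_interrupt() hunk, so it does not take the normal IRQ dispatch path, which is consistent with the maintenance handler registered from the device tree never being called.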

Regards,
Andrii

On Wed, Nov 19, 2014 at 7:47 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@globallogic.com> wrote:
> On Wed, Nov 19, 2014 at 7:42 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> Hi Stefano,
>>>
>>> On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
>>> <stefano.stabellini@eu.citrix.com> wrote:
>>> > I think that's OK: it looks like on your board, for some reason,
>>> > when UIE is set you get irq 1023 (spurious interrupt) instead of your
>>> > normal maintenance interrupt.
>>>
>>> OK, but I think this should be investigated too. What do you think?
>>
>> I think it is harmless: my guess is that if we clear UIE before reading
>> GICC_IAR, GICC_IAR returns a spurious interrupt ID instead of the
>> maintenance interrupt. But it doesn't really matter to us.
>
> OK. I think catching this would be a good exercise for someone )) But
> it is out of scope for this issue.
>
>>
>>> >
>>> > But everything should work anyway without issues.
>>> >
>>> > This is the same patch as before but on top of the latest xen-unstable
>>> > tree. Please confirm if it works.
>>> >
>>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> > index 70d10d6..df140b9 100644
>>> > --- a/xen/arch/arm/gic.c
>>> > +++ b/xen/arch/arm/gic.c
>>> > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
>>> >      if ( is_idle_vcpu(v) )
>>> >          return;
>>> >
>>> > +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
>>> > +
>>> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>> >
>>> >      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>>> > @@ -527,8 +529,6 @@ void gic_inject(void)
>>> >
>>> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >          gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
>>> > -    else
>>> > -        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
>>> >  }
>>> >
>>>
>>> I confirm, it works fine. Will this be the final fix?
>>
>> Yep :-)
>> Many thanks for your help on this!
>
> Thank you, Stefano. This issue was really critical for us :)
>
> Regards,
> Andrii
>
>>
>>
>>> Regards,
>>> Andrii
>>>
>>> >  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
>>> >
>>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> I got this strange log:
>>> >>
>>> >> (XEN) received maintenance interrupt irq=1023
>>> >>
>>> >> And the platform does not hang thanks to this change:
>>> >> +    hcr = GICH[GICH_HCR];
>>> >> +    if ( hcr & GICH_HCR_UIE )
>>> >> +    {
>>> >> +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> +        uie_on = 1;
>>> >> +    }
>>> >>
>>> >> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
>>> >> <stefano.stabellini@eu.citrix.com> wrote:
>>> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
>>> >> >> <stefano.stabellini@eu.citrix.com> wrote:
>>> >> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
>>> >> >> >> <andrii.tseglytskyi@globallogic.com> wrote:
>>> >> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
>>> >> >> >> > <stefano.stabellini@eu.citrix.com> wrote:
>>> >> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> >> >> >>> Hi Stefano,
>>> >> >> >> >>>
>>> >> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
>>> >> >> >> >>> <stefano.stabellini@eu.citrix.com> wrote:
>>> >> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> >> >> >> >>> >> Hi Stefano,
>>> >> >> >> >>> >>
>>> >> >> >> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >> >> >> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> >> >> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
>>> >> >> >> >>> >> > >      else
>>> >> >> >> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> >> >> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
>>> >> >> >> >>> >> > >
>>> >> >> >> >>> >> > >  }
>>> >> >> >> >>> >> >
>>> >> >> >> >>> >> > Yes, exactly
>>> >> >> >> >>> >>
>>> >> >> >> >>> >> I tried; the hang still occurs with this change.
>>> >> >> >> >>> >
>>> >> >> >> >>> > We need to figure out why during the hang you still have all the LRs
>>> >> >> >> >>> > busy even if you are getting maintenance interrupts that should cause
>>> >> >> >> >>> > them to be cleared.
>>> >> >> >> >>> >
>>> >> >> >> >>>
>>> >> >> >> >>> I see that I have free LRs during the maintenance interrupt:
>>> >> >> >> >>>
>>> >> >> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
>>> >> >> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
>>> >> >> >> >>> (XEN)    HW_LR[0]=9a015856
>>> >> >> >> >>> (XEN)    HW_LR[1]=0
>>> >> >> >> >>> (XEN)    HW_LR[2]=0
>>> >> >> >> >>> (XEN)    HW_LR[3]=0
>>> >> >> >> >>> (XEN) Inflight irq=86 lr=0
>>> >> >> >> >>> (XEN) Inflight irq=2 lr=255
>>> >> >> >> >>> (XEN) Pending irq=2
>>> >> >> >> >>>
>>> >> >> >> >>> But I see that once the hang occurs, maintenance interrupts are generated
>>> >> >> >> >>> continuously. The platform keeps printing the same log until reboot.
>>> >> >> >> >>
>>> >> >> >> >> Exactly the same log? As in the one above you just pasted?
>>> >> >> >> >> That is very very suspicious.
>>> >> >> >> >
>>> >> >> >> > Yes, exactly the same log. It looks like this means the LRs are flushed
>>> >> >> >> > correctly.
>>> >> >> >> >
>>> >> >> >> >>
>>> >> >> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
>>> >> >> >> >> something we do in Xen, maybe writing to an LR register, might trigger a
>>> >> >> >> >> new maintenance interrupt immediately causing an infinite loop.
>>> >> >> >> >>
>>> >> >> >> >
>>> >> >> >> > Yes, this is what I'm thinking about too. Taking into account all the
>>> >> >> >> > collected debug info, it looks like once the LRs are overloaded with
>>> >> >> >> > SGIs, a maintenance interrupt occurs.
>>> >> >> >> > It is then not handled properly and occurs again and again, so the
>>> >> >> >> > platform hangs inside its handler.
>>> >> >> >> >
>>> >> >> >> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
>>> >> >> >> >> hypervisor entry.
>>> >> >> >> >>
>>> >> >> >> >
>>> >> >> >> > Now trying.
>>> >> >> >> >
>>> >> >> >> >>
>>> >> >> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> >> >> >> >> index 4d2a92d..6ae8dc4 100644
>>> >> >> >> >> --- a/xen/arch/arm/gic.c
>>> >> >> >> >> +++ b/xen/arch/arm/gic.c
>>> >> >> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
>>> >> >> >> >>      if ( is_idle_vcpu(v) )
>>> >> >> >> >>          return;
>>> >> >> >> >>
>>> >> >> >> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> >> >> >> +
>>> >> >> >> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>> >> >> >> >>
>>> >> >> >> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>>> >> >> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
>>> >> >> >> >>
>>> >> >> >> >>      gic_restore_pending_irqs(current);
>>> >> >> >> >>
>>> >> >> >> >> -
>>> >> >> >> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >> >> >> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> >> >> >> -    else
>>> >> >> >> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> >> >> >> -
>>> >> >> >> >>  }
>>> >> >> >> >>
>>> >> >> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
>>> >> >> >> >
>>> >> >> >>
>>> >> >> >> Heh, I don't see hangs with this patch :) But I also see that the
>>> >> >> >> maintenance interrupt doesn't occur (and, as a result, there is no hang).
>>> >> >> >> Stefano, is this expected?
>>> >> >> >
>>> >> >> > No maintenance interrupts at all? That's strange. You should be
>>> >> >> > receiving them when LRs are full and you still have interrupts pending
>>> >> >> > to be added to them.
>>> >> >> >
>>> >> >> > You could add another printk here to see if you should be receiving
>>> >> >> > them:
>>> >> >> >
>>> >> >> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>>> >> >> > +    {
>>> >> >> > +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
>>> >> >> >          GICH[GICH_HCR] |= GICH_HCR_UIE;
>>> >> >> > -    else
>>> >> >> > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> >> > -
>>> >> >> > +    }
>>> >> >> >  }
>>> >> >> >
>>> >> >>
>>> >> >> Requested properly:
>>> >> >>
>>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
>>> >> >>
>>> >> >> But it does not occur.
>>> >> >
>>> >> > OK, let's see what's going on then by printing the irq number of the
>>> >> > maintenance interrupt:
>>> >> >
>>> >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>>> >> > index 4d2a92d..fed3167 100644
>>> >> > --- a/xen/arch/arm/gic.c
>>> >> > +++ b/xen/arch/arm/gic.c
>>> >> > @@ -55,6 +55,7 @@ static struct {
>>> >> >  static DEFINE_PER_CPU(uint64_t, lr_mask);
>>> >> >
>>> >> >  static uint8_t nr_lrs;
>>> >> > +static bool uie_on;
>>> >> >  #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1))
>>> >> >
>>> >> >  /* The GIC mapping of CPU interfaces does not necessarily match the
>>> >> > @@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v)
>>> >> >  {
>>> >> >      int i = 0;
>>> >> >      unsigned long flags;
>>> >> > +    unsigned long hcr;
>>> >> >
>>> >> >      /* The idle domain has no LRs to be cleared. Since gic_restore_state
>>> >> >       * doesn't write any LR registers for the idle domain they could be
>>> >> > @@ -701,6 +703,13 @@ void gic_clear_lrs(struct vcpu *v)
>>> >> >      if ( is_idle_vcpu(v) )
>>> >> >          return;
>>> >> >
>>> >> > +    hcr = GICH[GICH_HCR];
>>> >> > +    if ( hcr & GICH_HCR_UIE )
>>> >> > +    {
>>> >> > +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
>>> >> > +        uie_on = 1;
>>> >> > +    }
>>> >> > +
>>> >> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>>> >> >
>>> >> >      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
>>> >> > @@ -865,6 +873,11 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
>>> >> >          intack = GICC[GICC_IAR];
>>> >> >          irq = intack & GICC_IA_IRQ;
>>> >> >
>>> >> > +        if ( uie_on )
>>> >> > +        {
>>> >> > +            uie_on = 0;
>>> >> > +            printk("received maintenance interrupt irq=%d\n", irq);
>>> >> > +        }
>>> >> >          if ( likely(irq >= 16 && irq < 1021) )
>>> >> >          {
>>> >> >              local_irq_enable();
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >>
>>> >> Andrii Tseglytskyi | Embedded Dev
>>> >> GlobalLogic
>>> >> www.globallogic.com
>>> >>
>>>
>>>
>>>
>>> --
>>>
>>> Andrii Tseglytskyi | Embedded Dev
>>> GlobalLogic
>>> www.globallogic.com
>>>
>
>
>
> --
>
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com


* Re: Xen 4.5 random freeze question
  2014-11-19 18:06                                                                                   ` Andrii Tseglytskyi
@ 2014-11-19 18:14                                                                                     ` Stefano Stabellini
  2014-11-19 18:26                                                                                       ` Julien Grall
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-19 18:14 UTC
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

That's right: the maintenance interrupt handler is not called, but it
doesn't do anything, so we are fine. The important thing is that an
interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
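The UIE policy that results from the final patch can be modelled in a few lines. This is a simplified sketch, not Xen code: `struct vgic_model` and the `model_*` functions are hypothetical stand-ins for the per-CPU lr_mask, the lr_pending list and GICH_HCR.UIE, and `model_underflow_asserted()` assumes the GICv2 rule that, with UIE set, the underflow maintenance interrupt is signalled while at most one list register holds a valid entry. It illustrates why clearing UIE on hypervisor entry, as in the hunk added to gic_clear_lrs, stops the interrupt from being re-raised while Xen runs, in line with the infinite-loop hypothesis earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-CPU state touched by the fix. */
struct vgic_model {
    uint64_t lr_mask;     /* bit i set => list register i is in use          */
    unsigned nr_lrs;      /* implemented LRs (the dump shows HW_LR[0..3])    */
    bool     lr_pending;  /* interrupts still queued, waiting for a free LR  */
    bool     hcr_uie;     /* GICH_HCR.UIE                                    */
};

/* Same test as the lr_all_full() macro quoted in the thread. */
static bool model_lr_all_full(const struct vgic_model *m)
{
    assert(m->nr_lrs > 0 && m->nr_lrs < 64);
    return m->lr_mask == ((UINT64_C(1) << m->nr_lrs) - 1);
}

/* Number of LRs currently in use. */
static unsigned model_valid_lrs(const struct vgic_model *m)
{
    unsigned n = 0;
    uint64_t mask = m->lr_mask;

    while ( mask )
    {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Assumed GICv2 underflow rule: UIE set and at most one valid LR. */
static bool model_underflow_asserted(const struct vgic_model *m)
{
    return m->hcr_uie && model_valid_lrs(m) <= 1;
}

/* Hypervisor entry (gic_clear_lrs() in the patch): drop UIE first. */
static void model_hyp_entry(struct vgic_model *m)
{
    m->hcr_uie = false;
}

/* Return to guest (gic_inject() in the patch): request an underflow
 * maintenance interrupt only if every LR is busy and more are queued. */
static void model_guest_entry(struct vgic_model *m)
{
    if ( m->lr_pending && model_lr_all_full(m) )
        m->hcr_uie = true;
}
```

For example, starting from four busy LRs with more interrupts queued, model_guest_entry() sets UIE. Once the guest has drained all but one LR the underflow condition holds, and the next model_hyp_entry() clears UIE, so the condition is no longer signalled while the hypervisor is running.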

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> The only remaining ambiguity: the maintenance interrupt handler is not
> called. It was requested for a specific IRQ number, retrieved from the
> device tree, but when we set GICH_HCR_UIE we get the maintenance
> interrupt with the spurious number 1023.
> 
> Regards,
> Andrii
> 
> On Wed, Nov 19, 2014 at 7:47 PM, Andrii Tseglytskyi
> <andrii.tseglytskyi@globallogic.com> wrote:
> > On Wed, Nov 19, 2014 at 7:42 PM, Stefano Stabellini
> > <stefano.stabellini@eu.citrix.com> wrote:
> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> Hi Stefano,
> >>>
> >>> On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini
> >>> <stefano.stabellini@eu.citrix.com> wrote:
> >>> > I think that's OK: it looks like on your board, for some reason,
> >>> > when UIE is set you get irq 1023 (spurious interrupt) instead of your
> >>> > normal maintenance interrupt.
> >>>
> >>> OK, but I think this should be investigated too. What do you think?
> >>
> >> I think it is harmless: my guess is that if we clear UIE before reading
> >> GICC_IAR, GICC_IAR returns a spurious interrupt ID instead of the
> >> maintenance interrupt. But it doesn't really matter to us.
> >
> > OK. I think catching this would be a good exercise for someone )) But
> > it is out of scope for this issue.
> >
> >>
> >>> >
> >>> > But everything should work anyway without issues.
> >>> >
> >>> > This is the same patch as before but on top of the latest xen-unstable
> >>> > tree. Please confirm if it works.
> >>> >
> >>> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >>> > index 70d10d6..df140b9 100644
> >>> > --- a/xen/arch/arm/gic.c
> >>> > +++ b/xen/arch/arm/gic.c
> >>> > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
> >>> >      if ( is_idle_vcpu(v) )
> >>> >          return;
> >>> >
> >>> > +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> >>> > +
> >>> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >>> >
> >>> >      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> >>> > @@ -527,8 +529,6 @@ void gic_inject(void)
> >>> >
> >>> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >>> >          gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
> >>> > -    else
> >>> > -        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> >>> >  }
> >>> >
> >>>
> >>> I confirm it works fine. Will this be the final fix?
> >>
> >> Yep :-)
> >> Many thanks for your help on this!
> >
> > Thank you Stefano. This issue was really critical for us :)
> >
> > Regards,
> > Andrii
> >
> >>
> >>
> >>> Regards,
> >>> Andrii
> >>>
> >>> >  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
> >>> >
> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> I got this strange log:
> >>> >>
> >>> >> (XEN) received maintenance interrupt irq=1023
> >>> >>
> >>> >> And the platform does not hang with this change:
> >>> >> +    hcr = GICH[GICH_HCR];
> >>> >> +    if ( hcr & GICH_HCR_UIE )
> >>> >> +    {
> >>> >> +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> +        uie_on = 1;
> >>> >> +    }
> >>> >>
> >>> >> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
> >>> >> <stefano.stabellini@eu.citrix.com> wrote:
> >>> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
> >>> >> >> <stefano.stabellini@eu.citrix.com> wrote:
> >>> >> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
> >>> >> >> >> <andrii.tseglytskyi@globallogic.com> wrote:
> >>> >> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> >>> >> >> >> > <stefano.stabellini@eu.citrix.com> wrote:
> >>> >> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> >> >> >>> Hi Stefano,
> >>> >> >> >> >>>
> >>> >> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> >>> >> >> >> >>> <stefano.stabellini@eu.citrix.com> wrote:
> >>> >> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> >> >> >> >>> >> Hi Stefano,
> >>> >> >> >> >>> >>
> >>> >> >> >> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >>> >> >> >> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
> >>> >> >> >> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >>> >> >> >> >>> >> > >      else
> >>> >> >> >> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> >> >> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >>> >> >> >> >>> >> > >
> >>> >> >> >> >>> >> > >  }
> >>> >> >> >> >>> >> >
> >>> >> >> >> >>> >> > Yes, exactly
> >>> >> >> >> >>> >>
> >>> >> >> >> >>> >> I tried it; the hang still occurs with this change
> >>> >> >> >> >>> >
> >>> >> >> >> >>> > We need to figure out why during the hang you still have all the LRs
> >>> >> >> >> >>> > busy even if you are getting maintenance interrupts that should cause
> >>> >> >> >> >>> > them to be cleared.
> >>> >> >> >> >>> >
> >>> >> >> >> >>>
> >>> >> >> >> >>> I see that I have free LRs during the maintenance interrupt
> >>> >> >> >> >>>
> >>> >> >> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
> >>> >> >> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
> >>> >> >> >> >>> (XEN)    HW_LR[0]=9a015856
> >>> >> >> >> >>> (XEN)    HW_LR[1]=0
> >>> >> >> >> >>> (XEN)    HW_LR[2]=0
> >>> >> >> >> >>> (XEN)    HW_LR[3]=0
> >>> >> >> >> >>> (XEN) Inflight irq=86 lr=0
> >>> >> >> >> >>> (XEN) Inflight irq=2 lr=255
> >>> >> >> >> >>> (XEN) Pending irq=2
> >>> >> >> >> >>>
> >>> >> >> >> >>> But I see that once the hang occurs, maintenance interrupts are generated
> >>> >> >> >> >>> continuously. The platform keeps printing the same log until reboot.
> >>> >> >> >> >>
> >>> >> >> >> >> Exactly the same log? As in the one above you just pasted?
> >>> >> >> >> >> That is very very suspicious.
> >>> >> >> >> >
> >>> >> >> >> > Yes, exactly the same log. It looks like this means that the LRs are flushed
> >>> >> >> >> > correctly.
> >>> >> >> >> >
> >>> >> >> >> >>
> >>> >> >> >> >> I suspect that we are not handling GICH_HCR_UIE correctly and that
> >>> >> >> >> >> something we do in Xen, maybe writing to an LR register, might trigger a
> >>> >> >> >> >> new maintenance interrupt immediately, causing an infinite loop.
> >>> >> >> >> >>
> >>> >> >> >> >
> >>> >> >> >> > Yes, this is what I'm thinking too. Taking into account all the collected
> >>> >> >> >> > debug info, it looks like once the LRs are overloaded with SGIs, a
> >>> >> >> >> > maintenance interrupt occurs.
> >>> >> >> >> > Then it is not handled properly and occurs again and again, so the
> >>> >> >> >> > platform hangs inside its handler.
> >>> >> >> >> >
> >>> >> >> >> >> Could you please try this patch? It disables GICH_HCR_UIE immediately on
> >>> >> >> >> >> hypervisor entry.
> >>> >> >> >> >>
> >>> >> >> >> >
> >>> >> >> >> > Now trying.
> >>> >> >> >> >
> >>> >> >> >> >>
> >>> >> >> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >>> >> >> >> >> index 4d2a92d..6ae8dc4 100644
> >>> >> >> >> >> --- a/xen/arch/arm/gic.c
> >>> >> >> >> >> +++ b/xen/arch/arm/gic.c
> >>> >> >> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
> >>> >> >> >> >>      if ( is_idle_vcpu(v) )
> >>> >> >> >> >>          return;
> >>> >> >> >> >>
> >>> >> >> >> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> >> >> >> +
> >>> >> >> >> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >>> >> >> >> >>
> >>> >> >> >> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> >>> >> >> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
> >>> >> >> >> >>
> >>> >> >> >> >>      gic_restore_pending_irqs(current);
> >>> >> >> >> >>
> >>> >> >> >> >> -
> >>> >> >> >> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >>> >> >> >> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >>> >> >> >> >> -    else
> >>> >> >> >> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> >> >> >> -
> >>> >> >> >> >>  }
> >>> >> >> >> >>
> >>> >> >> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
> >>> >> >> >> >
> >>> >> >> >>
> >>> >> >> >> Heh, I don't see hangs with this patch :) But I also see that the
> >>> >> >> >> maintenance interrupt doesn't occur (and, as a result, no hang).
> >>> >> >> >> Stefano, is this expected?
> >>> >> >> >
> >>> >> >> > No maintenance interrupts at all? That's strange. You should be
> >>> >> >> > receiving them when LRs are full and you still have interrupts pending
> >>> >> >> > to be added to them.
> >>> >> >> >
> >>> >> >> > You could add another printk here to see if you should be receiving
> >>> >> >> > them:
> >>> >> >> >
> >>> >> >> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >>> >> >> > +    {
> >>> >> >> > +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
> >>> >> >> >          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >>> >> >> > -    else
> >>> >> >> > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> >> > -
> >>> >> >> > +    }
> >>> >> >> >  }
> >>> >> >> >
> >>> >> >>
> >>> >> >> Requested properly:
> >>> >> >>
> >>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>> >> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>> >> >>
> >>> >> >> But it does not occur.
> >>> >> >
> >>> >> > OK, let's see what's going on then by printing the irq number of the
> >>> >> > maintenance interrupt:
> >>> >> >
> >>> >> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >>> >> > index 4d2a92d..fed3167 100644
> >>> >> > --- a/xen/arch/arm/gic.c
> >>> >> > +++ b/xen/arch/arm/gic.c
> >>> >> > @@ -55,6 +55,7 @@ static struct {
> >>> >> >  static DEFINE_PER_CPU(uint64_t, lr_mask);
> >>> >> >
> >>> >> >  static uint8_t nr_lrs;
> >>> >> > +static bool uie_on;
> >>> >> >  #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1))
> >>> >> >
> >>> >> >  /* The GIC mapping of CPU interfaces does not necessarily match the
> >>> >> > @@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v)
> >>> >> >  {
> >>> >> >      int i = 0;
> >>> >> >      unsigned long flags;
> >>> >> > +    unsigned long hcr;
> >>> >> >
> >>> >> >      /* The idle domain has no LRs to be cleared. Since gic_restore_state
> >>> >> >       * doesn't write any LR registers for the idle domain they could be
> >>> >> > @@ -701,6 +703,13 @@ void gic_clear_lrs(struct vcpu *v)
> >>> >> >      if ( is_idle_vcpu(v) )
> >>> >> >          return;
> >>> >> >
> >>> >> > +    hcr = GICH[GICH_HCR];
> >>> >> > +    if ( hcr & GICH_HCR_UIE )
> >>> >> > +    {
> >>> >> > +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >>> >> > +        uie_on = 1;
> >>> >> > +    }
> >>> >> > +
> >>> >> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >>> >> >
> >>> >> >      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> >>> >> > @@ -865,6 +873,11 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
> >>> >> >          intack = GICC[GICC_IAR];
> >>> >> >          irq = intack & GICC_IA_IRQ;
> >>> >> >
> >>> >> > +        if ( uie_on )
> >>> >> > +        {
> >>> >> > +            uie_on = 0;
> >>> >> > +            printk("received maintenance interrupt irq=%d\n", irq);
> >>> >> > +        }
> >>> >> >          if ( likely(irq >= 16 && irq < 1021) )
> >>> >> >          {
> >>> >> >              local_irq_enable();
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >>
> >>> >> Andrii Tseglytskyi | Embedded Dev
> >>> >> GlobalLogic
> >>> >> www.globallogic.com
> >>> >>
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>> Andrii Tseglytskyi | Embedded Dev
> >>> GlobalLogic
> >>> www.globallogic.com
> >>>
> >
> >
> >
> > --
> >
> > Andrii Tseglytskyi | Embedded Dev
> > GlobalLogic
> > www.globallogic.com
> 
> 
> 
> -- 
> 
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-19 18:14                                                                                     ` Stefano Stabellini
@ 2014-11-19 18:26                                                                                       ` Julien Grall
  2014-11-19 18:31                                                                                         ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Julien Grall @ 2014-11-19 18:26 UTC (permalink / raw)
  To: Stefano Stabellini, Andrii Tseglytskyi; +Cc: Ian Campbell, xen-devel

On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
> That's right, the maintenance interrupt handler is not called, but it
> doesn't do anything so we are fine. The important thing is that an
> interrupt is sent and gic_clear_lrs gets called on hypervisor entry.

It would be worth writing this down somewhere, in case someone
decides to add code to the maintenance interrupt handler later.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-19 18:26                                                                                       ` Julien Grall
@ 2014-11-19 18:31                                                                                         ` Stefano Stabellini
  2014-11-19 19:24                                                                                           ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-19 18:31 UTC (permalink / raw)
  To: Julien Grall
  Cc: Andrii Tseglytskyi, xen-devel, Ian Campbell, Stefano Stabellini

On Wed, 19 Nov 2014, Julien Grall wrote:
> On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
> > That's right, the maintenance interrupt handler is not called, but it
> > doesn't do anything so we are fine. The important thing is that an
> > interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
> 
> It would be worth writing this down somewhere, in case someone
> decides to add code to the maintenance interrupt handler later.

Yes, I could add a comment in the handler

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-19 18:31                                                                                         ` Stefano Stabellini
@ 2014-11-19 19:24                                                                                           ` Andrii Tseglytskyi
  2014-11-20 10:28                                                                                             ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-19 19:24 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel


On 19 Nov 2014 20:32, "Stefano Stabellini" <stefano.stabellini@eu.citrix.com> wrote:
>
> On Wed, 19 Nov 2014, Julien Grall wrote:
> > On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
> > > That's right, the maintenance interrupt handler is not called, but it
> > > doesn't do anything so we are fine. The important thing is that an
> > > interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
> >
> > It would be worth writing this down somewhere, in case someone
> > decides to add code to the maintenance interrupt handler later.
>
> Yes, I could add a comment in the handler

Maybe it wouldn't take a lot of effort to fix it? I am just worried that
we may be hiding some issue: a spurious interrupt is typically not what is
expected.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-19 19:24                                                                                           ` Andrii Tseglytskyi
@ 2014-11-20 10:28                                                                                             ` Stefano Stabellini
  2014-11-20 11:15                                                                                               ` Julien Grall
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-20 10:28 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> On 19 Nov 2014 20:32, "Stefano Stabellini" <stefano.stabellini@eu.citrix.com> wrote:
> >
> > On Wed, 19 Nov 2014, Julien Grall wrote:
> > > On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
> > > > That's right, the maintenance interrupt handler is not called, but it
> > > > doesn't do anything so we are fine. The important thing is that an
> > > > interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
> > >
> > > It would be worth writing this down somewhere, in case someone
> > > decides to add code to the maintenance interrupt handler later.
> >
> > Yes, I could add a comment in the handler
> 
> Maybe it wouldn't take a lot of effort to fix it? I am just worried that we may be hiding some issue:
> a spurious interrupt is typically not what is expected.

My guess is that by clearing UIE before reading GICC_IAR, we "clear" the
maintenance interrupt too; as a consequence, the following read of
GICC_IAR returns 1023 (nothing to be read). It is a bit as if the
maintenance interrupt were a level interrupt and we had just disabled it.

So I think that if we cleared UIE after reading GICC_IAR, GICC_IAR would
return the correct value.
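The guess above can be pictured with a toy C model. It assumes, as the guess does and without confirming it against hardware, that the maintenance interrupt behaves like a level-sensitive line asserted only while UIE is set and the underflow condition holds. MAINT_ID is a placeholder (the real number comes from the device tree), and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of Stefano's guess, not a description of real GIC internals:
 * the maintenance interrupt acts like a level-sensitive line that is
 * asserted only while UIE is set and the underflow condition holds. */
#define MAINT_ID    25u    /* placeholder; the real number comes from the DT */
#define SPURIOUS_ID 1023u

struct model_cpu_if {
    bool uie;        /* models GICH_HCR.UIE */
    bool underflow;  /* LRs empty or holding only one valid entry */
};

static bool maint_line(const struct model_cpu_if *c)
{
    return c->uie && c->underflow;
}

/* Models a GICC_IAR read: a line that is no longer asserted is no longer
 * pending, so the read finds nothing and returns the spurious ID. */
static unsigned model_read_iar(const struct model_cpu_if *c)
{
    assert(MAINT_ID >= 16 && MAINT_ID < 1020);  /* placeholder must be a normal ID */
    return maint_line(c) ? MAINT_ID : SPURIOUS_ID;
}

/* Order used by the fix: UIE is cleared in gic_clear_lrs() before GICC_IAR. */
static unsigned clear_uie_then_read_iar(struct model_cpu_if *c)
{
    c->uie = false;
    return model_read_iar(c);
}

/* The alternative ordering: acknowledge first, then clear UIE. */
static unsigned read_iar_then_clear_uie(struct model_cpu_if *c)
{
    unsigned id = model_read_iar(c);

    c->uie = false;
    return id;
}
```

Starting both orderings from UIE set and the condition true, clear_uie_then_read_iar() returns 1023 while read_iar_then_clear_uie() returns the placeholder maintenance ID, which is consistent with the "received maintenance interrupt irq=1023" log earlier in the thread.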

However, with the current structure of the code, the first thing that we
do upon entering the hypervisor is clear the LRs, and given what happened
on your platform I think it is a good idea to do that with UIE disabled.

This is why I would rather read spurious interrupts but read/write LRs
with UIE disabled, than read maintenance interrupts but risk
strange behaviour on some platforms.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-20 10:28                                                                                             ` Stefano Stabellini
@ 2014-11-20 11:15                                                                                               ` Julien Grall
  2014-11-20 16:06                                                                                                 ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Julien Grall @ 2014-11-20 11:15 UTC (permalink / raw)
  To: Stefano Stabellini, Andrii Tseglytskyi; +Cc: Ian Campbell, xen-devel

On 11/20/2014 10:28 AM, Stefano Stabellini wrote:
> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>> On 19 Nov 2014 20:32, "Stefano Stabellini" <stefano.stabellini@eu.citrix.com> wrote:
>>>
>>> On Wed, 19 Nov 2014, Julien Grall wrote:
>>>> On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
>>>>> That's right, the maintenance interrupt handler is not called, but it
>>>>> doesn't do anything so we are fine. The important thing is that an
>>>>> interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
>>>>
>>>> It would be worth writing this down somewhere, in case someone
>>>> decides to add code to the maintenance interrupt handler later.
>>>
>>> Yes, I could add a comment in the handler
>>
>> Maybe it wouldn't take a lot of effort to fix it? I am just worried that we may be hiding some issue:
>> a spurious interrupt is typically not what is expected.
> 
> My guess is that by clearing UIE before reading GICC_IAR, we "clear" the
> maintenance interrupt too; as a consequence, the following read of
> GICC_IAR returns 1023 (nothing to be read). It is a bit as if the
> maintenance interrupt were a level interrupt and we had just disabled it.
> 
> So I think that if we cleared UIE after reading GICC_IAR, GICC_IAR would
> return the correct value.
> 
> However, with the current structure of the code, the first thing that we
> do upon entering the hypervisor is clear the LRs, and given what happened
> on your platform I think it is a good idea to do that with UIE disabled.

Agreed. UIE should be disabled to avoid another maintenance interrupt as
soon as we EOI the IRQ.
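The condition behind this can be sketched in C. In GICv2, with GICH_HCR.UIE set, a maintenance interrupt is signalled while the List registers are empty or hold only one valid entry, which matches the earlier dump where only HW_LR[0] is non-zero. The loop below is a deliberate simplification (hypothetical names, LR contents frozen between exceptions): it only shows that leaving UIE set keeps the condition true after every EOI, while clearing it on entry, as the patch does, stops the repetition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the GICv2 underflow maintenance condition: with UIE set, the
 * interrupt is signalled while zero or one LRs hold a valid entry.
 * The LR state is reduced to a bitmask of valid entries; not Xen code. */
static unsigned popcount32(uint32_t x)
{
    unsigned n = 0;

    for ( ; x; x &= x - 1 )   /* clear the lowest set bit each round */
        n++;
    return n;
}

static bool underflow_maintenance(bool uie, uint32_t lr_valid_mask)
{
    return uie && popcount32(lr_valid_mask) <= 1;
}

/* Count back-to-back maintenance interrupts if nothing changes the LRs
 * between exceptions, capped at max_rounds to model "forever". */
static unsigned count_back_to_back(bool clear_uie_on_entry, uint32_t lr_valid,
                                   unsigned max_rounds)
{
    bool uie = true;
    unsigned fired = 0;

    assert(max_rounds > 0);
    while ( fired < max_rounds && underflow_maintenance(uie, lr_valid) )
    {
        fired++;
        if ( clear_uie_on_entry )
            uie = false;      /* what the patch does in gic_clear_lrs() */
    }
    return fired;
}
```

With one valid LR (mask 0x1), count_back_to_back(false, 0x1, 1000) hits the cap of 1000, count_back_to_back(true, 0x1, 1000) returns 1, and with two valid LRs (mask 0x3) nothing fires at all.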

> This is why I would rather read spurious interrupts but read/write LRs
> with UIE disabled, than read maintenance interrupts but risk
> strange behaviour on some platforms.

According to the GICv2 documentation, the spurious interrupt should
happen on any platform every time UIE is disabled while we receive a
maintenance interrupt.

"The read returns a spurious interrupt ID of 1023 if any of the
following apply:

- no pending interrupt on the CPU interface has sufficient priority for
the interface to signal it to the processor"
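For reference, a GICC_IAR value can be decoded as in the small C sketch below, which follows the GICv2 layout: bits [9:0] hold the interrupt ID (0-15 SGIs, 16-31 PPIs, 32-1019 SPIs, 1020-1023 special, with 1023 the spurious ID quoted above) and, for SGIs, bits [12:10] hold the requesting CPU. The enum and function names are hypothetical, not Xen's.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative GICv2 GICC_IAR decoder; names are not Xen's. */
#define IAR_ID_MASK      0x3ffu   /* bits [9:0]: interrupt ID */
#define IAR_CPUID_SHIFT  10       /* bits [12:10]: source CPU for SGIs */
#define IAR_CPUID_MASK   0x7u

enum irq_class { IRQ_SGI, IRQ_PPI, IRQ_SPI, IRQ_SPECIAL, IRQ_SPURIOUS };

static enum irq_class classify_iar(uint32_t intack, unsigned *id_out)
{
    unsigned id = intack & IAR_ID_MASK;

    if ( id_out )
        *id_out = id;
    if ( id < 16 )
        return IRQ_SGI;
    if ( id < 32 )
        return IRQ_PPI;
    if ( id < 1020 )
        return IRQ_SPI;
    if ( id < 1023 )
        return IRQ_SPECIAL;   /* 1020-1022: reserved/special IDs */
    return IRQ_SPURIOUS;      /* 1023: nothing pending with enough priority */
}

static unsigned sgi_source_cpu(uint32_t intack)
{
    assert((intack & IAR_ID_MASK) < 16);   /* CPUID field only valid for SGIs */
    return (intack >> IAR_CPUID_SHIFT) & IAR_CPUID_MASK;
}
```

A read of 0x3ff classifies as IRQ_SPURIOUS, which is the value the debug patch printed once UIE had already been cleared.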

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-20 11:15                                                                                               ` Julien Grall
@ 2014-11-20 16:06                                                                                                 ` Andrii Tseglytskyi
  2014-11-20 16:15                                                                                                   ` Stefano Stabellini
  0 siblings, 1 reply; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-20 16:06 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, Ian Campbell, Stefano Stabellini

I think I'll debug this a bit later; unfortunately, I don't have
time for it right now. But I do want to get rid of the spurious interrupt here.

BTW, Stefano, are you going to post the patch that we created
yesterday? Will Ian accept it?

Regards,
Andrii

On Thu, Nov 20, 2014 at 1:15 PM, Julien Grall <julien.grall@linaro.org> wrote:
> On 11/20/2014 10:28 AM, Stefano Stabellini wrote:
>> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
>>> On 19 Nov 2014 20:32, "Stefano Stabellini" <stefano.stabellini@eu.citrix.com> wrote:
>>>>
>>>> On Wed, 19 Nov 2014, Julien Grall wrote:
>>>>> On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
>>>>>> That's right, the maintenance interrupt handler is not called, but it
>>>>>> doesn't do anything so we are fine. The important thing is that an
>>>>>> interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
>>>>>
>>>>> It would be worth writing this down somewhere, in case someone
>>>>> decides to add code to the maintenance interrupt handler later.
>>>>
>>>> Yes, I could add a comment in the handler
>>>
>>> Maybe it wouldn't take a lot of effort to fix it? I am just worried that we may be hiding some issue:
>>> a spurious interrupt is typically not what is expected.
>>
>> My guess is that by clearing UIE before reading GICC_IAR, we "clear" the
>> maintenance interrupt too; as a consequence, the following read of
>> GICC_IAR returns 1023 (nothing to be read). It is a bit as if the
>> maintenance interrupt were a level interrupt and we had just disabled it.
>>
>> So I think that if we cleared UIE after reading GICC_IAR, GICC_IAR would
>> return the correct value.
>>
>> However, with the current structure of the code, the first thing that we
>> do upon entering the hypervisor is clear the LRs, and given what happened
>> on your platform I think it is a good idea to do that with UIE disabled.
>
> Agreed. UIE should be disabled to avoid another maintenance interrupt as
> soon as we EOI the IRQ.
>
>> This is why I would rather read spurious interrupts but read/write LRs
>> with UIE disabled, than read maintenance interrupts but risk
>> strange behaviour on some platforms.
>
> According to the GICv2 documentation, the spurious interrupt should
> happen on any platform every time UIE is disabled while we receive a
> maintenance interrupt.
>
> "The read returns a spurious interrupt ID of 1023 if any of the
> following apply:
>
> - no pending interrupt on the CPU interface has sufficient priority for
> the interface to signal it to the processor"
>
> --
> Julien Grall



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-20 16:06                                                                                                 ` Andrii Tseglytskyi
@ 2014-11-20 16:15                                                                                                   ` Stefano Stabellini
  2014-11-20 16:43                                                                                                     ` Andrii Tseglytskyi
  0 siblings, 1 reply; 66+ messages in thread
From: Stefano Stabellini @ 2014-11-20 16:15 UTC (permalink / raw)
  To: Andrii Tseglytskyi
  Cc: Julien Grall, xen-devel, Ian Campbell, Stefano Stabellini

Already posted:

http://marc.info/?l=xen-devel&m=141648092100568

Ian hasn't provided any feedback yet.

On Thu, 20 Nov 2014, Andrii Tseglytskyi wrote:
> I think I'll debug this a bit later; unfortunately, I don't have
> time for it right now. But I do want to get rid of the spurious interrupt here.
> 
> BTW, Stefano, are you going to post the patch that we created
> yesterday? Will Ian accept it?
> 
> Regards,
> Andrii
> 
> On Thu, Nov 20, 2014 at 1:15 PM, Julien Grall <julien.grall@linaro.org> wrote:
> > On 11/20/2014 10:28 AM, Stefano Stabellini wrote:
> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >>> On 19 Nov 2014 20:32, "Stefano Stabellini" <stefano.stabellini@eu.citrix.com> wrote:
> >>>>
> >>>> On Wed, 19 Nov 2014, Julien Grall wrote:
> >>>>> On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
> >>>>>> That's right, the maintenance interrupt handler is not called, but it
> >>>>>> doesn't do anything so we are fine. The important thing is that an
> >>>>>> interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
> >>>>>
> >>>>> It would be worth writing this down somewhere, in case someone
> >>>>> decides to add code to the maintenance interrupt handler later.
> >>>>
> >>>> Yes, I could add a comment in the handler
> >>>
> >>> Maybe it wouldn't take a lot of effort to fix it? I am just worried that we may be hiding some issue:
> >>> a spurious interrupt is typically not what is expected.
> >>
> >> My guess is that by clearing UIE before reading GICC_IAR, we "clear" the
> >> maintenance interrupt too; as a consequence, the following read of
> >> GICC_IAR returns 1023 (nothing to be read). It is a bit as if the
> >> maintenance interrupt were a level interrupt and we had just disabled it.
> >>
> >> So I think that if we cleared UIE after reading GICC_IAR, GICC_IAR would
> >> return the correct value.
> >>
> >> However, with the current structure of the code, the first thing that we
> >> do upon entering the hypervisor is clear the LRs, and given what happened
> >> on your platform I think it is a good idea to do that with UIE disabled.
> >
> > Agreed. UIE should be disabled to avoid another maintenance interrupt as
> > soon as we EOI the IRQ.
> >
> >> This is why I would rather read spurious interrupts but read/write LRs
> >> with UIE disabled, than read maintenance interrupts but risk
> >> strange behaviour on some platforms.
> >
> > According to the GICv2 documentation, the spurious interrupt should
> > happen on any platform every time UIE is disabled while we receive a
> > maintenance interrupt.
> >
> > "The read returns a spurious interrupt ID of 1023 if any of the
> > following apply:
> >
> > - no pending interrupt on the CPU interface has sufficient priority for
> > the interface to signal it to the processor"
> >
> > --
> > Julien Grall
> 
> 
> 
> -- 
> 
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Xen 4.5 random freeze question
  2014-11-20 16:15                                                                                                   ` Stefano Stabellini
@ 2014-11-20 16:43                                                                                                     ` Andrii Tseglytskyi
  0 siblings, 0 replies; 66+ messages in thread
From: Andrii Tseglytskyi @ 2014-11-20 16:43 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Julien Grall, Ian Campbell, xen-devel


OK, I see. Thanks a lot.

On Thu, Nov 20, 2014 at 6:15 PM, Stefano Stabellini <stefano.stabellini@eu.citrix.com> wrote:

> Already posted:
>
> http://marc.info/?l=xen-devel&m=141648092100568
>
> Ian hasn't provided any feedback yet.
>
> On Thu, 20 Nov 2014, Andrii Tseglytskyi wrote:
> > I think I'll debug this a bit later; unfortunately, I don't have
> > time for it right now. But I do want to get rid of the spurious interrupt here.
> >
> > BTW, Stefano, are you going to post the patch that we created
> > yesterday? Will Ian accept it?
> >
> > Regards,
> > Andrii
> >
> > On Thu, Nov 20, 2014 at 1:15 PM, Julien Grall <julien.grall@linaro.org> wrote:
> > > On 11/20/2014 10:28 AM, Stefano Stabellini wrote:
> > >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> > >>> On 19 Nov 2014 20:32, "Stefano Stabellini" <stefano.stabellini@eu.citrix.com> wrote:
> > >>>>
> > >>>> On Wed, 19 Nov 2014, Julien Grall wrote:
> > >>>>> On 11/19/2014 06:14 PM, Stefano Stabellini wrote:
> > >>>>>> That's right, the maintenance interrupt handler is not called, but it
> > >>>>>> doesn't do anything so we are fine. The important thing is that an
> > >>>>>> interrupt is sent and gic_clear_lrs gets called on hypervisor entry.
> > >>>>>
> > >>>>> It would be worth writing this down somewhere, in case someone
> > >>>>> decides to add code to the maintenance interrupt handler later.
> > >>>>
> > >>>> Yes, I could add a comment in the handler
> > >>>
> > >>> Maybe it wouldn't take a lot of effort to fix it? I am just worried that we may be hiding some issue:
> > >>> a spurious interrupt is typically not what is expected.
> > >>
> > >> My guess is that by clearing UIE before reading GICC_IAR, we "clear" the
> > >> maintenance interrupt too; as a consequence, the following read of
> > >> GICC_IAR returns 1023 (nothing to be read). It is a bit as if the
> > >> maintenance interrupt were a level interrupt and we had just disabled it.
> > >>
> > >> So I think that if we cleared UIE after reading GICC_IAR, GICC_IAR would
> > >> return the correct value.
> > >>
> > >> However, with the current structure of the code, the first thing that we
> > >> do upon entering the hypervisor is clear the LRs, and given what happened
> > >> on your platform I think it is a good idea to do that with UIE disabled.
> > >
> > > Agreed. UIE should be disabled to avoid another maintenance interrupt as
> > > soon as we EOI the IRQ.
> > >
> > >> This is why I would rather read spurious interrupts but read/write LRs
> > >> with UIE disabled, than read maintenance interrupts but risk
> > >> strange behaviour on some platforms.
> > >
> > > According to the GICv2 documentation, the spurious interrupt should
> > > happen on any platform every time UIE is disabled while we receive a
> > > maintenance interrupt.
> > >
> > > "The read returns a spurious interrupt ID of 1023 if any of the
> > > following apply:
> > >
> > > - no pending interrupt on the CPU interface has sufficient priority for
> > > the interface to signal it to the processor"
> > >
> > > --
> > > Julien Grall
> >
> >
> >
> > --
> >
> > Andrii Tseglytskyi | Embedded Dev
> > GlobalLogic
> > www.globallogic.com
> >
>



-- 

Andrii Tseglytskyi | Embedded Dev
GlobalLogic
www.globallogic.com

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2014-11-20 16:43 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-14 14:25 Xen 4.5 random freeze question Andrii Tseglytskyi
2014-11-14 14:35 ` Stefano Stabellini
2014-11-14 14:43   ` Andrii Tseglytskyi
2014-11-14 15:22     ` Stefano Stabellini
2014-11-14 15:39       ` Andrii Tseglytskyi
2014-11-14 15:49         ` Julien Grall
2014-11-14 15:58           ` Andrii Tseglytskyi
2014-11-14 16:15         ` Stefano Stabellini
2014-11-14 16:22           ` Andrii Tseglytskyi
2014-11-14 16:35             ` Julien Grall
2014-11-14 16:40               ` Andrii Tseglytskyi
2014-11-17 15:47                 ` Andrii Tseglytskyi
2014-11-17 16:39                   ` Stefano Stabellini
2014-11-17 17:05                     ` Andrii Tseglytskyi
2014-11-17 18:02                       ` Stefano Stabellini
2014-11-18 10:41                         ` Andrii Tseglytskyi
2014-11-18 11:31                           ` Andrii Tseglytskyi
2014-11-18 12:35                             ` Andrii Tseglytskyi
2014-11-18 15:39                               ` Stefano Stabellini
2014-11-18 16:11                                 ` Andrii Tseglytskyi
2014-11-18 16:14                                   ` Stefano Stabellini
2014-11-18 16:18                                     ` Andrii Tseglytskyi
2014-11-18 16:46                                       ` Andrii Tseglytskyi
2014-11-18 17:51                                         ` Stefano Stabellini
2014-11-19  9:38                                           ` Andrii Tseglytskyi
2014-11-19 11:12                                             ` Stefano Stabellini
2014-11-19 11:16                                               ` Andrii Tseglytskyi
2014-11-19 11:42                                                 ` Stefano Stabellini
2014-11-19 11:57                                                   ` Andrii Tseglytskyi
2014-11-19 11:59                                                     ` Stefano Stabellini
2014-11-19 12:37                                                       ` Andrii Tseglytskyi
2014-11-19 14:52                                                         ` Stefano Stabellini
2014-11-19 15:27                                                           ` Andrii Tseglytskyi
2014-11-19 15:41                                                             ` Stefano Stabellini
2014-11-19 16:01                                                               ` Andrii Tseglytskyi
2014-11-19 16:09                                                                 ` Andrii Tseglytskyi
2014-11-19 16:13                                                                   ` Stefano Stabellini
2014-11-19 16:29                                                                     ` Andrii Tseglytskyi
2014-11-19 16:32                                                                       ` Andrii Tseglytskyi
2014-11-19 16:43                                                                         ` Andrii Tseglytskyi
2014-11-19 16:52                                                                           ` Stefano Stabellini
2014-11-19 16:50                                                                       ` Stefano Stabellini
2014-11-19 17:03                                                                         ` Andrii Tseglytskyi
2014-11-19 17:07                                                                           ` Stefano Stabellini
2014-11-19 17:37                                                                             ` Andrii Tseglytskyi
2014-11-19 17:42                                                                               ` Stefano Stabellini
2014-11-19 17:47                                                                                 ` Andrii Tseglytskyi
2014-11-19 18:06                                                                                   ` Andrii Tseglytskyi
2014-11-19 18:14                                                                                     ` Stefano Stabellini
2014-11-19 18:26                                                                                       ` Julien Grall
2014-11-19 18:31                                                                                         ` Stefano Stabellini
2014-11-19 19:24                                                                                           ` Andrii Tseglytskyi
2014-11-20 10:28                                                                                             ` Stefano Stabellini
2014-11-20 11:15                                                                                               ` Julien Grall
2014-11-20 16:06                                                                                                 ` Andrii Tseglytskyi
2014-11-20 16:15                                                                                                   ` Stefano Stabellini
2014-11-20 16:43                                                                                                     ` Andrii Tseglytskyi
2014-11-19 17:11                                                                           ` Andrii Tseglytskyi
2014-11-19 17:14                                                                             ` Stefano Stabellini
2014-11-19 12:13                                                   ` Ian Campbell
2014-11-19 12:17                                                     ` Stefano Stabellini
2014-11-19 12:23                                                       ` Julien Grall
2014-11-19 12:40                                                         ` Andrii Tseglytskyi
2014-11-19 13:26                                                           ` Julien Grall
2014-11-19 13:30                                                             ` Andrii Tseglytskyi
2014-11-19 14:05                                                               ` Julien Grall
