* kselftest:lost_exception_test failure with 4.11.0-rc5
@ 2017-04-07  8:05 Sachin Sant
  2017-04-07 12:36 ` Michael Ellerman
  0 siblings, 1 reply; 5+ messages in thread
From: Sachin Sant @ 2017-04-07  8:05 UTC
  To: linuxppc-dev; +Cc: Madhavan Srinivasan, Michael Ellerman

I have run into a few instances where the lost_exception_test from the
powerpc kselftest suite fails with SIGABRT. The following output is from
4.11.0-rc5. The failure is intermittent.

When the test fails it is killed due to SIGABRT.

# ./lost_exception_test 
test: lost_exception
tags: git_version:unknown
Binding to cpu 8
main test running as pid 9208
EBB Handler is at 0x10003dcc
!! killing lost_exception
ebb_state:
  ebb_count    = 191529
  spurious     = 0
  negative     = 0
  no_overflow  = 0
  pmc[1] count = 0x0
  pmc[2] count = 0x0
  pmc[3] count = 0x0
  pmc[4] count = 0x4c1b707
  pmc[5] count = 0x0
  pmc[6] count = 0x0
HW state:
MMCR0 0x0000000080000080 FC PMAO 
MMCR2 0x0000000000000000
EBBHR 0x0000000010003dcc
BESCR 0x8000000100000000 GE PMAE 
PMC1  0x0000000000000000
PMC2  0x0000000000000000
PMC3  0x0000000000000000
PMC4  0x0000000080000000
PMC5  0x0000000088d4f0c8
PMC6  0x000000001e49da22
SIAR  0x00003fffad60a608
!! child died by signal 6
failure: lost_exception
#

Thanks
-Sachin


* Re: kselftest:lost_exception_test failure with 4.11.0-rc5
  2017-04-07  8:05 kselftest:lost_exception_test failure with 4.11.0-rc5 Sachin Sant
@ 2017-04-07 12:36 ` Michael Ellerman
  2017-04-10  3:54   ` Madhavan Srinivasan
  2017-04-10  8:00   ` Sachin Sant
  0 siblings, 2 replies; 5+ messages in thread
From: Michael Ellerman @ 2017-04-07 12:36 UTC
  To: Sachin Sant, linuxppc-dev; +Cc: Madhavan Srinivasan

Sachin Sant <sachinp@linux.vnet.ibm.com> writes:

> I have run into few instances where the lost_exception_test from
> powerpc kselftest fails with SIGABRT. Following o/p is against
> 4.11.0-rc5. The failure is intermittent. 

What hardware are you on?

How long does it take to run when it fails? I assume ~2 minutes?

> When the test fails it is killed due to SIGABRT.

> # ./lost_exception_test 
> test: lost_exception
> tags: git_version:unknown
> Binding to cpu 8
> main test running as pid 9208
> EBB Handler is at 0x10003dcc
> !! killing lost_exception

This is the parent (the test harness) saying it's about to kill the child,
because it took too long.

It sends SIGTERM, but the child catches that, prints all this info, and
then calls abort(), so that's why you're seeing SIGABRT.
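
The sequence described above can be reproduced outside the selftest. What
follows is only a minimal sketch, not the harness code: the function name
run_child_with_timeout, the pipe used to wait for the handler, and the
placeholder dump text are all assumptions made for illustration. It forks a
child that never finishes, sends it SIGTERM after a timeout, and reports
which signal actually terminated it.

```python
import os
import signal
import time


def run_child_with_timeout(timeout_s):
    """Fork a child that never finishes, then kill it after timeout_s seconds.

    Returns the number of the signal that terminated the child, or None if
    the child exited normally. (Hypothetical helper, for illustration only.)
    """
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: catch SIGTERM, dump some state, then abort().
        def on_sigterm(signum, frame):
            os.write(2, b"ebb_state: (state dump would go here)\n")
            os.abort()  # raises SIGABRT in this process

        try:
            os.close(ready_r)
            signal.signal(signal.SIGTERM, on_sigterm)
            os.write(ready_w, b"x")  # tell the parent the handler is installed
            while True:  # stand-in for a test that got stuck
                time.sleep(0.01)
        finally:
            os._exit(1)  # only reached if something unexpected happens

    # Parent: wait until the handler is installed, then let the timeout expire.
    os.close(ready_w)
    os.read(ready_r, 1)
    os.close(ready_r)
    time.sleep(timeout_s)
    print("!! killing child")
    os.kill(pid, signal.SIGTERM)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return os.WTERMSIG(status)
    return None


if __name__ == "__main__":
    sig = run_child_with_timeout(0.2)
    print(f"!! child died by signal {sig}")
```

Although the parent sends SIGTERM, the value returned is the number for
SIGABRT, because the child's own handler ends with abort(). That matches the
"child died by signal 6" line in the report.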

> ebb_state:
>   ebb_count    = 191529

The test usually runs until it's taken 1,000,000 EBBs, so it looks like
we got stuck.

>   spurious     = 0
>   negative     = 0
>   no_overflow  = 0
>   pmc[1] count = 0x0
>   pmc[2] count = 0x0
>   pmc[3] count = 0x0
>   pmc[4] count = 0x4c1b707

We use a varying sample period of between 400 and 600, and from above
we've taken 191,529 EBBs.

0x4c1b707 / 191,529 ~= 416

So that looks reasonable.
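
The same sanity check can be written out as a short script. This is just the
arithmetic from the dump above; the variable names are made up for the
example.

```python
# Values copied from the ebb_state dump in the report.
pmc4_count = 0x4c1b707   # "pmc[4] count"
ebb_count = 191529       # "ebb_count"

# Average number of counted events per EBB taken.
average_period = pmc4_count / ebb_count

print(f"average events per EBB: {average_period:.1f}")
print("within the 400..600 sample period range:", 400 <= average_period <= 600)
```

The average comes out at a little under 417, which sits inside the 400 to
600 range, so the counter value is consistent with the number of EBBs taken.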

>   pmc[5] count = 0x0
>   pmc[6] count = 0x0
> HW state:
> MMCR0 0x0000000080000080 FC PMAO 

But this says we're stopped with counters frozen and an event pending.

> MMCR2 0x0000000000000000
> EBBHR 0x0000000010003dcc
> BESCR 0x8000000100000000 GE PMAE 

And that says we have global enable set and events enabled.
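
To make the two register readings easier to compare, here is a small decoding
sketch. The bit positions are inferred only from the dump itself (the hex
values and the flag names printed next to them), so treat the masks as
assumptions rather than a reference; looks_stuck is a hypothetical helper
that encodes my reading of the situation described above.

```python
# Masks inferred from the dump: 0x80000080 is printed as "FC PMAO" and
# 0x8000000100000000 is printed as "GE PMAE". Assumed, not authoritative.
MMCR0_FLAGS = {"FC": 1 << 31, "PMAO": 1 << 7}
BESCR_FLAGS = {"GE": 1 << 63, "PMAE": 1 << 32}


def decode(value, flags):
    """Return the names of all flags whose bit is set in value."""
    return [name for name, mask in flags.items() if value & mask]


def looks_stuck(mmcr0_value, bescr_value):
    """True if an event is pending with counters frozen while EBB delivery
    is enabled, i.e. the combination seen in the failing run."""
    mmcr0 = decode(mmcr0_value, MMCR0_FLAGS)
    bescr = decode(bescr_value, BESCR_FLAGS)
    return mmcr0 == ["FC", "PMAO"] and bescr == ["GE", "PMAE"]


mmcr0_value = 0x0000000080000080
bescr_value = 0x8000000100000000

print("MMCR0:", " ".join(decode(mmcr0_value, MMCR0_FLAGS)))
print("BESCR:", " ".join(decode(bescr_value, BESCR_FLAGS)))
print("looks stuck:", looks_stuck(mmcr0_value, bescr_value))
```

With the values from the report, both checks agree: an event is pending and
delivery is enabled, yet no further EBB was taken.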


So I think there is a bug here somewhere. I don't really have time to
dig into it now, neither does Maddy I think. But we should try and get
to it at some point.

cheers


* Re: kselftest:lost_exception_test failure with 4.11.0-rc5
  2017-04-07 12:36 ` Michael Ellerman
@ 2017-04-10  3:54   ` Madhavan Srinivasan
  2017-04-11 10:05     ` Michael Ellerman
  2017-04-10  8:00   ` Sachin Sant
  1 sibling, 1 reply; 5+ messages in thread
From: Madhavan Srinivasan @ 2017-04-10  3:54 UTC
  To: Michael Ellerman, Sachin Sant, linuxppc-dev



On Friday 07 April 2017 06:06 PM, Michael Ellerman wrote:
> Sachin Sant <sachinp@linux.vnet.ibm.com> writes:
>
>> I have run into few instances where the lost_exception_test from
>> powerpc kselftest fails with SIGABRT. Following o/p is against
>> 4.11.0-rc5. The failure is intermittent.
> What hardware are you on?
>
> How long does it take to run when it fails? I assume ~2 minutes?

I started a run on a POWER8 host (habanero); it has been going for more than
24 hours and hasn't failed yet. So this would be a guest/VM scenario then?



* Re: kselftest:lost_exception_test failure with 4.11.0-rc5
  2017-04-07 12:36 ` Michael Ellerman
  2017-04-10  3:54   ` Madhavan Srinivasan
@ 2017-04-10  8:00   ` Sachin Sant
  1 sibling, 0 replies; 5+ messages in thread
From: Sachin Sant @ 2017-04-10  8:00 UTC
  To: Michael Ellerman; +Cc: linuxppc-dev, Madhavan Srinivasan


> On 07-Apr-2017, at 6:06 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> 
> Sachin Sant <sachinp@linux.vnet.ibm.com> writes:
> 
>> I have run into few instances where the lost_exception_test from
>> powerpc kselftest fails with SIGABRT. Following o/p is against
>> 4.11.0-rc5. The failure is intermittent. 
> 
> What hardware are you on?

I have seen this problem on a POWER8 LPAR.

> 
> How long does it take to run when it fails? I assume ~2 minutes?

Yes, somewhere around 2 minutes.


>> MMCR2 0x0000000000000000
>> EBBHR 0x0000000010003dcc
>> BESCR 0x8000000100000000 GE PMAE 
> 
> And that says we have global enable set and events enabled.
> 
> 
> So I think there is a bug here somewhere. I don't really have time to
> dig into it now, neither does Maddy I think. But we should try and get
> to it at some point.
> 

Let me know if I can help with debugging.

Thanks
-Sachin


> cheers
> 


* Re: kselftest:lost_exception_test failure with 4.11.0-rc5
  2017-04-10  3:54   ` Madhavan Srinivasan
@ 2017-04-11 10:05     ` Michael Ellerman
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Ellerman @ 2017-04-11 10:05 UTC
  To: Madhavan Srinivasan, Sachin Sant, linuxppc-dev

Madhavan Srinivasan <maddy@linux.vnet.ibm.com> writes:

> On Friday 07 April 2017 06:06 PM, Michael Ellerman wrote:
>> Sachin Sant <sachinp@linux.vnet.ibm.com> writes:
>>
>>> I have run into few instances where the lost_exception_test from
>>> powerpc kselftest fails with SIGABRT. Following o/p is against
>>> 4.11.0-rc5. The failure is intermittent.
>> What hardware are you on?
>>
>> How long does it take to run when it fails? I assume ~2 minutes?
>
> Started a run in power8 host (habanero) and it is more than 24hrs and
> havent failed yet. So this should be guest/VM scenario then?

Aha good point. I never tested this much (at all?) on VMs because it was
about verifying a workaround for a hardware bug.

So does it happen on both KVM and PowerVM or just one or the other?

cheers

