next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)

* next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
       [not found] <5a0055f1.85a8500a.98d54.a4e4@mx.google.com>
@ 2017-11-06 18:47 ` Mark Brown
  2017-11-07  2:17   ` Will Deacon
       [not found] ` <5a0055f1.85a8500a.98d54.a4e4-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
  2017-11-06 19:26 ` Mark Brown
  2 siblings, 1 reply; 46+ messages in thread
From: Mark Brown @ 2017-11-06 18:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 06, 2017 at 04:30:41AM -0800, kernelci.org bot wrote:

> next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)

> Full Boot Summary: https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20171106/
> Full Build Summary: https://kernelci.org/build/next/branch/master/kernel/next-20171106/

There are several arm64 boots failing in -next at the minute, including
qemu. The last success seems to have been 20171017, which isn't terribly
helpful; obviously -next was spotty in October:

   https://kernelci.org/boot/id/5a001ef359b5149e6f1cdd22/

There's also this physical failure:

   https://kernelci.org/boot/id/5a001efa59b5149e7d1cdd20/

Some others could be noise from the labs, some of them have been noisy
for a while, but mainline seems to be fine.  The majority of them seem
to be failing with no console output which isn't ideal.

Is this known?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
       [not found] <5a0055f1.85a8500a.98d54.a4e4@mx.google.com>
@ 2017-11-06 19:17     ` Mark Brown
       [not found] ` <5a0055f1.85a8500a.98d54.a4e4-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
  2017-11-06 19:26 ` Mark Brown
  2 siblings, 0 replies; 46+ messages in thread
From: Mark Brown @ 2017-11-06 19:17 UTC (permalink / raw)
  To: kernelci.org bot, Jonathan Hunter, Thierry Reding
  Cc: kernel-build-reports, linux-arm-kernel, linux-tegra

On Mon, Nov 06, 2017 at 04:30:41AM -0800, kernelci.org bot wrote:

> Full Boot Summary: https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20171106/
> Full Build Summary: https://kernelci.org/build/next/branch/master/kernel/next-20171106/

Since Friday -next has been failing to boot tegra124-nyan-big in
kernelci with all configs:

> arm:

>     multi_v7_defconfig:
>         tegra124-nyan-big:
>             lab-collabora: failing since 2 days (last pass: next-20171102 - first fail: next-20171103)

It's fine in mainline, see: 

    https://kernelci.org/boot/id/5a0030c959b514aad21cdd1a/

for boot log, history and comparisons with other trees.  Looks like
output started grinding to a halt at the point where it starts
requesting firmware but ICBW.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
       [not found] <5a0055f1.85a8500a.98d54.a4e4@mx.google.com>
  2017-11-06 18:47 ` next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106) Mark Brown
       [not found] ` <5a0055f1.85a8500a.98d54.a4e4-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
@ 2017-11-06 19:26 ` Mark Brown
  2 siblings, 0 replies; 46+ messages in thread
From: Mark Brown @ 2017-11-06 19:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 06, 2017 at 04:30:41AM -0800, kernelci.org bot wrote:

> Full Boot Summary: https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20171106/
> Full Build Summary: https://kernelci.org/build/next/branch/master/kernel/next-20171106/

Today's -next failed to boot sun8i-a83t-allwinner-h8homlet-v2 with
sunxi_defconfig (it *did* boot with multi_v7_defconfig):

>     sunxi_defconfig:
>         sun8i-a83t-allwinner-h8homlet-v2:
>             lab-free-electrons: new failure (last pass: next-20171103)

Logs and comparisons with other trees can be seen here:

   https://kernelci.org/boot/id/5a00271d59b514a38b1cdd21/

Looks like it locked up at some point.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-06 18:47 ` next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106) Mark Brown
@ 2017-11-07  2:17   ` Will Deacon
  2017-11-07 11:30     ` Mark Brown
  0 siblings, 1 reply; 46+ messages in thread
From: Will Deacon @ 2017-11-07  2:17 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark, [+akpm, +Kirill]

On Mon, Nov 06, 2017 at 06:47:53PM +0000, Mark Brown wrote:
> On Mon, Nov 06, 2017 at 04:30:41AM -0800, kernelci.org bot wrote:
> 
> > next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
> 
> > Full Boot Summary: https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20171106/
> > Full Build Summary: https://kernelci.org/build/next/branch/master/kernel/next-20171106/
> 
> There's several arm64 boots failing in -next at the minute, including
> qemu - the last success seems to have been 20171017 which isn't terribly
> helpful, obviously -next was spotty in October:
> 
>    https://kernelci.org/boot/id/5a001ef359b5149e6f1cdd22/
> 
> There's also this physical failure:
> 
>    https://kernelci.org/boot/id/5a001efa59b5149e7d1cdd20/
> 
> Some others could be noise from the labs, some of them have been noisy
> for a while, but mainline seems to be fine.  The majority of them seem
> to be failing with no console output which isn't ideal.
> 
> Is this known?

There is a known boot failure due to 83e3c48729d9:

http://lkml.kernel.org/r/20171102141210.gu4cwpoq2e6o7liu@black.fi.intel.com

There's a patch there which appears to fix the problem, but it's not yet
in next (and it needs to go via -mm).

Will

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-06 19:17     ` Mark Brown
@ 2017-11-07 10:12       ` Jon Hunter
  -1 siblings, 0 replies; 46+ messages in thread
From: Jon Hunter @ 2017-11-07 10:12 UTC (permalink / raw)
  To: Mark Brown, kernelci.org bot, Thierry Reding
  Cc: linux-tegra, linux-arm-kernel, kernel-build-reports



On 06/11/17 19:17, Mark Brown wrote:
> On Mon, Nov 06, 2017 at 04:30:41AM -0800, kernelci.org bot wrote:
> 
>> Full Boot Summary: https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20171106/
>> Full Build Summary: https://kernelci.org/build/next/branch/master/kernel/next-20171106/
> 
> Since Friday -next has been failing to boot tegra124-nyan-big in
> kernelci with all configs:
> 
>> arm:
> 
>>     multi_v7_defconfig:
>>         tegra124-nyan-big:
>>             lab-collabora: failing since 2 days (last pass: next-20171102 - first fail: next-20171103)
> 
> It's fine in mainline, see: 
> 
>     https://kernelci.org/boot/id/5a0030c959b514aad21cdd1a/
> 
> for boot log, history and comparisons with other trees.  Looks like
> output started grinding to a halt at the point where it starts
> requesting firmware but ICBW.

Thanks for the report. I have been looking into a failure on nyan-big
[0], but this one looks like a new failure. I will take a look.

Cheers
Jon

[0] https://lkml.org/lkml/2017/9/19/306

-- 
nvpublic

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-07 10:12       ` Jon Hunter
@ 2017-11-07 10:55           ` Mark Brown
  -1 siblings, 0 replies; 46+ messages in thread
From: Mark Brown @ 2017-11-07 10:55 UTC (permalink / raw)
  To: Jon Hunter
  Cc: kernelci.org bot, Thierry Reding, kernel-build-reports,
	linux-arm-kernel, linux-tegra, Guillaume Tucker

On Tue, Nov 07, 2017 at 10:12:59AM +0000, Jon Hunter wrote:
> On 06/11/17 19:17, Mark Brown wrote:

> >>     multi_v7_defconfig:
> >>         tegra124-nyan-big:
> >>             lab-collabora: failing since 2 days (last pass: next-20171102 - first fail: next-20171103)

> Thanks for the report. I have been looking into a failure on nyan-big
> [0], but this one looks like a new failure. I will take a look.

Guillaume Tucker has been bisecting this with the shiny new bisection
code he's testing; he was saying on IRC that he thinks he's found the
offending commit:

   https://people.collabora.com/~gtucker/tmp/bisect-tegra-4.14.rc8-next-20171106.txt

(not CCing Johannes yet)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-07  2:17   ` Will Deacon
@ 2017-11-07 11:30     ` Mark Brown
  0 siblings, 0 replies; 46+ messages in thread
From: Mark Brown @ 2017-11-07 11:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Nov 07, 2017 at 02:17:25AM +0000, Will Deacon wrote:
> On Mon, Nov 06, 2017 at 06:47:53PM +0000, Mark Brown wrote:
> > On Mon, Nov 06, 2017 at 04:30:41AM -0800, kernelci.org bot wrote:

> > There's several arm64 boots failing in -next at the minute, including
> > qemu - the last success seems to have been 20171017 which isn't terribly
> > helpful, obviously -next was spotty in October:

> There is a known boot failure due to 83e3c48729d9:

> http://lkml.kernel.org/r/20171102141210.gu4cwpoq2e6o7liu@black.fi.intel.com

> There's a patch there which appears to fix the problem, but it's not yet
> in next (and it needs to go via -mm).

Ugh, right.  It'd be good to get this in soon as it's making the boot
reports very hard to use and obscuring any other arm64 boot failures.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-07 10:55           ` Mark Brown
@ 2017-11-07 11:43               ` Guillaume Tucker
  -1 siblings, 0 replies; 46+ messages in thread
From: Guillaume Tucker @ 2017-11-07 11:43 UTC (permalink / raw)
  To: Mark Brown, Jon Hunter
  Cc: kernelci.org bot, Thierry Reding, kernel-build-reports,
	linux-arm-kernel, linux-tegra

On 07/11/17 10:55, Mark Brown wrote:
> On Tue, Nov 07, 2017 at 10:12:59AM +0000, Jon Hunter wrote:
>> On 06/11/17 19:17, Mark Brown wrote:
>
>>>>     multi_v7_defconfig:
>>>>         tegra124-nyan-big:
>>>>             lab-collabora: failing since 2 days (last pass: next-20171102 - first fail: next-20171103)
>
>> Thanks for the report. I have been looking into a failure on nyan-big
>> [0], but this one looks like a new failure. I will take a look.
>
> Guillaume Tucker has been bisecting this with the shiny new bisection
> code he's testing, he was saying on IRC he thinks he's found the
> offending commit:
>
>    https://people.collabora.com/~gtucker/tmp/bisect-tegra-4.14.rc8-next-20171106.txt
>
> (not CCing Johannes yet)

Please take this with a pinch of salt, I'm now running some extra
boot tests to prove it.  If you look at this log, all the boots
passed which is a bit suspicious.  I did build and boot the
revision it found with multi_v7_defconfig on tegra124 and it
passed, so it looks like this commit may not have anything to do
with the boot failure.  The automated bisection is still experimental.

Passing LAVA boot test with this revision:

   https://lava.collabora.co.uk/scheduler/job/976375

I've now started a slightly different bisection job between
next-20171107 and the common ancestor of next and mainline;
results can take a few hours to come back.

Guillaume
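
For readers wondering why a bisection log in which every boot passed
is suspicious, here is a minimal sketch in Python.  It is not the
kernelci.org bisection code: bisect_first_bad, boots_ok and the toy
history c0..c8 are hypothetical names, and the history is assumed to
be linear with a known-good first commit and a known-bad last commit.

```python
from typing import Callable, List, Tuple


def bisect_first_bad(
    commits: List[str], boots_ok: Callable[[str], bool]
) -> Tuple[str, List[Tuple[str, bool]]]:
    """Binary-search a linear history for the first failing commit.

    Assumes commits[0] is known good and commits[-1] is known bad;
    neither endpoint is re-tested by the search itself.
    """
    good, bad = 0, len(commits) - 1
    log = []  # (commit, boot passed?) for every revision actually tested
    while bad - good > 1:
        mid = (good + bad) // 2
        ok = boots_ok(commits[mid])
        log.append((commits[mid], ok))
        if ok:
            good = mid
        else:
            bad = mid
    return commits[bad], log


# Toy history (hypothetical): c0 is good, c8 is assumed bad.
history = [f"c{i}" for i in range(9)]

# Case 1: a real regression introduced at c5.
first_bad, log = bisect_first_bad(history, lambda c: int(c[1:]) < 5)
print(first_bad, log)

# Case 2: every tested revision boots.
first_bad, log = bisect_first_bad(history, lambda c: True)
print(first_bad, all(ok for _, ok in log))
```

In the second case the search can only converge on the endpoint that
was assumed bad and never tested, so the result is only as trustworthy
as that assumption.  That is one way an all-pass log can point at a
commit which then boots fine when tested by hand.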

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-07 11:43               ` Guillaume Tucker
@ 2017-11-08 15:19                   ` Guillaume Tucker
  -1 siblings, 0 replies; 46+ messages in thread
From: Guillaume Tucker @ 2017-11-08 15:19 UTC (permalink / raw)
  To: Mark Brown, Jon Hunter
  Cc: linux-tegra, linux-arm-kernel, kernelci.org bot,
	kernel-build-reports, Robin Murphy

On 07/11/17 11:43, Guillaume Tucker wrote:
> On 07/11/17 10:55, Mark Brown wrote:
>> On Tue, Nov 07, 2017 at 10:12:59AM +0000, Jon Hunter wrote:
>>> On 06/11/17 19:17, Mark Brown wrote:
>>
>>>>>     multi_v7_defconfig:
>>>>>         tegra124-nyan-big:
>>>>>             lab-collabora: failing since 2 days (last pass: next-20171102 - first fail: next-20171103)
>>
>>> Thanks for the report. I have been looking into a failure on nyan-big
>>> [0], but this one looks like a new failure. I will take a look.
>>
>> Guillaume Tucker has been bisecting this with the shiny new bisection
>> code he's testing, he was saying on IRC he thinks he's found the
>> offending commit:
>>
>>    https://people.collabora.com/~gtucker/tmp/bisect-tegra-4.14.rc8-next-20171106.txt
>>
>> (not CCing Johannes yet)
>
> Please take this with a pinch of salt, I'm now running some extra
> boot tests to prove it.  If you look at this log, all the boots
> passed which is a bit suspicious.  I did build and boot the
> revision it found with multi_v7_defconfig on tegra124 and it
> passed, so it looks like this commit may not have anything to do
> with the boot failure.  The automated bisection is still experimental.
>
> Passing LAVA boot test with this revision:
>
>   https://lava.collabora.co.uk/scheduler/job/976375
>
> I've started a slightly different bisection job now on
> next-20171107 and the common ancestor between next and mainline,
> results can take a few hours to come back.

After a few more automated bisection attempts and a bug fix in
LAVA, I've now found at least one potentially breaking commit:

   commit d89e2378a97fafdc74cbf997e7c88af75b81610a
   Author: Robin Murphy <robin.murphy@arm.com>
   Date:   Thu Oct 12 16:56:14 2017 +0100

       drivers: flag buses which demand DMA configuration


I've run some boot tests manually with this revision and then
also after reverting it in-place, these respectively failed and
passed:

   * d89e2378, failed:
     https://lava.collabora.co.uk/scheduler/job/978968

   * d89e2378 reverted, passed:
     https://lava.collabora.co.uk/scheduler/job/978969


I then went on and tried the same on top of next-20171108 and
found that they both failed:

   * next-20171108, failed:
     https://lava.collabora.co.uk/scheduler/job/979063

   * next-20171108 with d89e2378 reverted, failed as well:
     https://lava.collabora.co.uk/scheduler/job/979167


So this shows there is almost certainly another offending commit
in -next.  The errors in the two cases are not quite the same: the
last one is triggered by a BUG whereas the first one is a NULL
pointer (I haven't looked any further).  Also, I don't think
there's any fix yet for d89e2378a97fafdc74cbf997e7c88af75b81610a,
which is currently still in next.

Note: This happens to be a very good example of running a
kernelci.org bisection on a real issue, it's quite a bit of a
pipe cleaner.  I'll now see if there's a way to bisect what looks
like another breaking change in-between.

Guillaume
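
The reasoning behind the four boot tests above can be written down as
a small table lookup.  This is only an illustrative sketch: the
results dict and the interpret helper are hypothetical names, and the
pass/fail values are simply copied from the LAVA jobs listed in this
message.

```python
from typing import Dict, Tuple

# (tree, commit d89e2378 reverted?) -> did the boot pass?
results: Dict[Tuple[str, bool], bool] = {
    ("d89e2378", False): False,       # LAVA job 978968
    ("d89e2378", True): True,         # LAVA job 978969
    ("next-20171108", False): False,  # LAVA job 979063
    ("next-20171108", True): False,   # LAVA job 979167
}


def interpret(results: Dict[Tuple[str, bool], bool], tree: str) -> str:
    """Classify one tree from its plain and reverted boot results."""
    plain = results[(tree, False)]
    reverted = results[(tree, True)]
    if plain:
        return "boots as-is"
    if reverted:
        return "revert is sufficient"
    return "revert is not sufficient"


for tree in ("d89e2378", "next-20171108"):
    print(tree, "->", interpret(results, tree))
```

A tree where the revert is not sufficient is consistent with a second
breaking change having landed later, although the table alone cannot
say which commit that is.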

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-08 15:19                   ` Guillaume Tucker
@ 2017-11-08 15:55                       ` Robin Murphy
  -1 siblings, 0 replies; 46+ messages in thread
From: Robin Murphy @ 2017-11-08 15:55 UTC (permalink / raw)
  To: Guillaume Tucker, Mark Brown, Jon Hunter
  Cc: linux-tegra, linux-arm-kernel, kernelci.org bot,
	kernel-build-reports



On 08/11/17 15:19, Guillaume Tucker wrote:
> On 07/11/17 11:43, Guillaume Tucker wrote:
>> On 07/11/17 10:55, Mark Brown wrote:
>>> On Tue, Nov 07, 2017 at 10:12:59AM +0000, Jon Hunter wrote:
>>>> On 06/11/17 19:17, Mark Brown wrote:
>>>
>>>>>>     multi_v7_defconfig:
>>>>>>         tegra124-nyan-big:
>>>>>>             lab-collabora: failing since 2 days (last pass: next-20171102 - first fail: next-20171103)
>>>
>>>> Thanks for the report. I have been looking into a failure on nyan-big
>>>> [0], but this one looks like a new failure. I will take a look.
>>>
>>> Guillaume Tucker has been bisecting this with the shiny new bisection
>>> code he's testing, he was saying on IRC he thinks he's found the
>>> offending commit:
>>>
>>>    https://people.collabora.com/~gtucker/tmp/bisect-tegra-4.14.rc8-next-20171106.txt
>>>
>>> (not CCing Johannes yet)
>>
>> Please take this with a pinch of salt, I'm now running some extra
>> boot tests to prove it.  If you look at this log, all the boots
>> passed which is a bit suspicious.  I did build and boot the
>> revision it found with multi_v7_defconfig on tegra124 and it
>> passed, so it looks like this commit may not have anything to do
>> with the boot failure.  The automated bisection is still experimental.
>>
>> Passing LAVA boot test with this revision:
>>
>>   https://lava.collabora.co.uk/scheduler/job/976375
>>
>> I've started a slightly different bisection job now on
>> next-20171107 and the common ancestor between next and mainline,
>> results can take a few hours to come back.
> 
> After a few more automated bisection attempts and a bug fix in
> LAVA, I've now found at least one potentially breaking commit:
> 
>    commit d89e2378a97fafdc74cbf997e7c88af75b81610a
>    Author: Robin Murphy <robin.murphy@arm.com>
>    Date:   Thu Oct 12 16:56:14 2017 +0100
> 
>        drivers: flag buses which demand DMA configuration
> 
> 
> I've run some boot tests manually with this revision and then
> also after reverting it in-place, these respectively failed and
> passed:
> 
>    * d89e2378, failed:
>      https://lava.collabora.co.uk/scheduler/job/978968
> 
>    * d89e2378 reverted, passed:
>      https://lava.collabora.co.uk/scheduler/job/978969
> 
> 
> I then went on and tried the same but on top of next-20171108 and
> found that they both failed
> 
>    * next-20171108, failed:
>      https://lava.collabora.co.uk/scheduler/job/979063
> 
>    * next-20171108 with d89e2378 reverted, failed as well:
>      https://lava.collabora.co.uk/scheduler/job/979167
> 
> 
> So this shows there is almost certainly another offending commit
> in -next.  The errors in both cases are not quite the same, the
> last one is triggered by a BUG whereas the first one is a NULL
> pointer (I haven't looked any further).  Also I don't think
> there's any fix for d89e2378a97fafdc74cbf997e7c88af75b81610a
> which is currently still in next.

The fix was actually posted before said commit was even written:

https://patchwork.kernel.org/patch/9967847/

What is currently queued in the DMA tree fell out of the discussion on 
patch 2 of that series, but I kind of assumed the host1x folks would 
still take patch 1; I guess that hasn't happened.

Robin.

> 
> Note: This happens to be a very good example of running a
> kernelci.org bisection on a real issue, it's quite a bit of a
> pipe cleaner.  I'll now see if there's a way to bisect what looks
> like another breaking change in-between.
> 
> Guillaume

^ permalink raw reply	[flat|nested] 46+ messages in thread

* next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
@ 2017-11-08 15:55                       ` Robin Murphy
  0 siblings, 0 replies; 46+ messages in thread
From: Robin Murphy @ 2017-11-08 15:55 UTC (permalink / raw)
  To: linux-arm-kernel



On 08/11/17 15:19, Guillaume Tucker wrote:
> On 07/11/17 11:43, Guillaume Tucker wrote:
>> On 07/11/17 10:55, Mark Brown wrote:
>>> On Tue, Nov 07, 2017 at 10:12:59AM +0000, Jon Hunter wrote:
>>>> On 06/11/17 19:17, Mark Brown wrote:
>>>
>>>>>> ??? multi_v7_defconfig:
>>>>>> ??????? tegra124-nyan-big:
>>>>>> ??????????? lab-collabora: failing since 2 days (last pass: 
>>>>>> next-20171102 - first fail: next-20171103)
>>>
>>>> Thanks for the report. I have been looking into a failure on nyan-big
>>>> [0], but this one looks like a new failure. I will take a look.
>>>
>>> Guillaume Tucker has been bisecting this with the shiny new bisection
>>> code he's testing, he was saying on IRC he thinks he's found the
>>> offending commit:
>>>
>>>    
>>> https://people.collabora.com/~gtucker/tmp/bisect-tegra-4.14.rc8-next-20171106.txt 
>>>
>>>
>>> (not CCing Johannes yet)
>>
>> Please take this with a pinch of salt, I'm now running some extra
>> boot tests to prove it.? If you look at this log, all the boots
>> passed which is a bit suspicious.? I did build and boot the
>> revision it found with multi_v7_defconfig on tegra124 and it
>> passed, so it looks like this commit may not have anything to do
>> with the boot failure.? The automated bisection is still experimental.
>>
>> Passing LAVA boot test with this revision:
>>
>> ? https://lava.collabora.co.uk/scheduler/job/976375
>>
>> I've started a slightly different bisection job now on
>> next-20171107 and the common ancestor between next and mainline,
>> results can take a few hours to come back.
> 
> After a few more automated bisection attempts and a bug fix in
> LAVA, I've now found at least one potentially breaking commit:
> 
>  ? commit d89e2378a97fafdc74cbf997e7c88af75b81610a
>  ? Author: Robin Murphy <robin.murphy@arm.com>
>  ? Date:?? Thu Oct 12 16:56:14 2017 +0100
> 
>  ????? drivers: flag buses which demand DMA configuration
> 
> 
> I've run some boot tests manually with this revision and then
> also after reverting it in-place, these respectively failed and
> passed:
> 
>    * d89e2378, failed:
>      https://lava.collabora.co.uk/scheduler/job/978968
>
>    * d89e2378 reverted, passed:
>      https://lava.collabora.co.uk/scheduler/job/978969
> 
> 
> I then went on and tried the same but on top of next-20171108 and
> found that they both failed
> 
>    * next-20171108, failed:
>      https://lava.collabora.co.uk/scheduler/job/979063
>
>    * next-20171108 with d89e2378 reverted, failed as well:
>      https://lava.collabora.co.uk/scheduler/job/979167
> 
> 
> So this shows there is almost certainly another offending commit
> in -next.  The errors in both cases are not quite the same, the
> last one is triggered by a BUG whereas the first one is a NULL
> pointer (I haven't looked any further).  Also I don't think
> there's any fix for d89e2378a97fafdc74cbf997e7c88af75b81610a
> which is currently still in next.

The fix was actually posted before said commit was even written:

https://patchwork.kernel.org/patch/9967847/

What is currently queued in the DMA tree fell out of the discussion on 
patch 2 of that series, but I kind of assumed the host1x folks would 
still take patch 1; I guess that hasn't happened.

Robin.

> 
> Note: This happens to be a very good example of running a
> kernelci.org bisection on a real issue, it's quite a bit of a
> pipe cleaner.  I'll now see if there's a way to bisect what looks
> like another breaking change in-between.
> 
> Guillaume

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-08 15:19                   ` Guillaume Tucker
@ 2017-11-08 15:57                     ` Jon Hunter
  -1 siblings, 0 replies; 46+ messages in thread
From: Jon Hunter @ 2017-11-08 15:57 UTC (permalink / raw)
  To: Guillaume Tucker, Mark Brown
  Cc: linux-tegra, Robin Murphy, kernelci.org bot, linux-arm-kernel,
	kernel-build-reports


On 08/11/17 15:19, Guillaume Tucker wrote:

...

> After a few more automated bisection attempts and a bug fix in
> LAVA, I've now found at least one potentially breaking commit:
> 
>   commit d89e2378a97fafdc74cbf997e7c88af75b81610a
>   Author: Robin Murphy <robin.murphy@arm.com>
>   Date:   Thu Oct 12 16:56:14 2017 +0100
> 
>       drivers: flag buses which demand DMA configuration
> 
> 
> I've run some boot tests manually with this revision and then
> also after reverting it in-place, these respectively failed and
> passed:
> 
>   * d89e2378, failed:
>     https://lava.collabora.co.uk/scheduler/job/978968
> 
>   * d89e2378 reverted, passed:
>     https://lava.collabora.co.uk/scheduler/job/978969
> 
> 
> I then went on and tried the same but on top of next-20171108 and
> found that they both failed
> 
>   * next-20171108, failed:
>     https://lava.collabora.co.uk/scheduler/job/979063
> 
>   * next-20171108 with d89e2378 reverted, failed as well:
>     https://lava.collabora.co.uk/scheduler/job/979167
> 
> 
> So this shows there is almost certainly another offending commit
> in -next.  The errors in both cases are not quite the same, the
> last one is triggered by a BUG whereas the first one is a NULL
> pointer (I haven't looked any further).  Also I don't think
> there's any fix for d89e2378a97fafdc74cbf997e7c88af75b81610a
> which is currently still in next.

This crash is a known issue [0] and we have been discussing this. Can
you try applying [1]?

Cheers
Jon

[0] https://lkml.org/lkml/2017/9/19/306
[1] https://patchwork.kernel.org/patch/9974835/
-- 
nvpublic

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-08 15:55                       ` Robin Murphy
@ 2017-11-08 16:23                           ` Mikko Perttunen
  -1 siblings, 0 replies; 46+ messages in thread
From: Mikko Perttunen @ 2017-11-08 16:23 UTC (permalink / raw)
  To: Robin Murphy, Guillaume Tucker, Mark Brown, Jon Hunter
  Cc: linux-tegra, linux-arm-kernel, kernelci.org bot,
	kernel-build-reports

On 08.11.2017 17:55, Robin Murphy wrote:
>
>
> On 08/11/17 15:19, Guillaume Tucker wrote:
>> On 07/11/17 11:43, Guillaume Tucker wrote:
>>> On 07/11/17 10:55, Mark Brown wrote:
>>>> On Tue, Nov 07, 2017 at 10:12:59AM +0000, Jon Hunter wrote:
>>>>> On 06/11/17 19:17, Mark Brown wrote:
>>>>
>>>>>>>     multi_v7_defconfig:
>>>>>>>         tegra124-nyan-big:
>>>>>>>             lab-collabora: failing since 2 days (last pass:
>>>>>>> next-20171102 - first fail: next-20171103)
>>>>
>>>>> Thanks for the report. I have been looking into a failure on nyan-big
>>>>> [0], but this one looks like a new failure. I will take a look.
>>>>
>>>> Guillaume Tucker has been bisecting this with the shiny new bisection
>>>> code he's testing, he was saying on IRC he thinks he's found the
>>>> offending commit:
>>>>
>>>>
>>>> https://people.collabora.com/~gtucker/tmp/bisect-tegra-4.14.rc8-next-20171106.txt
>>>>
>>>>
>>>> (not CCing Johannes yet)
>>>
>>> Please take this with a pinch of salt, I'm now running some extra
>>> boot tests to prove it.  If you look at this log, all the boots
>>> passed which is a bit suspicious.  I did build and boot the
>>> revision it found with multi_v7_defconfig on tegra124 and it
>>> passed, so it looks like this commit may not have anything to do
>>> with the boot failure.  The automated bisection is still experimental.
>>>
>>> Passing LAVA boot test with this revision:
>>>
>>>   https://lava.collabora.co.uk/scheduler/job/976375
>>>
>>> I've started a slightly different bisection job now on
>>> next-20171107 and the common ancestor between next and mainline,
>>> results can take a few hours to come back.
>>
>> After a few more automated bisection attempts and a bug fix in
>> LAVA, I've now found at least one potentially breaking commit:
>>
>>    commit d89e2378a97fafdc74cbf997e7c88af75b81610a
>>    Author: Robin Murphy <robin.murphy@arm.com>
>>    Date:   Thu Oct 12 16:56:14 2017 +0100
>>
>>        drivers: flag buses which demand DMA configuration
>>
>>
>> I've run some boot tests manually with this revision and then
>> also after reverting it in-place, these respectively failed and
>> passed:
>>
>>    * d89e2378, failed:
>>      https://lava.collabora.co.uk/scheduler/job/978968
>>
>>    * d89e2378 reverted, passed:
>>      https://lava.collabora.co.uk/scheduler/job/978969
>>
>>
>> I then went on and tried the same but on top of next-20171108 and
>> found that they both failed
>>
>>    * next-20171108, failed:
>>      https://lava.collabora.co.uk/scheduler/job/979063
>>
>>    * next-20171108 with d89e2378 reverted, failed as well:
>>      https://lava.collabora.co.uk/scheduler/job/979167
>>
>>
>> So this shows there is almost certainly another offending commit
>> in -next.  The errors in both cases are not quite the same, the
>> last one is triggered by a BUG whereas the first one is a NULL
>> pointer (I haven't looked any further).  Also I don't think
>> there's any fix for d89e2378a97fafdc74cbf997e7c88af75b81610a
>> which is currently still in next.
>
> The fix was actually posted before said commit was even written:
>
> https://patchwork.kernel.org/patch/9967847/
>
> What is currently queued in the DMA tree fell out of the discussion on
> patch 2 of that series, but I kind of assumed the host1x folks would
> still take patch 1; I guess that hasn't happened.

I am seeing this patch in linux-next, though:

commit 2fb0dceb69ce957f01bdb6fddf7baf4c4b9cbc0d
Author:     Mikko Perttunen <mperttunen@nvidia.com>
AuthorDate: Sun Sep 24 12:04:53 2017 +0300
Commit:     Thierry Reding <treding@nvidia.com>
CommitDate: Fri Oct 20 14:19:51 2017 +0200

     gpu: host1x: Call of_dma_configure() after setting bus

     of_dma_configure() now checks the device's bus before configuring it, so
     we need to set the device's bus before calling.

     Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
     Signed-off-by: Thierry Reding <treding@nvidia.com>


Mikko

>
> Robin.
>
>>
>> Note: This happens to be a very good example of running a
>> kernelci.org bisection on a real issue, it's quite a bit of a
>> pipe cleaner.  I'll now see if there's a way to bisect what looks
>> like another breaking change in-between.
>>
>> Guillaume
> --
> To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-08 15:57                     ` Jon Hunter
@ 2017-11-08 16:42                         ` Guillaume Tucker
  -1 siblings, 0 replies; 46+ messages in thread
From: Guillaume Tucker @ 2017-11-08 16:42 UTC (permalink / raw)
  To: Jon Hunter, Mark Brown
  Cc: linux-tegra, linux-arm-kernel, kernelci.org bot,
	kernel-build-reports, Robin Murphy

On 08/11/17 15:57, Jon Hunter wrote:
>
> On 08/11/17 15:19, Guillaume Tucker wrote:
>
> ...
>
>> After a few more automated bisection attempts and a bug fix in
>> LAVA, I've now found at least one potentially breaking commit:
>>
>>   commit d89e2378a97fafdc74cbf997e7c88af75b81610a
>>   Author: Robin Murphy <robin.murphy@arm.com>
>>   Date:   Thu Oct 12 16:56:14 2017 +0100
>>
>>       drivers: flag buses which demand DMA configuration
>>
>>
>> I've run some boot tests manually with this revision and then
>> also after reverting it in-place, these respectively failed and
>> passed:
>>
>>   * d89e2378, failed:
>>     https://lava.collabora.co.uk/scheduler/job/978968
>>
>>   * d89e2378 reverted, passed:
>>     https://lava.collabora.co.uk/scheduler/job/978969
>>
>>
>> I then went on and tried the same but on top of next-20171108 and
>> found that they both failed
>>
>>   * next-20171108, failed:
>>     https://lava.collabora.co.uk/scheduler/job/979063
>>
>>   * next-20171108 with d89e2378 reverted, failed as well:
>>     https://lava.collabora.co.uk/scheduler/job/979167
>>
>>
>> So this shows there is almost certainly another offending commit
>> in -next.  The errors in both cases are not quite the same, the
>> last one is triggered by a BUG whereas the first one is a NULL
>> pointer (I haven't looked any further).  Also I don't think
>> there's any fix for d89e2378a97fafdc74cbf997e7c88af75b81610a
>> which is currently still in next.
>
> This crash is a known issue [0] and we have been discussing this. Can
> you try applying [1]?

So with next-20171108 + d89e2378a9 reverted + [1] applied:

   https://lava.collabora.co.uk/scheduler/job/979173

No visible kernel crash in the log but it hangs.


I also tried next-20171108 + [1] applied only:

   https://lava.collabora.co.uk/scheduler/job/979179

which also appears to hang.

Guillaume

> [0] https://lkml.org/lkml/2017/9/19/306
> [1] https://patchwork.kernel.org/patch/9974835/

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-08 16:23                           ` Mikko Perttunen
@ 2017-11-08 16:47                               ` Robin Murphy
  -1 siblings, 0 replies; 46+ messages in thread
From: Robin Murphy @ 2017-11-08 16:47 UTC (permalink / raw)
  To: Mikko Perttunen, Guillaume Tucker, Mark Brown, Jon Hunter
  Cc: linux-tegra, linux-arm-kernel, kernelci.org bot,
	kernel-build-reports

On 08/11/17 16:23, Mikko Perttunen wrote:
> On 08.11.2017 17:55, Robin Murphy wrote:
>>
>>
>> On 08/11/17 15:19, Guillaume Tucker wrote:
>>> On 07/11/17 11:43, Guillaume Tucker wrote:
>>>> On 07/11/17 10:55, Mark Brown wrote:
>>>>> On Tue, Nov 07, 2017 at 10:12:59AM +0000, Jon Hunter wrote:
>>>>>> On 06/11/17 19:17, Mark Brown wrote:
>>>>>
>>>>>>>>     multi_v7_defconfig:
>>>>>>>>         tegra124-nyan-big:
>>>>>>>>             lab-collabora: failing since 2 days (last pass:
>>>>>>>> next-20171102 - first fail: next-20171103)
>>>>>
>>>>>> Thanks for the report. I have been looking into a failure on nyan-big
>>>>>> [0], but this one looks like a new failure. I will take a look.
>>>>>
>>>>> Guillaume Tucker has been bisecting this with the shiny new bisection
>>>>> code he's testing, he was saying on IRC he thinks he's found the
>>>>> offending commit:
>>>>>
>>>>>
>>>>> https://people.collabora.com/~gtucker/tmp/bisect-tegra-4.14.rc8-next-20171106.txt 
>>>>>
>>>>>
>>>>>
>>>>> (not CCing Johannes yet)
>>>>
>>>> Please take this with a pinch of salt, I'm now running some extra
>>>> boot tests to prove it.  If you look at this log, all the boots
>>>> passed which is a bit suspicious.  I did build and boot the
>>>> revision it found with multi_v7_defconfig on tegra124 and it
>>>> passed, so it looks like this commit may not have anything to do
>>>> with the boot failure.  The automated bisection is still experimental.
>>>>
>>>> Passing LAVA boot test with this revision:
>>>>
>>>>   https://lava.collabora.co.uk/scheduler/job/976375
>>>>
>>>> I've started a slightly different bisection job now on
>>>> next-20171107 and the common ancestor between next and mainline,
>>>> results can take a few hours to come back.
>>>
>>> After a few more automated bisection attempts and a bug fix in
>>> LAVA, I've now found at least one potentially breaking commit:
>>>
>>>    commit d89e2378a97fafdc74cbf997e7c88af75b81610a
>>>    Author: Robin Murphy <robin.murphy@arm.com>
>>>    Date:   Thu Oct 12 16:56:14 2017 +0100
>>>
>>>        drivers: flag buses which demand DMA configuration
>>>
>>>
>>> I've run some boot tests manually with this revision and then
>>> also after reverting it in-place, these respectively failed and
>>> passed:
>>>
>>>    * d89e2378, failed:
>>>      https://lava.collabora.co.uk/scheduler/job/978968
>>>
>>>    * d89e2378 reverted, passed:
>>>      https://lava.collabora.co.uk/scheduler/job/978969
>>>
>>>
>>> I then went on and tried the same but on top of next-20171108 and
>>> found that they both failed
>>>
>>>    * next-20171108, failed:
>>>      https://lava.collabora.co.uk/scheduler/job/979063
>>>
>>>    * next-20171108 with d89e2378 reverted, failed as well:
>>>      https://lava.collabora.co.uk/scheduler/job/979167
>>>
>>>
>>> So this shows there is almost certainly another offending commit
>>> in -next.  The errors in both cases are not quite the same, the
>>> last one is triggered by a BUG whereas the first one is a NULL
>>> pointer (I haven't looked any further).  Also I don't think
>>> there's any fix for d89e2378a97fafdc74cbf997e7c88af75b81610a
>>> which is currently still in next.
>>
>> The fix was actually posted before said commit was even written:
>>
>> https://patchwork.kernel.org/patch/9967847/
>>
>> What is currently queued in the DMA tree fell out of the discussion on
>> patch 2 of that series, but I kind of assumed the host1x folks would
>> still take patch 1; I guess that hasn't happened.
> 
> I am seeing this patch in linux-next, though:

Phew, great! Perhaps I should have actually looked :)

So for this case it seems it's only the DRM tree being merged into -next
after the DMA mapping tree which is hurting Guillaume's bisection.
That's unfortunate, but at least it's not a complete showstopper.

Robin.

> commit 2fb0dceb69ce957f01bdb6fddf7baf4c4b9cbc0d
> Author:     Mikko Perttunen <mperttunen@nvidia.com>
> AuthorDate: Sun Sep 24 12:04:53 2017 +0300
> Commit:     Thierry Reding <treding@nvidia.com>
> CommitDate: Fri Oct 20 14:19:51 2017 +0200
>
>      gpu: host1x: Call of_dma_configure() after setting bus
>
>      of_dma_configure() now checks the device's bus before configuring it, so
>      we need to set the device's bus before calling.
>
>      Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>      Signed-off-by: Thierry Reding <treding@nvidia.com>
> 
> 
> Mikko
> 
>>
>> Robin.
>>
>>>
>>> Note: This happens to be a very good example of running a
>>> kernelci.org bisection on a real issue, it's quite a bit of a
>>> pipe cleaner.  I'll now see if there's a way to bisect what looks
>>> like another breaking change in-between.
>>>
>>> Guillaume
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-08 16:42                         ` Guillaume Tucker
@ 2017-11-09  9:55                           ` Jon Hunter
  -1 siblings, 0 replies; 46+ messages in thread
From: Jon Hunter @ 2017-11-09  9:55 UTC (permalink / raw)
  To: Guillaume Tucker, Mark Brown
  Cc: linux-tegra, Robin Murphy, kernelci.org bot, linux-arm-kernel,
	kernel-build-reports


On 08/11/17 16:42, Guillaume Tucker wrote:
> On 08/11/17 15:57, Jon Hunter wrote:
>>
>> On 08/11/17 15:19, Guillaume Tucker wrote:
>>
>> ...
>>
>>> After a few more automated bisection attempts and a bug fix in
>>> LAVA, I've now found at least one potentially breaking commit:
>>>
>>>   commit d89e2378a97fafdc74cbf997e7c88af75b81610a
>>>   Author: Robin Murphy <robin.murphy@arm.com>
>>>   Date:   Thu Oct 12 16:56:14 2017 +0100
>>>
>>>       drivers: flag buses which demand DMA configuration
>>>
>>>
>>> I've run some boot tests manually with this revision and then
>>> also after reverting it in-place, these respectively failed and
>>> passed:
>>>
>>>   * d89e2378, failed:
>>>     https://lava.collabora.co.uk/scheduler/job/978968
>>>
>>>   * d89e2378 reverted, passed:
>>>     https://lava.collabora.co.uk/scheduler/job/978969
>>>
>>>
>>> I then went on and tried the same but on top of next-20171108 and
>>> found that they both failed
>>>
>>>   * next-20171108, failed:
>>>     https://lava.collabora.co.uk/scheduler/job/979063
>>>
>>>   * next-20171108 with d89e2378 reverted, failed as well:
>>>     https://lava.collabora.co.uk/scheduler/job/979167
>>>
>>>
>>> So this shows there is almost certainly another offending commit
>>> in -next.  The errors in both cases are not quite the same, the
>>> last one is triggered by a BUG whereas the first one is a NULL
>>> pointer (I haven't looked any further).  Also I don't think
>>> there's any fix for d89e2378a97fafdc74cbf997e7c88af75b81610a
>>> which is currently still in next.
>>
>> This crash is a known issue [0] and we have been discussing this. Can
>> you try applying [1]?
> 
> So with next-20171108 + d89e2378a9 reverted + [1] applied:
> 
>   https://lava.collabora.co.uk/scheduler/job/979173
> 
> No visible kernel crash in the log but it hangs.
> 
> 
> I also tried next-20171108 + [1] applied only:
> 
>   https://lava.collabora.co.uk/scheduler/job/979179
> 
> which also appears to hang.

Thanks for the update. I am wondering if it is one of the kernel modules
that is getting loaded because booting multi_v7_defconfig and loading no
modules does not hang for me. I will take a look but I might not get to
it until next week.

Cheers
Jon

-- 
nvpublic

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-09  9:55                           ` Jon Hunter
@ 2017-11-09 10:43                               ` Guillaume Tucker
  -1 siblings, 0 replies; 46+ messages in thread
From: Guillaume Tucker @ 2017-11-09 10:43 UTC (permalink / raw)
  To: Jon Hunter, Mark Brown
  Cc: linux-tegra, linux-arm-kernel, kernelci.org bot,
	kernel-build-reports, Robin Murphy

On 09/11/17 09:55, Jon Hunter wrote:
>
> On 08/11/17 16:42, Guillaume Tucker wrote:
>> On 08/11/17 15:57, Jon Hunter wrote:
>>>
>>> On 08/11/17 15:19, Guillaume Tucker wrote:
>>>
>>> ...
>>>
>>>> After a few more automated bisection attempts and a bug fix in
>>>> LAVA, I've now found at least one potentially breaking commit:
>>>>
>>>>   commit d89e2378a97fafdc74cbf997e7c88af75b81610a
>>>>   Author: Robin Murphy <robin.murphy@arm.com>
>>>>   Date:   Thu Oct 12 16:56:14 2017 +0100
>>>>
>>>>       drivers: flag buses which demand DMA configuration
>>>>
>>>>
>>>> I've run some boot tests manually with this revision and then
>>>> also after reverting it in-place, these respectively failed and
>>>> passed:
>>>>
>>>>   * d89e2378, failed:
>>>>     https://lava.collabora.co.uk/scheduler/job/978968
>>>>
>>>>   * d89e2378 reverted, passed:
>>>>     https://lava.collabora.co.uk/scheduler/job/978969
>>>>
>>>>
>>>> I then went on and tried the same but on top of next-20171108 and
>>>> found that they both failed
>>>>
>>>>   * next-20171108, failed:
>>>>     https://lava.collabora.co.uk/scheduler/job/979063
>>>>
>>>>   * next-20171108 with d89e2378 reverted, failed as well:
>>>>     https://lava.collabora.co.uk/scheduler/job/979167
>>>>
>>>>
>>>> So this shows there is almost certainly another offending commit
>>>> in -next.  The errors in both cases are not quite the same, the
>>>> last one is triggered by a BUG whereas the first one is a NULL
>>>> pointer (I haven't looked any further).  Also I don't think
>>>> there's any fix for d89e2378a97fafdc74cbf997e7c88af75b81610a
>>>> which is currently still in next.
>>>
>>> This crash is a known issue [0] and we have been discussing this. Can
>>> you try applying [1]?
>>
>> So with next-20171108 + d89e2378a9 reverted + [1] applied:
>>
>>   https://lava.collabora.co.uk/scheduler/job/979173
>>
>> No visible kernel crash in the log but it hangs.
>>
>>
>> I also tried next-20171108 + [1] applied only:
>>
>>   https://lava.collabora.co.uk/scheduler/job/979179
>>
>> which also appears to hang.
>
> Thanks for the update. I am wondering if it is one of the kernel modules
> that is getting loaded because booting multi_v7_defconfig and loading no
> modules does not hang for me. I will take a look but I might not get to
> it until next week.

I actually built these kernel revisions with module support
disabled to speed up the builds, and no modules are being
downloaded in the LAVA job.

If you have a public URL with your known working kernel zImage
and dtb, let me know so I can re-run the same LAVA boot test
to see if I get the same results as you (i.e. no hang).

Guillaume

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-09 10:43                               ` Guillaume Tucker
@ 2017-11-09 11:29                                 ` Jon Hunter
  -1 siblings, 0 replies; 46+ messages in thread
From: Jon Hunter @ 2017-11-09 11:29 UTC (permalink / raw)
  To: Guillaume Tucker, Mark Brown
  Cc: linux-tegra, Robin Murphy, kernelci.org bot, linux-arm-kernel,
	kernel-build-reports


On 09/11/17 10:43, Guillaume Tucker wrote:

...

> I actually built these kernel revisions with module support
> disabled to speed up the builds, and no modules are being
> downloaded in the LAVA job.
> 
> If you have a public URL with your known working kernel zImage
> and dtb, let me know so I could re-run the same test LAVA boot
> test to see if I get the same results as you (i.e. no hang).

I don't have a public URL for the zImage but I can definitely email it
to you. By the way, when booting I am setting 'init=/bin/bash' so no
start-up scripts are running.

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-09 11:29                                 ` Jon Hunter
@ 2017-11-09 12:51                                     ` Guillaume Tucker
  -1 siblings, 0 replies; 46+ messages in thread
From: Guillaume Tucker @ 2017-11-09 12:51 UTC (permalink / raw)
  To: Jon Hunter, Mark Brown
  Cc: linux-tegra, linux-arm-kernel, kernelci.org bot,
	kernel-build-reports, Robin Murphy

On 09/11/17 11:29, Jon Hunter wrote:
>
> On 09/11/17 10:43, Guillaume Tucker wrote:
>
> ...
>
>> I actually built these kernel revisions with module support
>> disabled to speed up the builds, and no modules are being
>> downloaded in the LAVA job.
>>
>> If you have a public URL with your known working kernel zImage
>> and dtb, let me know so I could re-run the same test LAVA boot
>> test to see if I get the same results as you (i.e. no hang).
>
> I don't have a public URL for the zImage but I can definitely email it
> to you. By the way, when booting I am setting 'init=/bin/bash' so no
> start-up scripts are running.

Thanks, I tried your binary and it booted fine.  I built
next-20171109 again with and without loadable module support
enabled and it fails with it disabled but passes with it
enabled (even without actually loading any modules):

   * with CONFIG_MODULES disabled (fails):
     https://lava.collabora.co.uk/scheduler/job/981215

   * with plain multi_v7_defconfig and no modules loaded (passes):
     https://lava.collabora.co.uk/scheduler/job/981217


So I guess this means disabling loadable modules support has some
interesting side-effects that cause the kernel to crash.  I think
the kci builds all leave modules enabled with multi_v7_defconfig,
so the failing boots must be due to something else.  Taking
another look at the failing kci boots now...
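
As a rough illustration of the configuration change tested above, here
is a minimal Python sketch that flips a single option in the text of a
.config file.  The helper name set_kconfig_option is hypothetical and
not part of any kernel tooling; the sketch only relies on the usual
.config convention that a disabled option is recorded as
"# CONFIG_FOO is not set".  Dependent options would still need to be
re-resolved afterwards, for example with "make olddefconfig".

```python
import re


def set_kconfig_option(config_text: str, option: str, enabled: bool) -> str:
    """Return config_text with CONFIG_<option> forced on (=y) or off.

    Hypothetical helper for illustration only; it edits text and does
    not resolve Kconfig dependencies.
    """
    name = f"CONFIG_{option}"
    new_line = f"{name}=y" if enabled else f"# {name} is not set"
    # Match either "CONFIG_FOO=<value>" or "# CONFIG_FOO is not set".
    pattern = re.compile(
        rf"^(?:{re.escape(name)}=.*|# {re.escape(name)} is not set)$",
        re.MULTILINE,
    )
    if pattern.search(config_text):
        return pattern.sub(new_line, config_text)
    # The option is not mentioned yet, so append it on its own line.
    sep = "" if (not config_text or config_text.endswith("\n")) else "\n"
    return config_text + sep + new_line + "\n"


if __name__ == "__main__":
    sample = "CONFIG_MODULES=y\nCONFIG_DRM_NOUVEAU=m\n"
    print(set_kconfig_option(sample, "MODULES", False), end="")
```

The kernel tree also ships scripts/config, which can make the same kind
of edit directly on a .config file.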

Guillaume

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-09 12:51                                     ` Guillaume Tucker
@ 2017-11-09 13:17                                         ` Arnd Bergmann
  -1 siblings, 0 replies; 46+ messages in thread
From: Arnd Bergmann @ 2017-11-09 13:17 UTC (permalink / raw)
  To: Guillaume Tucker
  Cc: Jon Hunter, Mark Brown, open list:TEGRA ARCHITECTURE SUPPORT,
	Robin Murphy, kernelci.org bot, Linux ARM,
	Kernel Build Reports Mailman List

On Thu, Nov 9, 2017 at 1:51 PM, Guillaume Tucker
<guillaume.tucker@collabora.com> wrote:
> On 09/11/17 11:29, Jon Hunter wrote:
>>
>>
>> On 09/11/17 10:43, Guillaume Tucker wrote:
>>
>> ...
>>
>>> I actually built these kernel revisions with module support
>>> disabled to speed up the builds, and no modules are being
>>> downloaded in the LAVA job.
>>>
>>> If you have a public URL with your known working kernel zImage
>>> and dtb, let me know so I could re-run the same test LAVA boot
>>> test to see if I get the same results as you (i.e. no hang).
>>
>>
>> I don't have a public URL for the zImage but I can definitely email it
>> to you. By the way, when booting I am setting 'init=/bin/bash' so no
>> start-up scripts are running.
>
>
> Thanks, I tried your binary and it booted fine.  I built
> next-20171109 again with and without loadable module support
> enabled and it fails with it disabled but passes with it
> enabled (even without actually loading any modules):
>
>   * with CONFIG_MODULES disabled (fails):
>     https://lava.collabora.co.uk/scheduler/job/981215
>
>   * with plain multi_v7_defconfig and no modules loaded (passes):
>     https://lava.collabora.co.uk/scheduler/job/981217
>
>
> So I guess this means disabling loadable modules support has some
> interesting side-effects that cause the kernel to crash.  I think
> the kci builds all leave modules enabled with multi_v7_defconfig,
> so the failing boots must be due to something else.  Taking
> another look at the failing kci boots now...

The one thing that comes to mind that happened recently with
modules is

371435f78e9e ("kernel debug: support resetting WARN_ONCE for all architectures")

Can you try reverting this? It's probably something else, but I'd like
to rule it out.

      Arnd

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-09 13:17                                         ` Arnd Bergmann
@ 2017-11-09 15:23                                           ` Jon Hunter
  -1 siblings, 0 replies; 46+ messages in thread
From: Jon Hunter @ 2017-11-09 15:23 UTC (permalink / raw)
  To: Arnd Bergmann, Guillaume Tucker
  Cc: kernelci.org bot, Kernel Build Reports Mailman List, Mark Brown,
	open list:TEGRA ARCHITECTURE SUPPORT, Robin Murphy, Linux ARM


On 09/11/17 13:17, Arnd Bergmann wrote:
> On Thu, Nov 9, 2017 at 1:51 PM, Guillaume Tucker
> <guillaume.tucker@collabora.com> wrote:
>> On 09/11/17 11:29, Jon Hunter wrote:
>>>
>>>
>>> On 09/11/17 10:43, Guillaume Tucker wrote:
>>>
>>> ...
>>>
>>>> I actually built these kernel revisions with module support
>>>> disabled to speed up the builds, and no modules are being
>>>> downloaded in the LAVA job.
>>>>
>>>> If you have a public URL with your known working kernel zImage
>>>> and dtb, let me know so I could re-run the same test LAVA boot
>>>> test to see if I get the same results as you (i.e. no hang).
>>>
>>>
>>> I don't have a public URL for the zImage but I can definitely email it
>>> to you. By the way, when booting I am setting 'init=/bin/bash' so no
>>> start-up scripts are running.
>>
>>
>> Thanks, I tried your binary and it booted fine.  I built
>> next-20171109 again with and without loadable module support
>> enabled and it fails with it disabled but passes with it
>> enabled (even without actually loading any modules):
>>
>>   * with CONFIG_MODULES disabled (fails):
>>     https://lava.collabora.co.uk/scheduler/job/981215
>>
>>   * with plain multi_v7_defconfig and no modules loaded (passes):
>>     https://lava.collabora.co.uk/scheduler/job/981217
>>
>>
>> So I guess this means disabling loadable modules support has some
>> interesting side-effects that cause the kernel to crash.  I think
>> the kci builds all leave modules enabled with multi_v7_defconfig,
>> so the failing boots must be due to something else.  Taking
>> another look at the failing kci boots now...
> 
> The one thing that comes to mind that happened recently with
> modules is
> 
> 371435f78e9e ("kernel debug: support resetting WARN_ONCE for all architectures")
> 
> Can you try reverting this? It's probably something else, but I'd like
> to rule it out.

I have been able to reproduce the problem by disabling support for
loadable modules on both Tegra124 Nyan-Big and Jetson TK1. Disabling
DRM_NOUVEAU appears to avoid the problem.

Guillaume, can you try the same?

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-09 15:23                                           ` Jon Hunter
@ 2017-11-09 19:03                                               ` Guillaume Tucker
  -1 siblings, 0 replies; 46+ messages in thread
From: Guillaume Tucker @ 2017-11-09 19:03 UTC (permalink / raw)
  To: Jon Hunter, Arnd Bergmann, Robin Murphy
  Cc: Mark Brown, open list:TEGRA ARCHITECTURE SUPPORT,
	kernelci.org bot, Linux ARM, Kernel Build Reports Mailman List

On 09/11/17 15:23, Jon Hunter wrote:
>
> On 09/11/17 13:17, Arnd Bergmann wrote:
>> On Thu, Nov 9, 2017 at 1:51 PM, Guillaume Tucker
>> <guillaume.tucker@collabora.com> wrote:
>>> On 09/11/17 11:29, Jon Hunter wrote:
>>>>
>>>>
>>>> On 09/11/17 10:43, Guillaume Tucker wrote:
>>>>
>>>> ...
>>>>
>>>>> I actually built these kernel revisions with module support
>>>>> disabled to speed up the builds, and no modules are being
>>>>> downloaded in the LAVA job.
>>>>>
>>>>> If you have a public URL with your known working kernel zImage
>>>>> and dtb, let me know so I could re-run the same test LAVA boot
>>>>> test to see if I get the same results as you (i.e. no hang).
>>>>
>>>>
>>>> I don't have a public URL for the zImage but I can definitely email it
>>>> to you. By the way, when booting I am setting 'init=/bin/bash' so no
>>>> start-up scripts are running.
>>>
>>>
>>> Thanks, I tried your binary and it booted fine.  I built
>>> next-20171109 again with and without loadable module support
>>> enabled and it fails with it disabled but passes with it
>>> enabled (even without actually loading any modules):
>>>
>>>   * with CONFIG_MODULES disabled (fails):
>>>     https://lava.collabora.co.uk/scheduler/job/981215
>>>
>>>   * with plain multi_v7_defconfig and no modules loaded (passes):
>>>     https://lava.collabora.co.uk/scheduler/job/981217
>>>
>>>
>>> So I guess this means disabling loadable modules support has some
>>> interesting side-effects that cause the kernel to crash.  I think
>>> the kci builds all leave modules enabled with multi_v7_defconfig,
>>> so the failing boots must be due to something else.  Taking
>>> another look at the failing kci boots now...
>>
>> The one thing that comes to mind that happened recently with
>> modules is
>>
>> 371435f78e9e ("kernel debug: support resetting WARN_ONCE for all architectures")
>>
>> Can you try reverting this? It's probably something else, but I'd like
>> to rule it out.
>
> I have been able to reproduce the problem by disabling support for
> loadable modules on both Tegra124 Nyan-Big and Jetson TK1. Disabling
> DRM_NOUVEAU appears to avoid the problem.
>
> Guillaume can you try the same?

Alright, so here are all the results I got, all based on
next-20171109 and running on tegra124-nyan-big:

   * plain multi_v7_defconfig, passes:
     https://lava.collabora.co.uk/scheduler/job/981295

   * CONFIG_MODULES disabled, fails:
     https://lava.collabora.co.uk/scheduler/job/981342

   * CONFIG_MODULES and CONFIG_DRM_NOUVEAU disabled, also fails:
     https://lava.collabora.co.uk/scheduler/job/981343

   * CONFIG_MODULES disabled and 371435f78 [0] reverted, also fails:
     https://lava.collabora.co.uk/scheduler/job/981353
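
To make the comparison between these four runs explicit, here is a
small Python sketch that records each run as a set of changes relative
to plain multi_v7_defconfig and reports which single change turns one
recorded result into a different one.  The labels and the function name
are made up for illustration; only the pass/fail outcomes come from the
jobs listed above.

```python
# Each entry: (changes relative to plain multi_v7_defconfig, boot passed?)
RESULTS = [
    (frozenset(), True),
    (frozenset({"MODULES=n"}), False),
    (frozenset({"MODULES=n", "DRM_NOUVEAU=n"}), False),
    (frozenset({"MODULES=n", "revert 371435f78"}), False),
]


def changes_that_flip_outcome(results):
    """Return the changes whose addition alone flips a recorded outcome."""
    outcome = dict(results)
    flipping = set()
    for changes, passed in results:
        for change in changes:
            # Look for a recorded run that differs only by this change.
            base = changes - {change}
            if base in outcome and outcome[base] != passed:
                flipping.add(change)
    return flipping


if __name__ == "__main__":
    print(sorted(changes_that_flip_outcome(RESULTS)))
```

With only these four data points, disabling CONFIG_MODULES is the one
change that flips the outcome; the other two make no difference once
modules are already disabled.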


What I could try to run next is a bisection with CONFIG_MODULES
disabled between when the DRM branch got merged (so the first
crash should be fixed) and next-20171109.
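
For reference, the idea behind such a bisection can be sketched in a
few lines of Python.  This is a simplified model, assuming a linear
list of revisions ordered from oldest to newest with exactly one
transition from booting to failing; the names bisect_first_bad and
boots_ok are hypothetical.  A real git bisect walks the commit graph
rather than a flat list, and a second unrelated breakage inside the
range violates the single-transition assumption.

```python
def bisect_first_bad(revisions, boots_ok):
    """Return the first revision for which boots_ok(revision) is False.

    Assumes revisions[0] is known to boot, revisions[-1] is known to
    fail, and there is a single good-to-bad transition in between.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots_ok(revisions[mid]):
            good = mid  # the breakage was introduced after mid
        else:
            bad = mid   # the breakage was introduced at or before mid
    return revisions[bad]


if __name__ == "__main__":
    # Toy history: revisions 0..9, where everything from 6 onwards fails.
    print(bisect_first_bad(list(range(10)), lambda rev: rev < 6))
```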


Guillaume

[0] "kernel debug: support resetting WARN_ONCE for all architectures"
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=371435f78e9e26519763411c2cd20975d2293efe

^ permalink raw reply	[flat|nested] 46+ messages in thread

* next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
@ 2017-11-09 19:03                                               ` Guillaume Tucker
  0 siblings, 0 replies; 46+ messages in thread
From: Guillaume Tucker @ 2017-11-09 19:03 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/11/17 15:23, Jon Hunter wrote:
>
> On 09/11/17 13:17, Arnd Bergmann wrote:
>> On Thu, Nov 9, 2017 at 1:51 PM, Guillaume Tucker
>> <guillaume.tucker@collabora.com> wrote:
>>> On 09/11/17 11:29, Jon Hunter wrote:
>>>>
>>>>
>>>> On 09/11/17 10:43, Guillaume Tucker wrote:
>>>>
>>>> ...
>>>>
>>>>> I actually built these kernel revisions with module support
>>>>> disabled to speed up the builds, and no modules are being
>>>>> downloaded in the LAVA job.
>>>>>
>>>>> If you have a public URL with your known working kernel zImage
>>>>> and dtb, let me know so I could re-run the same test LAVA boot
>>>>> test to see if I get the same results as you (i.e. no hang).
>>>>
>>>>
>>>> I don't have a public URL for the zImage but I can definitely email it
>>>> to you. By the way, when booting I am setting 'init=/bin/bash' so no
>>>> start-up scripts are running.
>>>
>>>
>>> Thanks, I tried your binary and it booted fine.  I built
>>> next-20171109 again with and without loadable module support
>>> enabled and it fails with it disabled but passes with it
>>> enabled (even without actually loading any modules):
>>>
>>>   * with CONFIG_MODULES disabled (fails):
>>>     https://lava.collabora.co.uk/scheduler/job/981215
>>>
>>>   * with plain multi_v7_defconfig and no modules loaded (passes):
>>>     https://lava.collabora.co.uk/scheduler/job/981217
>>>
>>>
>>> So I guess this means disabling loadable modules support has some
>>> interesting side-effects that cause the kernel to crash.  I think
>>> the kci builds all leave modules enabled with multi_v7_defconfig,
>>> so the failing boots must be due to something else.  Taking
>>> another look at the failing kci boots now...
>>
>> The one thing that comes to mind that happened recently with
>> modules is
>>
>> 371435f78e9e ("kernel debug: support resetting WARN_ONCE for all architectures")
>>
>> Can you try reverting this? It's probably something else, but I'd like
>> to rule it out.
>
> I have been able to reproduce the problem by disabling support for
> loadable modules on both Tegra124 Nyan-Big and Jetson TK1. Disabling
> DRM_NOUVEAU appears to avoid the problem.
>
> Guillaume can you try the same?

Alright, so here are all the results I got, all based on
next-20171109 and running on tegra124-nyan-big:

   * plain multi_v7_defconfig, passes:
     https://lava.collabora.co.uk/scheduler/job/981295

   * CONFIG_MODULES disabled, fails:
     https://lava.collabora.co.uk/scheduler/job/981342

   * CONFIG_MODULES and CONFIG_DRM_NOUVEAU disabled, also fails:
     https://lava.collabora.co.uk/scheduler/job/981343

   * CONFIG_MODULES disabled and 371435f78 [0] reverted, also fails:
     https://lava.collabora.co.uk/scheduler/job/981353


What I could try to run next is a bisection with CONFIG_MODULES
disabled between when the DRM branch got merged (so the first
crash should be fixed) and next-20171109.


Guillaume

[0] "kernel debug: support resetting WARN_ONCE for all architectures"
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=371435f78e9e26519763411c2cd20975d2293efe

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-09 19:03                                               ` Guillaume Tucker
@ 2017-11-09 21:45                                                 ` Jon Hunter
  -1 siblings, 0 replies; 46+ messages in thread
From: Jon Hunter @ 2017-11-09 21:45 UTC (permalink / raw)
  To: Guillaume Tucker, Arnd Bergmann, Robin Murphy, bskeggs
  Cc: open list:TEGRA ARCHITECTURE SUPPORT, Mark Brown, Linux ARM,
	kernelci.org bot, Kernel Build Reports Mailman List


On 09/11/17 19:03, Guillaume Tucker wrote:
...

> Alright, so here's all the results I got all based on
> next-20171109 and running on tegra124-nyan-big:
> 
>   * plain multi_v7_defconfig, passes:
>     https://lava.collabora.co.uk/scheduler/job/981295
> 
>   * CONFIG_MODULES disabled, fails:
>     https://lava.collabora.co.uk/scheduler/job/981342
> 
>   * CONFIG_MODULES and CONFIG_DRM_NOUVEAU disabled, also fails:
>     https://lava.collabora.co.uk/scheduler/job/981343

This is the crash in the EC driver that I mentioned before. You need to
add the fix for the EC driver to avoid this BUG_ON.

I was able to bisect this manually, dancing around the various bugs, and
it points to this commit ...

commit 7313cfa4f6e30384fa04083698d1e865cf812a6a
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Wed Nov 1 03:56:19 2017 +1000

    drm/nouveau/bar: move bar1 initialisation into its own function


Unfortunately, I cannot revert cleanly on top of next-20171109 and so I
cannot confirm.

Ben, we are seeing a hang on Tegra when booting with CONFIG_DRM_NOUVEAU
enabled. Apart from the above bisect result, I don't have much else to
go on at the moment. Let me know if you have any thoughts or anything to
test.

Cheers
Jon

-- 
nvpublic

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
@ 2017-11-09 21:45                                                 ` Jon Hunter
  0 siblings, 0 replies; 46+ messages in thread
From: Jon Hunter @ 2017-11-09 21:45 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/11/17 19:03, Guillaume Tucker wrote:
...

> Alright, so here's all the results I got all based on
> next-20171109 and running on tegra124-nyan-big:
> 
>   * plain multi_v7_defconfig, passes:
>     https://lava.collabora.co.uk/scheduler/job/981295
> 
>   * CONFIG_MODULES disabled, fails:
>     https://lava.collabora.co.uk/scheduler/job/981342
> 
>   * CONFIG_MODULES and CONFIG_DRM_NOUVEAU disabled, also fails:
>     https://lava.collabora.co.uk/scheduler/job/981343

This is the crash in the EC driver that I mentioned before. You need to
add the fix for the EC driver to avoid this BUG_ON.

I was able to bisect this manually, dancing around the various bugs, and
it points to this commit ...

commit 7313cfa4f6e30384fa04083698d1e865cf812a6a
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Wed Nov 1 03:56:19 2017 +1000

    drm/nouveau/bar: move bar1 initialisation into its own function

Unfortunately, I cannot revert cleanly on top of next-20171109 and so I
cannot confirm.

Ben, we are seeing a hang on Tegra when booting with CONFIG_DRM_NOUVEAU
enabled. Apart from the above bisect result, I don't have much else to
go on at the moment. Let me know if you have any thoughts or anything to
test.

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-09 21:45                                                 ` Jon Hunter
@ 2017-11-09 22:54                                                   ` Jon Hunter
  -1 siblings, 0 replies; 46+ messages in thread
From: Jon Hunter @ 2017-11-09 22:54 UTC (permalink / raw)
  To: Guillaume Tucker, Arnd Bergmann, Robin Murphy, bskeggs
  Cc: open list:TEGRA ARCHITECTURE SUPPORT, Mark Brown,
	kernelci.org bot, Linux ARM, Kernel Build Reports Mailman List


On 09/11/17 21:45, Jon Hunter wrote:
> 
> On 09/11/17 19:03, Guillaume Tucker wrote:
> ...
> 
>> Alright, so here's all the results I got all based on
>> next-20171109 and running on tegra124-nyan-big:
>>
>>   * plain multi_v7_defconfig, passes:
>>     https://lava.collabora.co.uk/scheduler/job/981295
>>
>>   * CONFIG_MODULES disabled, fails:
>>     https://lava.collabora.co.uk/scheduler/job/981342
>>
>>   * CONFIG_MODULES and CONFIG_DRM_NOUVEAU disabled, also fails:
>>     https://lava.collabora.co.uk/scheduler/job/981343
> 
> This is the crash in the EC driver that I mentioned before. You need to
> add the fix for the EC driver to avoid this BUG_ON.
> 
> I was able to bisect this manually dancing around the various bugs and
> it points to this commit ...
> 
> commit 7313cfa4f6e30384fa04083698d1e865cf812a6a
> Author: Ben Skeggs <bskeggs@redhat.com>
> Date:   Wed Nov 1 03:56:19 2017 +1000
> 
>     drm/nouveau/bar: move bar1 initialisation into its own function
> 
> 
> Unfortunately, I cannot revert cleanly on top of next-20171109 and so I
> cannot confirm.
> 
> Ben, we are seeing a hang on Tegra when booting with CONFIG_DRM_NOUVEAU
> enabled. Apart from the above bisect result, I don't have much else to
> go on at the moment. Let me know if you have any thoughts or anything to
> test.

Here is part of the crash dump I see ...

[    2.288134] nouveau 57000000.gpu: NVIDIA GK20A (0ea000a1)
[    2.293610] nouveau 57000000.gpu: imem: using IOMMU
[    2.298536] nouveau 57000000.gpu: Direct firmware load for nvidia/gk20a/fecs_inst.bin failed with error -2
[    2.308239] nouveau 57000000.gpu: Direct firmware load for nouveau/nvea_fuc409c failed with error -2
[    2.317417] nouveau 57000000.gpu: Direct firmware load for nouveau/fuc409c failed with error -2
[    2.326107] nouveau 57000000.gpu: gr: failed to load fuc409c
[    2.385011] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[    2.393116] pgd = c0204000
[    2.395814] [00000000] *pgd=00000000
[    2.399393] Internal error: Oops: 80000005 [#1] SMP ARM
[    2.404613] Modules linked in:
[    2.405018] elan_i2c 1-0015: invalid report id data (ff)
[    2.412973] CPU: 1 PID: 53 Comm: kworker/1:1 Not tainted 4.14.0-rc7-01211-g7313cfa4f6e3-dirty #129
[    2.421911] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[    2.428174] Workqueue: events deferred_probe_work_func
[    2.433308] task: ee33f200 task.stack: ee470000
[    2.437829] PC is at 0x0
[    2.440355] LR is at nvkm_bar_init+0x3c/0x44
[    2.444617] pc : [<00000000>]    lr : [<c08379d4>]    psr: 20000113
[    2.450869] sp : ee471bc0  ip : 00000000  fp : 00000000
[    2.456083] r10: 00244500  r9 : 00000010  r8 : ee127f04
[    2.461294] r7 : 00000000  r6 : 00244500  r5 : ee127f00  r4 : ee127f04
[    2.467808] r3 : 00000000  r2 : 0001b9c0  r1 : a0000113  r0 : ee127f00
[    2.474326] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    2.481446] Control: 10c5387d  Table: 8020406a  DAC: 00000051
[    2.487184] Process kworker/1:1 (pid: 53, stack limit = 0xee470220)
[    2.493441] Stack: (0xee471bc0 to 0xee472000)
[    2.497787] 1bc0: c0837998 8de4c78d 00000000 c0834c1c 00000000 ed186008 8de4c78d 00000000
[    2.505951] 1be0: 00000010 00000000 00239467 00000000 ed186008 00000000 83126e97 c089084c
[    2.514117] 1c00: 8aebf123 00000000 ed186008 ed186034 8d4fdf3b 83126e97 ed166180 00238f7a
[    2.522277] 1c20: 00000000 c0895314 8ae877d0 00000000 8d4fdf3b c0833268 ee33f280 00000000
[    2.530441] 1c40: 00000000 00000000 ed166118 ed186e00 00000002 ed186e00 ed166138 c0831d28
[    2.538606] 1c60: 00000009 ee33f280 eef9b000 eef9b000 ed186e48 c117d8a4 ed186e68 c0dde690
[    2.546771] 1c80: c117d8a4 ed166180 c082fd4c 00000000 00000000 00000080 00000000 c089532c
[    2.554938] 1ca0: 00000000 00000000 00000000 00000000 ee145050 00000000 ee145050 00000000
[    2.563104] 1cc0: ed186e00 ed186e00 00000000 00000000 00000000 00000000 00000000 ed186e00
[    2.571263] 1ce0: ed166100 ee14505c ed186e00 00000048 00000002 c0831ff0 00000000 00000000
[    2.579426] 1d00: 00000000 00000000 ee145000 ee145000 ee145000 ee145000 ee14505c 00000000
[    2.587591] 1d20: ee145000 00000080 ee471da0 00000000 00000048 c082ee08 ee14505c ee145000
[    2.595756] 1d40: 00000080 ed166100 ee145050 c082f400 00000000 ed166138 ffffffff ee145050
[    2.603922] 1d60: ffffffff ee145000 ee145000 00000000 ee13fc00 c1803a60 c17ae514 c082f690
[    2.612082] 1d80: 00000010 ee145050 ffffffff c08db5b8 00000010 ee145050 ee145000 ee03d508
[    2.620245] 1da0: 00000000 00000000 ffffffff ffffffff ee13fc00 ee13fc00 00000000 00000000
[    2.628410] 1dc0: c1803b58 ee145000 ed1671c0 c08db7f0 00000002 ee131f00 00000000 00000000
[    2.636576] 1de0: ee131f00 c04c0658 ee1314e0 00000001 ee133c40 0000000d ee1314e0 ee131a20
[    2.644741] 1e00: ed1671c0 ee1314e0 00000001 ee131f00 0000000d ee13fc00 00000000 00000000
[    2.652906] 1e20: c1803b58 00000000 0000000d ed1671c0 c17ae514 c0804d88 00000001 00000001
[    2.661065] 1e40: ffffffff ffffffff ee471e6c ee471e6c 00000000 ee13fc00 c1761c5c 00000000
[    2.669228] 1e60: c1761c5c c08dd00c ee256a10 ed186008 fffffffe ee256a10 fffffdfb c097bb48
[    2.677394] 1e80: c097baf8 ee256a10 c1803d7c c1803d80 00000000 c097a27c 00000000 ee471ed0
[    2.685558] 1ea0: c097a40c 00000001 c1764d20 c1803d7c c17bb520 c097888c ee0ab46c ee6472b8
[    2.693725] 1ec0: ee256a10 ee256a44 c1764d78 c0979fbc ee256a10 00000001 ee33f200 ee11db00
[    2.701884] 1ee0: ee256a10 c1764d78 c1764d04 c0979600 ee11db00 c1764d28 ee256a10 c09799f4
[    2.710047] 1f00: ee11db00 c1764d28 eef9ac00 c1602d00 eef9dc00 00000000 00000000 c035af14
[    2.718212] 1f20: c0de1848 2da4a000 eefac000 ee11db00 ee11db18 eef9ac18 c1602d00 c17ae051
[    2.726378] 1f40: ee11db18 00000001 00000008 c035b224 eef9dcf5 ee11db00 eef9ac00 c035b454
[    2.734543] 1f60: ee0edee0 ee20c180 00000000 ee20c180 00000000 ee0b1780 ee20c19c ee11db00
[    2.742708] 1f80: ee0edee0 c035b234 00000000 c03606f4 ee0b1780 c03605d4 00000000 00000000
[    2.750867] 1fa0: 00000000 00000000 00000000 c03082b0 00000000 00000000 00000000 00000000
[    2.759030] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    2.767195] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[    2.775372] [<c08379d4>] (nvkm_bar_init) from [<c0834c1c>] (nvkm_subdev_init+0x210/0x3c8)
[    2.783548] [<c0834c1c>] (nvkm_subdev_init) from [<c089084c>] (nvkm_device_init+0x2d8/0x410)
[    2.791970] [<c089084c>] (nvkm_device_init) from [<c0895314>] (nvkm_udevice_init+0x44/0x5c)
[    2.800316] [<c0895314>] (nvkm_udevice_init) from [<c0833268>] (nvkm_object_init+0xa4/0x284)
[    2.808751] [<c0833268>] (nvkm_object_init) from [<c0831d28>] (nvkm_ioctl_new+0x13c/0x234)
[    2.817011] [<c0831d28>] (nvkm_ioctl_new) from [<c0831ff0>] (nvkm_ioctl+0x140/0x210)
[    2.824751] [<c0831ff0>] (nvkm_ioctl) from [<c082ee08>] (nvif_object_ioctl+0x60/0x80)
[    2.832566] [<c082ee08>] (nvif_object_ioctl) from [<c082f400>] (nvif_object_init+0xc0/0x11c)
[    2.840998] [<c082f400>] (nvif_object_init) from [<c082f690>] (nvif_device_init+0x1c/0x48)
[    2.849258] [<c082f690>] (nvif_device_init) from [<c08db5b8>] (nouveau_cli_init+0x11c/0x18c)
[    2.857689] [<c08db5b8>] (nouveau_cli_init) from [<c08db7f0>] (nouveau_drm_load+0x40/0x7fc)
[    2.866039] [<c08db7f0>] (nouveau_drm_load) from [<c0804d88>] (drm_dev_register+0x134/0x1c8)
[    2.874474] [<c0804d88>] (drm_dev_register) from [<c08dd00c>] (nouveau_platform_probe+0x44/0x68)
[    2.883255] [<c08dd00c>] (nouveau_platform_probe) from [<c097bb48>] (platform_drv_probe+0x50/0xb0)
[    2.892197] [<c097bb48>] (platform_drv_probe) from [<c097a27c>] (driver_probe_device+0x238/0x2e4)
[    2.901062] [<c097a27c>] (driver_probe_device) from [<c097888c>] (bus_for_each_drv+0x44/0x8c)
[    2.909583] [<c097888c>] (bus_for_each_drv) from [<c0979fbc>] (__device_attach+0x9c/0x100)
[    2.917843] [<c0979fbc>] (__device_attach) from [<c0979600>] (bus_probe_device+0x84/0x8c)
[    2.926017] [<c0979600>] (bus_probe_device) from [<c09799f4>] (deferred_probe_work_func+0x30/0x130)
[    2.935060] [<c09799f4>] (deferred_probe_work_func) from [<c035af14>] (process_one_work+0x144/0x42c)
[    2.944187] [<c035af14>] (process_one_work) from [<c035b224>] (process_scheduled_works+0x28/0x38)
[    2.953055] [<c035b224>] (process_scheduled_works) from [<c035b454>] (worker_thread+0x220/0x4d8)
[    2.961823] [<c035b454>] (worker_thread) from [<c03606f4>] (kthread+0x120/0x158)
[    2.969215] [<c03606f4>] (kthread) from [<c03082b0>] (ret_from_fork+0x14/0x24)
[    2.976428] Code: bad PC value
[    2.979509] ---[ end trace f8fe338d0a6f1753 ]---
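
Reading the dump above: "PC is at 0x0" with LR inside nvkm_bar_init means
the CPU branched to address zero from that function, which is what an
indirect call through a NULL function pointer looks like. Below is a
minimal sketch of that failure mode and the usual guard. It is not the
nouveau code: struct bar, struct bar_func, bar1_init and
bar_init_checked are hypothetical names, and whether an unset hook is
really the cause here is an assumption.

```c
#include <stddef.h>

struct bar;

/* Hypothetical per-chipset ops table; some hooks are optional. */
struct bar_func {
	void (*init)(struct bar *bar);      /* unused in this sketch */
	void (*bar1_init)(struct bar *bar); /* optional, may be NULL */
};

struct bar {
	const struct bar_func *func;
	int bar1_ready;
};

/* Example hook an implementation might provide. */
static void demo_bar1_init(struct bar *bar)
{
	bar->bar1_ready = 1;
}

/*
 * Call the optional hook only if the implementation set it.
 * Calling bar->func->bar1_init(bar) unconditionally would jump to
 * address 0 when the hook is NULL, giving "PC is at 0x0".
 * Returns 1 if the hook ran, 0 if it was absent.
 */
static int bar_init_checked(struct bar *bar)
{
	if (!bar->func || !bar->func->bar1_init)
		return 0;

	bar->func->bar1_init(bar);
	return 1;
}
```

With an ops table that leaves bar1_init unset, bar_init_checked()
returns 0 instead of faulting.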

-- 
nvpublic

^ permalink raw reply	[flat|nested] 46+ messages in thread

* next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
@ 2017-11-09 22:54                                                   ` Jon Hunter
  0 siblings, 0 replies; 46+ messages in thread
From: Jon Hunter @ 2017-11-09 22:54 UTC (permalink / raw)
  To: linux-arm-kernel


On 09/11/17 21:45, Jon Hunter wrote:
> 
> On 09/11/17 19:03, Guillaume Tucker wrote:
> ...
> 
>> Alright, so here's all the results I got all based on
>> next-20171109 and running on tegra124-nyan-big:
>>
>>   * plain multi_v7_defconfig, passes:
>>     https://lava.collabora.co.uk/scheduler/job/981295
>>
>>   * CONFIG_MODULES disabled, fails:
>>     https://lava.collabora.co.uk/scheduler/job/981342
>>
>>   * CONFIG_MODULES and CONFIG_DRM_NOUVEAU disabled, also fails:
>>     https://lava.collabora.co.uk/scheduler/job/981343
> 
> This is the crash in the EC driver that I mentioned before. You need to
> add the fix for the EC driver to avoid this BUG_ON.
> 
> I was able to bisect this manually dancing around the various bugs and
> it points to this commit ...
> 
> commit 7313cfa4f6e30384fa04083698d1e865cf812a6a
> Author: Ben Skeggs <bskeggs@redhat.com>
> Date:   Wed Nov 1 03:56:19 2017 +1000
> 
>     drm/nouveau/bar: move bar1 initialisation into its own function
> 
> 
> Unfortunately, I cannot revert cleanly on top of next-20171109 and so I
> cannot confirm.
> 
> Ben, we are seeing a hang on Tegra when booting with CONFIG_DRM_NOUVEAU
> enabled. Apart from the above bisect result, I don't have much else to
> go on at the moment. Let me know if you have any thoughts or anything to
> test.

Here is part of the crash dump I see ...

[    2.288134] nouveau 57000000.gpu: NVIDIA GK20A (0ea000a1)
[    2.293610] nouveau 57000000.gpu: imem: using IOMMU
[    2.298536] nouveau 57000000.gpu: Direct firmware load for nvidia/gk20a/fecs_inst.bin failed with error -2
[    2.308239] nouveau 57000000.gpu: Direct firmware load for nouveau/nvea_fuc409c failed with error -2
[    2.317417] nouveau 57000000.gpu: Direct firmware load for nouveau/fuc409c failed with error -2
[    2.326107] nouveau 57000000.gpu: gr: failed to load fuc409c
[    2.385011] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[    2.393116] pgd = c0204000
[    2.395814] [00000000] *pgd=00000000
[    2.399393] Internal error: Oops: 80000005 [#1] SMP ARM
[    2.404613] Modules linked in:
[    2.405018] elan_i2c 1-0015: invalid report id data (ff)
[    2.412973] CPU: 1 PID: 53 Comm: kworker/1:1 Not tainted 4.14.0-rc7-01211-g7313cfa4f6e3-dirty #129
[    2.421911] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[    2.428174] Workqueue: events deferred_probe_work_func
[    2.433308] task: ee33f200 task.stack: ee470000
[    2.437829] PC is at 0x0
[    2.440355] LR is at nvkm_bar_init+0x3c/0x44
[    2.444617] pc : [<00000000>]    lr : [<c08379d4>]    psr: 20000113
[    2.450869] sp : ee471bc0  ip : 00000000  fp : 00000000
[    2.456083] r10: 00244500  r9 : 00000010  r8 : ee127f04
[    2.461294] r7 : 00000000  r6 : 00244500  r5 : ee127f00  r4 : ee127f04
[    2.467808] r3 : 00000000  r2 : 0001b9c0  r1 : a0000113  r0 : ee127f00
[    2.474326] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    2.481446] Control: 10c5387d  Table: 8020406a  DAC: 00000051
[    2.487184] Process kworker/1:1 (pid: 53, stack limit = 0xee470220)
[    2.493441] Stack: (0xee471bc0 to 0xee472000)
[    2.497787] 1bc0: c0837998 8de4c78d 00000000 c0834c1c 00000000 ed186008 8de4c78d 00000000
[    2.505951] 1be0: 00000010 00000000 00239467 00000000 ed186008 00000000 83126e97 c089084c
[    2.514117] 1c00: 8aebf123 00000000 ed186008 ed186034 8d4fdf3b 83126e97 ed166180 00238f7a
[    2.522277] 1c20: 00000000 c0895314 8ae877d0 00000000 8d4fdf3b c0833268 ee33f280 00000000
[    2.530441] 1c40: 00000000 00000000 ed166118 ed186e00 00000002 ed186e00 ed166138 c0831d28
[    2.538606] 1c60: 00000009 ee33f280 eef9b000 eef9b000 ed186e48 c117d8a4 ed186e68 c0dde690
[    2.546771] 1c80: c117d8a4 ed166180 c082fd4c 00000000 00000000 00000080 00000000 c089532c
[    2.554938] 1ca0: 00000000 00000000 00000000 00000000 ee145050 00000000 ee145050 00000000
[    2.563104] 1cc0: ed186e00 ed186e00 00000000 00000000 00000000 00000000 00000000 ed186e00
[    2.571263] 1ce0: ed166100 ee14505c ed186e00 00000048 00000002 c0831ff0 00000000 00000000
[    2.579426] 1d00: 00000000 00000000 ee145000 ee145000 ee145000 ee145000 ee14505c 00000000
[    2.587591] 1d20: ee145000 00000080 ee471da0 00000000 00000048 c082ee08 ee14505c ee145000
[    2.595756] 1d40: 00000080 ed166100 ee145050 c082f400 00000000 ed166138 ffffffff ee145050
[    2.603922] 1d60: ffffffff ee145000 ee145000 00000000 ee13fc00 c1803a60 c17ae514 c082f690
[    2.612082] 1d80: 00000010 ee145050 ffffffff c08db5b8 00000010 ee145050 ee145000 ee03d508
[    2.620245] 1da0: 00000000 00000000 ffffffff ffffffff ee13fc00 ee13fc00 00000000 00000000
[    2.628410] 1dc0: c1803b58 ee145000 ed1671c0 c08db7f0 00000002 ee131f00 00000000 00000000
[    2.636576] 1de0: ee131f00 c04c0658 ee1314e0 00000001 ee133c40 0000000d ee1314e0 ee131a20
[    2.644741] 1e00: ed1671c0 ee1314e0 00000001 ee131f00 0000000d ee13fc00 00000000 00000000
[    2.652906] 1e20: c1803b58 00000000 0000000d ed1671c0 c17ae514 c0804d88 00000001 00000001
[    2.661065] 1e40: ffffffff ffffffff ee471e6c ee471e6c 00000000 ee13fc00 c1761c5c 00000000
[    2.669228] 1e60: c1761c5c c08dd00c ee256a10 ed186008 fffffffe ee256a10 fffffdfb c097bb48
[    2.677394] 1e80: c097baf8 ee256a10 c1803d7c c1803d80 00000000 c097a27c 00000000 ee471ed0
[    2.685558] 1ea0: c097a40c 00000001 c1764d20 c1803d7c c17bb520 c097888c ee0ab46c ee6472b8
[    2.693725] 1ec0: ee256a10 ee256a44 c1764d78 c0979fbc ee256a10 00000001 ee33f200 ee11db00
[    2.701884] 1ee0: ee256a10 c1764d78 c1764d04 c0979600 ee11db00 c1764d28 ee256a10 c09799f4
[    2.710047] 1f00: ee11db00 c1764d28 eef9ac00 c1602d00 eef9dc00 00000000 00000000 c035af14
[    2.718212] 1f20: c0de1848 2da4a000 eefac000 ee11db00 ee11db18 eef9ac18 c1602d00 c17ae051
[    2.726378] 1f40: ee11db18 00000001 00000008 c035b224 eef9dcf5 ee11db00 eef9ac00 c035b454
[    2.734543] 1f60: ee0edee0 ee20c180 00000000 ee20c180 00000000 ee0b1780 ee20c19c ee11db00
[    2.742708] 1f80: ee0edee0 c035b234 00000000 c03606f4 ee0b1780 c03605d4 00000000 00000000
[    2.750867] 1fa0: 00000000 00000000 00000000 c03082b0 00000000 00000000 00000000 00000000
[    2.759030] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    2.767195] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[    2.775372] [<c08379d4>] (nvkm_bar_init) from [<c0834c1c>] (nvkm_subdev_init+0x210/0x3c8)
[    2.783548] [<c0834c1c>] (nvkm_subdev_init) from [<c089084c>] (nvkm_device_init+0x2d8/0x410)
[    2.791970] [<c089084c>] (nvkm_device_init) from [<c0895314>] (nvkm_udevice_init+0x44/0x5c)
[    2.800316] [<c0895314>] (nvkm_udevice_init) from [<c0833268>] (nvkm_object_init+0xa4/0x284)
[    2.808751] [<c0833268>] (nvkm_object_init) from [<c0831d28>] (nvkm_ioctl_new+0x13c/0x234)
[    2.817011] [<c0831d28>] (nvkm_ioctl_new) from [<c0831ff0>] (nvkm_ioctl+0x140/0x210)
[    2.824751] [<c0831ff0>] (nvkm_ioctl) from [<c082ee08>] (nvif_object_ioctl+0x60/0x80)
[    2.832566] [<c082ee08>] (nvif_object_ioctl) from [<c082f400>] (nvif_object_init+0xc0/0x11c)
[    2.840998] [<c082f400>] (nvif_object_init) from [<c082f690>] (nvif_device_init+0x1c/0x48)
[    2.849258] [<c082f690>] (nvif_device_init) from [<c08db5b8>] (nouveau_cli_init+0x11c/0x18c)
[    2.857689] [<c08db5b8>] (nouveau_cli_init) from [<c08db7f0>] (nouveau_drm_load+0x40/0x7fc)
[    2.866039] [<c08db7f0>] (nouveau_drm_load) from [<c0804d88>] (drm_dev_register+0x134/0x1c8)
[    2.874474] [<c0804d88>] (drm_dev_register) from [<c08dd00c>] (nouveau_platform_probe+0x44/0x68)
[    2.883255] [<c08dd00c>] (nouveau_platform_probe) from [<c097bb48>] (platform_drv_probe+0x50/0xb0)
[    2.892197] [<c097bb48>] (platform_drv_probe) from [<c097a27c>] (driver_probe_device+0x238/0x2e4)
[    2.901062] [<c097a27c>] (driver_probe_device) from [<c097888c>] (bus_for_each_drv+0x44/0x8c)
[    2.909583] [<c097888c>] (bus_for_each_drv) from [<c0979fbc>] (__device_attach+0x9c/0x100)
[    2.917843] [<c0979fbc>] (__device_attach) from [<c0979600>] (bus_probe_device+0x84/0x8c)
[    2.926017] [<c0979600>] (bus_probe_device) from [<c09799f4>] (deferred_probe_work_func+0x30/0x130)
[    2.935060] [<c09799f4>] (deferred_probe_work_func) from [<c035af14>] (process_one_work+0x144/0x42c)
[    2.944187] [<c035af14>] (process_one_work) from [<c035b224>] (process_scheduled_works+0x28/0x38)
[    2.953055] [<c035b224>] (process_scheduled_works) from [<c035b454>] (worker_thread+0x220/0x4d8)
[    2.961823] [<c035b454>] (worker_thread) from [<c03606f4>] (kthread+0x120/0x158)
[    2.969215] [<c03606f4>] (kthread) from [<c03082b0>] (ret_from_fork+0x14/0x24)
[    2.976428] Code: bad PC value
[    2.979509] ---[ end trace f8fe338d0a6f1753 ]---

-- 
nvpublic

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
       [not found]                                                   ` <5505affd-58a5-857f-051d-5b93257e175d@redhat.com>
@ 2017-11-10  9:18                                                         ` Jon Hunter
  0 siblings, 0 replies; 46+ messages in thread
From: Jon Hunter @ 2017-11-10  9:18 UTC (permalink / raw)
  To: Ben Skeggs
  Cc: Guillaume Tucker, Arnd Bergmann, Robin Murphy,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Mark Brown


On 09/11/17 22:57, Ben Skeggs wrote:
> On 11/10/2017 08:54 AM, Jon Hunter wrote:
>>
>> On 09/11/17 21:45, Jon Hunter wrote:
>>>
>>> On 09/11/17 19:03, Guillaume Tucker wrote:
>>> ...
>>>
>>>> Alright, so here's all the results I got all based on
>>>> next-20171109 and running on tegra124-nyan-big:
>>>>
>>>>   * plain multi_v7_defconfig, passes:
>>>>     https://lava.collabora.co.uk/scheduler/job/981295
>>>>
>>>>   * CONFIG_MODULES disabled, fails:
>>>>     https://lava.collabora.co.uk/scheduler/job/981342
>>>>
>>>>   * CONFIG_MODULES and CONFIG_DRM_NOUVEAU disabled, also fails:
>>>>     https://lava.collabora.co.uk/scheduler/job/981343
>>>
>>> This is the crash in the EC driver that I mentioned before. You need to
>>> add the fix for the EC driver to avoid this BUG_ON.
>>>
>>> I was able to bisect this manually dancing around the various bugs and
>>> it points to this commit ...
>>>
>>> commit 7313cfa4f6e30384fa04083698d1e865cf812a6a
>>> Author: Ben Skeggs <bskeggs-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>>> Date:   Wed Nov 1 03:56:19 2017 +1000
>>>
>>>     drm/nouveau/bar: move bar1 initialisation into its own function
>>>
>>>
>>> Unfortunately, I cannot revert cleanly on top of next-20171109 and so I
>>> cannot confirm.
>>>
>>> Ben, we are seeing a hang on Tegra when booting with CONFIG_DRM_NOUVEAU
>>> enabled. Apart from the above bisect result, I don't have much else to
>>> go on at the moment. Let me know if you have any thoughts or anything to
>>> test.
>>
>> Here is part of the crash dump I see ...
> 
> Hey,
> 
> Oops, I went to great care to make that series bisectable, but
> apparently this slipped through the cracks.
> 
> 48fe02478a0ddb89930f3595f8217fa2dfd98796 should fix that crash.

Thanks Ben. However, looking at next-20171109 this one is already in.
So maybe the bisect is still not getting me to the current issue. When
booting next-20171109 the last thing I see is ...

[    2.228178] nouveau 57000000.gpu: NVIDIA GK20A (0ea000a1)
[    2.233634] nouveau 57000000.gpu: imem: using IOMMU
[    2.238572] nouveau 57000000.gpu: Direct firmware load for nvidia/gk20a/fecs_inst.bin failed with error -2
[    2.248295] nouveau 57000000.gpu: Direct firmware load for nouveau/nvea_fuc409c failed with error -2
[    2.257479] nouveau 57000000.gpu: Direct firmware load for nouveau/fuc409c failed with error -2
[    2.266189] nouveau 57000000.gpu: gr: failed to load fuc409c

So no crash. I did see the crash after the bisect, but not on top of
tree. It appears to hang after the nouveau probe fails. Any thoughts
on how to debug further?

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 46+ messages in thread

* next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
@ 2017-11-10  9:18                                                         ` Jon Hunter
  0 siblings, 0 replies; 46+ messages in thread
From: Jon Hunter @ 2017-11-10  9:18 UTC (permalink / raw)
  To: linux-arm-kernel


On 09/11/17 22:57, Ben Skeggs wrote:
> On 11/10/2017 08:54 AM, Jon Hunter wrote:
>>
>> On 09/11/17 21:45, Jon Hunter wrote:
>>>
>>> On 09/11/17 19:03, Guillaume Tucker wrote:
>>> ...
>>>
>>>> Alright, so here's all the results I got all based on
>>>> next-20171109 and running on tegra124-nyan-big:
>>>>
>>>>   * plain multi_v7_defconfig, passes:
>>>>     https://lava.collabora.co.uk/scheduler/job/981295
>>>>
>>>>   * CONFIG_MODULES disabled, fails:
>>>>     https://lava.collabora.co.uk/scheduler/job/981342
>>>>
>>>>   * CONFIG_MODULES and CONFIG_DRM_NOUVEAU disabled, also fails:
>>>>     https://lava.collabora.co.uk/scheduler/job/981343
>>>
>>> This is the crash in the EC driver that I mentioned before. You need to
>>> add the fix for the EC driver to avoid this BUG_ON.
>>>
>>> I was able to bisect this manually dancing around the various bugs and
>>> it points to this commit ...
>>>
>>> commit 7313cfa4f6e30384fa04083698d1e865cf812a6a
>>> Author: Ben Skeggs <bskeggs@redhat.com>
>>> Date:   Wed Nov 1 03:56:19 2017 +1000
>>>
>>>     drm/nouveau/bar: move bar1 initialisation into its own function
>>>
>>>
>>> Unfortunately, I cannot revert cleanly on top of next-20171109 and so I
>>> cannot confirm.
>>>
>>> Ben, we are seeing a hang on Tegra when booting with CONFIG_DRM_NOUVEAU
>>> enabled. Apart from the above bisect result, I don't have much else to
>>> go on at the moment. Let me know if you have any thoughts or anything to
>>> test.
>>
>> Here is part of the crash dump I see ...
> 
> Hey,
> 
> Oops, I went to great care to make that series bisectable, but
> apparently this slipped through the cracks.
> 
> 48fe02478a0ddb89930f3595f8217fa2dfd98796 should fix that crash.

Thanks Ben. However, looking at next-20171109 this one is already in.
So maybe the bisect is still not getting me to the current issue. When
booting next-20171109 the last thing I see is ...

[    2.228178] nouveau 57000000.gpu: NVIDIA GK20A (0ea000a1)
[    2.233634] nouveau 57000000.gpu: imem: using IOMMU
[    2.238572] nouveau 57000000.gpu: Direct firmware load for nvidia/gk20a/fecs_inst.bin failed with error -2
[    2.248295] nouveau 57000000.gpu: Direct firmware load for nouveau/nvea_fuc409c failed with error -2
[    2.257479] nouveau 57000000.gpu: Direct firmware load for nouveau/fuc409c failed with error -2
[    2.266189] nouveau 57000000.gpu: gr: failed to load fuc409c

So no crash. I did see the crash after the bisect, but not on top of
tree. It appears to hang after the nouveau probe fails. Any thoughts
on how to debug further?

Cheers
Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
  2017-11-10  9:18                                                         ` Jon Hunter
@ 2017-11-10 11:26                                                             ` Jon Hunter
  -1 siblings, 0 replies; 46+ messages in thread
From: Jon Hunter @ 2017-11-10 11:26 UTC (permalink / raw)
  To: Ben Skeggs
  Cc: Guillaume Tucker, Arnd Bergmann, Robin Murphy,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Mark Brown


On 10/11/17 09:18, Jon Hunter wrote:

...

> Thanks Ben. However, looking at next-20171109 this one is already in.
> So maybe the bisect is still not getting me to the current issue. When
> booting next-20171109 the last thing I see is ...
> 
> [    2.228178] nouveau 57000000.gpu: NVIDIA GK20A (0ea000a1)
> [    2.233634] nouveau 57000000.gpu: imem: using IOMMU
> [    2.238572] nouveau 57000000.gpu: Direct firmware load for nvidia/gk20a/fecs_inst.bin failed with error -2
> [    2.248295] nouveau 57000000.gpu: Direct firmware load for nouveau/nvea_fuc409c failed with error -2
> [    2.257479] nouveau 57000000.gpu: Direct firmware load for nouveau/fuc409c failed with error -2
> [    2.266189] nouveau 57000000.gpu: gr: failed to load fuc409c
> 
> So no crash. I did see the crash after the bisect, but not in top of
> tree. It appears to hang after the nouveau probe fails. Any thoughts
> on how to debug further?

So this is probably wrong, but here is a clue about what is happening.
It appears that the error code is not being propagated from
gk20a_gr_new(). gk20a_gr_new() is returning -ENODEV due to the firmware
loading failure...

        if (gf100_gr_ctor_fw(gr, "fecs_inst", &gr->fuc409c) ||
            gf100_gr_ctor_fw(gr, "fecs_data", &gr->fuc409d) ||
            gf100_gr_ctor_fw(gr, "gpccs_inst", &gr->fuc41ac) ||
            gf100_gr_ctor_fw(gr, "gpccs_data", &gr->fuc41ad))
                return -ENODEV;

... but this is ignored by nvkm_device_ctor() (probably for good
reason). If I make the following change the hang no longer occurs
(although I realise this is probably wrong, as the existing check has
been there for years!) ...

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
index e14643615698..a611615d3ce7 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
@@ -2869,7 +2869,7 @@ struct nvkm_engine *
                        subdev = nvkm_device_subdev(device, (s));              \
                        nvkm_subdev_del(&subdev);                              \
                        device->m = NULL;                                      \
-                       if (ret != -ENODEV) {                                  \
+                       if (ret == -ENODEV) {                                  \
                                nvdev_error(device, "%s ctor failed, %d\n",    \
                                            nvkm_subdev_name[s], ret);         \
                                goto done;                                     \

So is gk20a_gr_new() returning the wrong error code for when the
firmware load fails? 
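
For readers following along, the unpatched check can be sketched in
user space roughly as below. This is only an illustration under stated
assumptions, not the nouveau code: device_ctor(), ctor_fn and the three
ctor_* helpers are hypothetical names, and the control flow is inferred
from the hunk quoted above. The point it shows is that a constructor
returning -ENODEV is treated as "unit absent" and skipped silently,
while any other error is reported and aborts device construction.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a subdev constructor such as gk20a_gr_new(). */
typedef int (*ctor_fn)(void);

static int ctor_ok(void)          { return 0; }
static int ctor_no_firmware(void) { return -ENODEV; } /* as in the snippet above */
static int ctor_no_memory(void)   { return -ENOMEM; }

/*
 * Assumed shape of the unpatched check: -ENODEV means "skip this unit and
 * carry on", any other non-zero return is logged and ends construction.
 * *skipped counts the constructors whose -ENODEV was swallowed.
 */
static int device_ctor(const ctor_fn *ctors, int n, int *skipped)
{
    *skipped = 0;
    for (int i = 0; i < n; i++) {
        int ret = ctors[i]();

        if (ret == 0)
            continue;
        if (ret != -ENODEV) {
            fprintf(stderr, "ctor %d failed, %d\n", i, ret);
            return ret;
        }
        (*skipped)++; /* error dropped, caller never sees it */
    }
    return 0;
}
```

With the list { ctor_ok, ctor_no_firmware, ctor_ok } the sketch returns 0
and reports one skipped unit, which matches the observation that the
firmware failure is not propagated to the caller.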

I have not gone back to see what has changed in this regard, but I
can, probably next week.

Cheers
Jon

-- 
nvpublic

^ permalink raw reply related	[flat|nested] 46+ messages in thread

