All of lore.kernel.org
 help / color / mirror / Atom feed
From: Frank Rowand <frowand.list@gmail.com>
To: Saravana Kannan <saravanak@google.com>, Rob Herring <robh+dt@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS" 
	<devicetree@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	LKP <lkp@lists.01.org>, kernel test robot <lkp@intel.com>
Subject: Re: 5e6669387e ("of/platform: Pause/resume sync state during init .."): [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume
Date: Tue, 3 Dec 2019 22:17:33 -0600	[thread overview]
Message-ID: <1a30fb19-dfac-6b6c-1e92-8d972fc52b10@gmail.com> (raw)
In-Reply-To: <CAGETcx9-TZfARQgc7N4=kp1WVvMuxpmw3n3-TmwYO1YNkQKmRw@mail.gmail.com>

On 12/3/19 4:50 PM, Saravana Kannan wrote:
> On Tue, Dec 3, 2019 at 1:10 PM Rob Herring <robh+dt@kernel.org> wrote:
>>
>> On Tue, Dec 3, 2019 at 2:05 PM Saravana Kannan <saravanak@google.com> wrote:
>>>
>>> On Tue, Dec 3, 2019 at 1:01 AM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>
>>>> On 12/2/19 3:19 PM, Saravana Kannan wrote:
>>>>> On Sun, Dec 1, 2019 at 7:00 AM kernel test robot <lkp@intel.com> wrote:
>>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> 0day kernel testing robot got the below dmesg and the first bad commit is
>>>>>>
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>>>>>
>>>>>> commit 5e6669387e2287f25f09fd0abd279dae104cfa7e
>>>>>> Author:     Saravana Kannan <saravanak@google.com>
>>>>>> AuthorDate: Wed Sep 4 14:11:24 2019 -0700
>>>>>> Commit:     Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>> CommitDate: Fri Oct 4 17:30:19 2019 +0200
>>>>>>
>>>>>>     of/platform: Pause/resume sync state during init and of_platform_populate()
>>>>>>
>>>>>>     When all the top level devices are populated from DT during kernel
>>>>>>     init, the supplier devices could be added and probed before the
>>>>>>     consumer devices are added and linked to the suppliers. To avoid the
>>>>>>     sync_state() callback from being called prematurely, pause the
>>>>>>     sync_state() callbacks before populating the devices and resume them
>>>>>>     at late_initcall_sync().
>>>>>>
>>>>>>     Similarly, when children devices are populated from a module using
>>>>>>     of_platform_populate(), there could be supplier-consumer dependencies
>>>>>>     between the children devices that are populated. To avoid the same
>>>>>>     problem with sync_state() being called prematurely, pause and resume
>>>>>>     sync_state() callbacks across of_platform_populate().
>>>>>>
>>>>>>     Signed-off-by: Saravana Kannan <saravanak@google.com>
>>>>>>     Link: https://lore.kernel.org/r/20190904211126.47518-6-saravanak@google.com
>>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>>
>>>>>> fc5a251d0f  driver core: Add sync_state driver/bus callback
>>>>>> 5e6669387e  of/platform: Pause/resume sync state during init and of_platform_populate()
>>>>>> 81b6b96475  Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux; tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping
>>>>>> +-------------------------------------------------------------------------+------------+------------+------------+
>>>>>> |                                                                         | fc5a251d0f | 5e6669387e | 81b6b96475 |
>>>>>> +-------------------------------------------------------------------------+------------+------------+------------+
>>>>>> | boot_successes                                                          | 30         | 0          | 0          |
>>>>>> | boot_failures                                                           | 1          | 11         | 22         |
>>>>>> | Oops:#[##]                                                              | 1          |            |            |
>>>>>> | EIP:unmap_vmas                                                          | 1          |            |            |
>>>>>> | PANIC:double_fault                                                      | 1          |            |            |
>>>>>> | Kernel_panic-not_syncing:Fatal_exception                                | 1          |            |            |
>>>>>> | WARNING:at_drivers/base/core.c:#device_links_supplier_sync_state_resume | 0          | 11         | 22         |
>>>>>> | EIP:device_links_supplier_sync_state_resume                             | 0          | 11         | 22         |
>>>>>> +-------------------------------------------------------------------------+------------+------------+------------+
>>>>>>
>>>>>> If you fix the issue, kindly add following tag
>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>>
>>>>>> [    3.186107] OF: /testcase-data/phandle-tests/consumer-b: #phandle-cells = 2 found -1
>>>>>> [    3.188595] platform testcase-data:testcase-device2: IRQ index 0 not found
>>>>>> [    3.191047] ### dt-test ### end of unittest - 199 passed, 0 failed
>>>>>> [    3.191932] ------------[ cut here ]------------
>>>>>> [    3.192571] Unmatched sync_state pause/resume!
>>>>>> [    3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume+0x27/0xc0
>>>>>> [    3.195084] Modules linked in:
>>>>>> [    3.195494] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G                T 5.4.0-rc1-00005-g5e6669387e228 #1
>>>>>> [    3.196674] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>>>>>> [    3.197693] EIP: device_links_supplier_sync_state_resume+0x27/0xc0
>>>>>> [    3.198680] Code: 00 00 00 3e 8d 74 26 00 57 56 31 d2 53 b8 a0 d0 d9 c1 e8 6c b6 38 00 a1 e4 d0 d9 c1 85 c0 75 13 68 84 ba c4 c1 e8 29 30 b1 ff <0f> 0b 58 eb 7f 8d 74 26 00 83 e8 01 85 c0 a3 e4 d0 d9 c1 75 6f 8b
>>>>>> [    3.201560] EAX: 00000022 EBX: 00000000 ECX: 00000000 EDX: 00000000
>>>>>> [    3.202466] ESI: 000001ab EDI: c02c7f80 EBP: c1e87d27 ESP: c02c7f20
>>>>>> [    3.203301] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010282
>>>>>> [    3.204258] CR0: 80050033 CR2: bfa1bf98 CR3: 01f28000 CR4: 00140690
>>>>>> [    3.205022] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>>>>>> [    3.205919] DR6: fffe0ff0 DR7: 00000400
>>>>>> [    3.206529] Call Trace:
>>>>>> [    3.207011]  ? of_platform_sync_state_init+0x13/0x16
>>>>>> [    3.207719]  ? do_one_initcall+0xda/0x260
>>>>>> [    3.208247]  ? kernel_init_freeable+0x110/0x197
>>>>>> [    3.208906]  ? rest_init+0x120/0x120
>>>>>> [    3.209369]  ? kernel_init+0xa/0x100
>>>>>> [    3.209775]  ? ret_from_fork+0x19/0x24
>>>>>> [    3.210283] ---[ end trace 81d0f2d2ee65199b ]---
>>>>>> [    3.210955] ALSA device list:
>>>>>
>>>>> Rob/Frank,
>>>>>
>>>>> This seems to be an issue with the unit test code not properly
>>>>> cleaning up the state after it's done.
>>>>>
>>>>> Specifically, unittest_data_add() setting up of_root on systems where
>>>>> there's no device tree (of_root == NULL). It doesn't clean up of_root
>>>>> after the tests are done. This affects the of_have_populated_dt() API
>>>>> that in turn affects calls to
>>>>> device_links_supplier_sync_state_pause/resume(). I think unittests
>>>>> shouldn't affect the of_have_populated_dt() API.
>>>> There are at least a couple of reasons why the unittest devicetree data
>>>> needs to remain after the point where devicetree unittests currently
>>>> complete.  So cleaning up (removing the data) is not an option.
>>>>
>>>> I depend on the unittest devicetree entries still existing after the system
>>>> boots and I can log into a shell for some validation of the final result of
>>>> the devicetree data.
>>>
>>> IMHO unittests shouldn't have a residual impact on the system after
>>> they are done. So, I'll agree to disagree on this one.
>>
>> They shouldn't be enabled in a production system either. Why would you
>> want the extra boot time?
> 
> Should we ask the kernel test robot folks to not enable OF unittest

No.  If unittests are breaking other code I want to know that.


> then? It broke my patch, but I wouldn't be surprised if it's silently
> breaking other stuff too. I think we need to do option 4 below.
> 
>>
>>>> There is also a desire for the devicetree unittests to be able to be loaded
>>>> as a module.  That work is not yet scheduled, but I do not want to preclude
>>>> the possibility.  If unittests are loaded from a module then they will
>>>> need some devicetree data to exist that is created in early boot.  That
>>>> data will be in the devicetree when of_platform_sync_state_init() is
>>>> invoked.
>>>
>>> On a normal system, FDT is parsed and of_root is set (or not set) very
>>> early on during setup_arch() before any of the initcall levels are
>>> run. The return value of of_have_populated_dt() isn't expected to
>>> change across initcall levels. But because of the way the unittest is
>>> written (the of_root is changed at late_initcall() level) the return
>>> value of of_have_populated_dt() changes across initcall levels. I
>>> think that's a real problem with the unittest -- it's breaking API
>>> semantics.
>>
>> I think what's really desired here is a 'Am I booting using DT' call.
> 
> I think the community has decided to use of_have_populated_dt() as
> that call. So, we shouldn't break it.

Have you analyzed each and every use of of_have_populated_dt() to verify
that?  I have not yet looked at each of them.

The function was created with one user for a specific purpose and the use
of it has grown over the years.  I was not going to modify it to have
the specific meaning of "Am I booting using DT" (thus being able to ignore
the existence of the unittest data in the devicetree) without first examining
each of the users of the of_have_populated_dt().  [[ This possible change was
one of the solutions I considered before I examined what the actual problem
leading to the WARNing was. ]]

> 
>>
>>> of_have_populated_dt() is being used to check if DT is present in the
>>> system and different things are done based on that. We can't have that
>>> value change across initcall levels.
>>>
>>> Couple of thoughts:
>>> 1. Don't run unit test if there is no live DT in the system?
>>
>> That's pretty much the only case I do run. I use UML to run the tests.
> 
> Ah, makes sense.
> 
>>> 2. If you don't want to do (1), then at least set up the unit test
>>> data during setup_arch() instead of doing it at some initcall level?
>>
>> That further breaks making it a module. The plan is also to move to
>> kunit which probably will preclude some hacky hook into setup_arch().
>> Side effects may need to be fixed for kunit though.
> 
> Yup.
> 
>>> 3. Can you use overlays for the unit tests if they are loaded as a module?
>>
>> That was the idea, yes.
>>
>>
>> 4. Make running the unittests a command line option instead of running
>> if enabled. Still has side effects, but you have to explicitly run it.

I am assuming "command line option" means the kernel boot command line, not
a command line interface.

I would prefer not.  It is a debug option.  There is no need to add the extra
complexity of an additional switch to control it.  Configure it in or configure
it out.


> 
> Hmm... this is another good option. I think this should be done. Do we
> have a consensus on this?
Why would you even ask if there was consensus on something that has not
even been discussed?


>> A module would still be my preference. If only there was someone
>> interested in making everything a module... ;)> 
> :)
> 
>>>>> I was looking into writing a unittest patch to fix this, but I don't
>>>>> know enough about the FDT parsing code to make sure I don't leak any
>>>>> memory or free stuff that's in use. I'm not sure I can simply set
>>>>> of_root = NULL if it was NULL before the unittest started. Let me know
>>>>> how I should proceed or if you plan to write up a patch for this.
>>>>
>>>> Based on the above, "clean up" of the unittest data is not the solution.
>>>>
>>>> I haven't looked at the mechanism in device_links_supplier_sync_state_resume()
>>>> that leads to the WARN yet.  But is does not seem reasonable for that code
>>>> to be so sensitive to what valid data is in the devicetree that a WARN results.
>>>
>>> Sure, I could easily fix it to work around this. But this seems to be
>>> a genuine problem with the unittest setup IMO.
> 
> I'll go ahead and do this (basically always doing it instead of
> checking on of_have_populated_dt()) but I don't want us to forget this
> unittest issue.

Thank you for planning to do this fix.

The unittest issue will not be forgotten.  The possible impacts of unittest on
other users of devicetree is something I am very sensitive to and have
thought about quite a bit.

-Frank


> 
> -Saravana
> 


      reply	other threads:[~2019-12-04  4:17 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-01 15:00 5e6669387e ("of/platform: Pause/resume sync state during init .."): [ 3.192726] WARNING: CPU: 1 PID: 1 at drivers/base/core.c:688 device_links_supplier_sync_state_resume kernel test robot
2019-12-01 15:00 ` kernel test robot
2019-12-02 21:19 ` Saravana Kannan
2019-12-02 21:19   ` Saravana Kannan
2019-12-03  9:00   ` Frank Rowand
2019-12-03 20:05     ` Saravana Kannan
2019-12-03 20:05       ` Saravana Kannan
2019-12-03 21:10       ` Rob Herring
2019-12-03 21:10         ` Rob Herring
2019-12-03 22:50         ` Saravana Kannan
2019-12-03 22:50           ` Saravana Kannan
2019-12-04  4:17           ` Frank Rowand [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1a30fb19-dfac-6b6c-1e92-8d972fc52b10@gmail.com \
    --to=frowand.list@gmail.com \
    --cc=devicetree@vger.kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=lkp@lists.01.org \
    --cc=robh+dt@kernel.org \
    --cc=saravanak@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.