linux-pm.vger.kernel.org archive mirror
From: Daniel Lezcano <daniel.lezcano@linaro.org>
To: Marek Szyprowski <m.szyprowski@samsung.com>, rui.zhang@intel.com
Cc: srinivas.pandruvada@linux.intel.com, rkumbako@codeaurora.org,
	amit.kucheria@linaro.org, linux-kernel@vger.kernel.org,
	linux-pm@vger.kernel.org,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
	Arnd Bergmann <arnd.bergmann@linaro.org>
Subject: Re: [PATCH v4 4/4] thermal: core: Add notifications call in the framework
Date: Wed, 15 Jul 2020 01:20:28 +0200
Message-ID: <4cfb15f6-2801-3386-c7cf-6296a54571a1@linaro.org>
In-Reply-To: <75683b75-6e1b-6e4e-2354-477c487a5f5f@linaro.org>

On 13/07/2020 22:32, Daniel Lezcano wrote:
> On 13/07/2020 11:31, Marek Szyprowski wrote:
>> Hi
>>
>> On 07.07.2020 11:15, Marek Szyprowski wrote:
>>> On 06.07.2020 15:46, Daniel Lezcano wrote:
>>>> On 06/07/2020 15:17, Marek Szyprowski wrote:
>>>>> On 06.07.2020 12:55, Daniel Lezcano wrote:
>>>>>> The generic netlink protocol is implemented but the different
>>>>>> notification functions are not yet connected to the core code.
>>>>>>
>>>>>> These changes add the notification calls in the different
>>>>>> corresponding places.
>>>>>>
>>>>>> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
>>>>>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>>>>> This patch landed in today's linux-next 20200706 as commit 5df786e46560
>>>>> ("thermal: core: Add notifications call in the framework"). Sadly it
>>>>> breaks booting various Samsung Exynos based boards. Here is an example
>>>>> log from Odroid U3 board:
>>>>>
>>>>> Unable to handle kernel NULL pointer dereference at virtual address 00000010
>>>>> pgd = (ptrval)
>>>>> [00000010] *pgd=00000000
>>>>> Internal error: Oops: 5 [#1] PREEMPT SMP ARM
>>>>> Modules linked in:
>>>>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.0-rc3-00015-g5df786e46560 #1146
>>>>> Hardware name: Samsung Exynos (Flattened Device Tree)
>>>>> PC is at kmem_cache_alloc+0x13c/0x418
>>>>> LR is at kmem_cache_alloc+0x48/0x418
>>>>> pc : [<c02b5cac>]    lr : [<c02b5bb8>]    psr: 20000053
>>>>> ...
>>>>> Flags: nzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
>>>>> Control: 10c5387d  Table: 4000404a  DAC: 00000051
>>>>> Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
>>>>> Stack: (0xee8f1cf8 to 0xee8f2000)
>>>>> ...
>>>>> [<c02b5cac>] (kmem_cache_alloc) from [<c08cd170>] (__alloc_skb+0x5c/0x170)
>>>>> [<c08cd170>] (__alloc_skb) from [<c07ec19c>] (thermal_genl_send_event+0x24/0x174)
>>>>> [<c07ec19c>] (thermal_genl_send_event) from [<c07ec648>] (thermal_notify_tz_create+0x58/0x74)
>>>>> [<c07ec648>] (thermal_notify_tz_create) from [<c07e9058>] (thermal_zone_device_register+0x358/0x650)
>>>>> [<c07e9058>] (thermal_zone_device_register) from [<c1028d34>] (of_parse_thermal_zones+0x304/0x7a4)
>>>>> [<c1028d34>] (of_parse_thermal_zones) from [<c1028964>] (thermal_init+0xdc/0x154)
>>>>> [<c1028964>] (thermal_init) from [<c0102378>] (do_one_initcall+0x8c/0x424)
>>>>> [<c0102378>] (do_one_initcall) from [<c1001158>] (kernel_init_freeable+0x190/0x204)
>>>>> [<c1001158>] (kernel_init_freeable) from [<c0ab85f4>] (kernel_init+0x8/0x118)
>>>>> [<c0ab85f4>] (kernel_init) from [<c0100114>] (ret_from_fork+0x14/0x20)
>>>>>
>>>>> Reverting it on top of linux-next fixes the boot issue. I will
>>>>> investigate it further soon.
>>>> Thanks for reporting this.
>>>>
>>>> Can you send the addr2line result and the code it points to?
>>>
>>> addr2line of c02b5cac (kmem_cache_alloc+0x13c/0x418) points to
>>> mm/slub.c +2839, but I'm not sure we can trust it. IMHO it looks
>>> like trashed memory somewhere, but I don't have time to analyze it
>>> further right now...
>>
>> Just one more thing I've noticed. The crash happens only if the kernel
>> is compiled with an old GCC (tested with arm-linux-gnueabi-gcc (Linaro GCC
>> 4.9-2017.01) 4.9.4). If I compile the kernel with a newer GCC (like
>> arm-linux-gnueabi-gcc (Linaro GCC 6.4-2017.11) 6.4.1 20171012), it works
>> fine...
>>
>> This happens also with Linux next-20200710, which again got this commit.
> 
> So I finally succeeded in reproducing it on ARM64 with a recent compiler,
> earlycon, and the option CONFIG_INIT_ON_ALLOC_DEFAULT_ON.


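I finally narrowed down the issue. The current initcall order is:

 - genetlink initialization is done at subsys initcall
 - thermal netlink init is done at core initcall
 - netlink init is done at core initcall

So the thermal code runs at core initcall, before genetlink has been
initialized at subsys initcall.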

Changing the order to:

 - netlink and genetlink at core initcall
 - thermal init at postcore initcall

fixes the problem. The genetlink initcall level dates from 2005, and
IMO it makes sense for it to come right after the netlink initialization.

It is acceptable to have the thermal init at the postcore initcall; we
only recently moved it from fs_initcall to core_initcall.
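
To make the ordering argument concrete, here is a small userspace model of
the initcall levels. It is only a sketch and not kernel code: the function
names, the ready flags, and the error returns are all made up for
illustration, and the order of entries within one level stands in for link
order. In the real kernel the failure showed up as a NULL pointer
dereference rather than an error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified initcall levels, listed in the order they are run. */
enum level { CORE, POSTCORE, ARCH, SUBSYS, FS, NR_LEVELS };

struct initcall {
	enum level level;
	int (*fn)(void);
};

/* Hypothetical state flags standing in for real subsystem state. */
static bool netlink_ready, genetlink_ready;

static int netlink_init(void)
{
	netlink_ready = true;
	return 0;
}

static int genetlink_init(void)
{
	if (!netlink_ready)
		return -1;	/* genetlink sits on top of netlink */
	genetlink_ready = true;
	return 0;
}

/* Stand-in for the thermal init: creating a zone sends a genl event. */
static int thermal_init(void)
{
	return genetlink_ready ? 0 : -1;
}

/*
 * Run all calls level by level; within a level, array order is used.
 * Returns the number of initcalls that found a dependency missing.
 */
static int run_initcalls(const struct initcall *calls, size_t n)
{
	int failures = 0;

	assert(calls != NULL);
	netlink_ready = genetlink_ready = false;

	for (int lvl = CORE; lvl < NR_LEVELS; lvl++)
		for (size_t i = 0; i < n; i++)
			if (calls[i].level == (enum level)lvl &&
			    calls[i].fn() != 0)
				failures++;
	return failures;
}

/* Order before the change: thermal runs before genetlink. */
static const struct initcall before_fix[] = {
	{ CORE,   netlink_init },
	{ CORE,   thermal_init },
	{ SUBSYS, genetlink_init },
};

/* Order after the change: thermal runs after both netlink layers. */
static const struct initcall after_fix[] = {
	{ CORE,     netlink_init },
	{ CORE,     genetlink_init },
	{ POSTCORE, thermal_init },
};
```

Calling run_initcalls(before_fix, 3) reports one failing initcall (the
thermal one), while run_initcalls(after_fix, 3) reports none.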

Thanks to Arnd, who gave me a direction to look in.

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Thread overview: 16+ messages
2020-07-06 10:55 [PATCH v4 1/4] thermal: core: Add helpers to browse the cdev, tz and governor list Daniel Lezcano
2020-07-06 10:55 ` [PATCH v4 2/4] thermal: core: Get thermal zone by id Daniel Lezcano
2020-07-07  1:53   ` Zhang Rui
2020-07-06 10:55 ` [PATCH v4 3/4] thermal: core: genetlink support for events/cmd/sampling Daniel Lezcano
2020-07-07  1:54   ` Zhang Rui
2020-07-06 10:55 ` [PATCH v4 4/4] thermal: core: Add notifications call in the framework Daniel Lezcano
     [not found]   ` <CGME20200706131708eucas1p1487955a7632584c17df724399f48825a@eucas1p1.samsung.com>
2020-07-06 13:17     ` Marek Szyprowski
2020-07-06 13:46       ` Daniel Lezcano
2020-07-07  9:15         ` Marek Szyprowski
2020-07-13  9:31           ` Marek Szyprowski
2020-07-13  9:45             ` Daniel Lezcano
2020-07-13 10:28             ` Daniel Lezcano
2020-07-13 20:32             ` Daniel Lezcano
2020-07-14 23:20               ` Daniel Lezcano [this message]
2020-07-15  6:09                 ` Marek Szyprowski
2020-07-07  1:55   ` Zhang Rui
