linux-pm.vger.kernel.org archive mirror
From: Marek Szyprowski <m.szyprowski@samsung.com>
To: Daniel Lezcano <daniel.lezcano@linaro.org>, rui.zhang@intel.com
Cc: srinivas.pandruvada@linux.intel.com, rkumbako@codeaurora.org,
	amit.kucheria@linaro.org, linux-kernel@vger.kernel.org,
	linux-pm@vger.kernel.org,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
	Arnd Bergmann <arnd.bergmann@linaro.org>
Subject: Re: [PATCH v4 4/4] thermal: core: Add notifications call in the framework
Date: Wed, 15 Jul 2020 08:09:25 +0200
Message-ID: <6cf5d9a9-a142-d2e0-10e3-10271a4bb926@samsung.com>
In-Reply-To: <4cfb15f6-2801-3386-c7cf-6296a54571a1@linaro.org>

Hi Daniel,

On 15.07.2020 01:20, Daniel Lezcano wrote:
> On 13/07/2020 22:32, Daniel Lezcano wrote:
>> On 13/07/2020 11:31, Marek Szyprowski wrote:
>>> On 07.07.2020 11:15, Marek Szyprowski wrote:
>>>> On 06.07.2020 15:46, Daniel Lezcano wrote:
>>>>> On 06/07/2020 15:17, Marek Szyprowski wrote:
>>>>>> On 06.07.2020 12:55, Daniel Lezcano wrote:
>>>>>>> The generic netlink protocol is implemented but the different
>>>>>>> notification functions are not yet connected to the core code.
>>>>>>>
>>>>>>> These changes add the notification calls in the different
>>>>>>> corresponding places.
>>>>>>>
>>>>>>> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
>>>>>>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>>>>>> This patch landed in today's linux-next 20200706 as commit 5df786e46560
>>>>>> ("thermal: core: Add notifications call in the framework"). Sadly it
>>>>>> breaks booting various Samsung Exynos based boards. Here is an example
>>>>>> log from Odroid U3 board:
>>>>>>
>>>>>> Unable to handle kernel NULL pointer dereference at virtual address 00000010
>>>>>> pgd = (ptrval)
>>>>>> [00000010] *pgd=00000000
>>>>>> Internal error: Oops: 5 [#1] PREEMPT SMP ARM
>>>>>> Modules linked in:
>>>>>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.0-rc3-00015-g5df786e46560 #1146
>>>>>> Hardware name: Samsung Exynos (Flattened Device Tree)
>>>>>> PC is at kmem_cache_alloc+0x13c/0x418
>>>>>> LR is at kmem_cache_alloc+0x48/0x418
>>>>>> pc : [<c02b5cac>]    lr : [<c02b5bb8>]    psr: 20000053
>>>>>> ...
>>>>>> Flags: nzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment none
>>>>>> Control: 10c5387d  Table: 4000404a  DAC: 00000051
>>>>>> Process swapper/0 (pid: 1, stack limit = 0x(ptrval))
>>>>>> Stack: (0xee8f1cf8 to 0xee8f2000)
>>>>>> ...
>>>>>> [<c02b5cac>] (kmem_cache_alloc) from [<c08cd170>] (__alloc_skb+0x5c/0x170)
>>>>>> [<c08cd170>] (__alloc_skb) from [<c07ec19c>] (thermal_genl_send_event+0x24/0x174)
>>>>>> [<c07ec19c>] (thermal_genl_send_event) from [<c07ec648>] (thermal_notify_tz_create+0x58/0x74)
>>>>>> [<c07ec648>] (thermal_notify_tz_create) from [<c07e9058>] (thermal_zone_device_register+0x358/0x650)
>>>>>> [<c07e9058>] (thermal_zone_device_register) from [<c1028d34>] (of_parse_thermal_zones+0x304/0x7a4)
>>>>>> [<c1028d34>] (of_parse_thermal_zones) from [<c1028964>] (thermal_init+0xdc/0x154)
>>>>>> [<c1028964>] (thermal_init) from [<c0102378>] (do_one_initcall+0x8c/0x424)
>>>>>> [<c0102378>] (do_one_initcall) from [<c1001158>] (kernel_init_freeable+0x190/0x204)
>>>>>> [<c1001158>] (kernel_init_freeable) from [<c0ab85f4>] (kernel_init+0x8/0x118)
>>>>>> [<c0ab85f4>] (kernel_init) from [<c0100114>] (ret_from_fork+0x14/0x20)
>>>>>>
>>>>>> Reverting it on top of linux-next fixes the boot issue. I will
>>>>>> investigate it further soon.
>>>>> Thanks for reporting this.
>>>>>
>>>>> Can you send the addr2line result and code it points to ?
>>>> addr2line of c02b5cac (kmem_cache_alloc+0x13c/0x418) points to
>>>> mm/slub.c +2839, but I'm not sure if we can trust it. IMHO it looks
>>>> like some trashed memory somewhere, but I don't have time right now
>>>> to analyze it further...
>>> Just one more thing I've noticed. The crash happens only if the kernel
>>> is compiled with an old GCC (tested with arm-linux-gnueabi-gcc (Linaro GCC
>>> 4.9-2017.01) 4.9.4). If I compile the kernel with a newer GCC (like
>>> arm-linux-gnueabi-gcc (Linaro GCC 6.4-2017.11) 6.4.1 20171012), it works
>>> fine...
>>>
>>> This also happens with Linux next-20200710, which again got this commit.
>> So I finally succeeded in reproducing it on ARM64 with a recent compiler,
>> earlycon, and the option CONFIG_INIT_ON_ALLOC_DEFAULT_ON.
>
> Finally, narrowed down the issue.
>
>   - genetlink initialization is done at subsys initcall.
>   - thermal netlink init is done at core initcall
>   - netlink is done at core initcall
>
> By changing the order:
>
>   - netlink and genetlink at core initcall
>   - thermal init at postcore initcall
>
> That fixes the problem.
I confirm that this change fixes the issue! Feel free to add:

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

to the final patch.

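For anyone following the thread, the ordering problem described above can
be modelled in user space. This is only a minimal sketch, not kernel code:
the level numbers, the ready flags, and all function names below are
hypothetical stand-ins, and the real failure was an oops in the allocator
rather than a clean error return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of initcall levels: a lower level runs earlier. */
enum level { CORE = 1, POSTCORE = 2, SUBSYS = 4, LAST_LEVEL = 7 };

struct initcall {
	enum level level;
	int (*fn)(void);
};

/* Hypothetical stand-ins for the state each subsystem sets up. */
static bool netlink_ready;
static bool genl_ready;

static int netlink_init(void) { netlink_ready = true; return 0; }

/* genetlink can only come up once netlink itself is initialized. */
static int genl_init(void) { genl_ready = netlink_ready; return 0; }

/* Thermal init sends a genetlink event while registering zones. */
static int thermal_init(void) { return genl_ready ? 0 : -1; }

/*
 * Run every call level by level; within one level, table order is kept,
 * which mimics link order. Returns how many calls failed.
 */
static int run_initcalls(const struct initcall *calls, size_t n)
{
	int failures = 0;

	assert(calls != NULL);
	netlink_ready = genl_ready = false;

	for (int lvl = 0; lvl <= LAST_LEVEL; lvl++)
		for (size_t i = 0; i < n; i++)
			if ((int)calls[i].level == lvl && calls[i].fn() != 0)
				failures++;
	return failures;
}

/* Ordering before the fix: thermal runs before genetlink is ready. */
static int run_broken_order(void)
{
	const struct initcall calls[] = {
		{ CORE,   netlink_init },
		{ SUBSYS, genl_init },
		{ CORE,   thermal_init },
	};
	return run_initcalls(calls, sizeof(calls) / sizeof(calls[0]));
}

/* Ordering after the fix: netlink and genetlink first, thermal later. */
static int run_fixed_order(void)
{
	const struct initcall calls[] = {
		{ CORE,     netlink_init },
		{ CORE,     genl_init },
		{ POSTCORE, thermal_init },
	};
	return run_initcalls(calls, sizeof(calls) / sizeof(calls[0]));
}
```

Calling run_broken_order() from a small main() reports one failed call
(the thermal init), while run_fixed_order() reports none.
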
> The genetlink initcall order is from 2005 and
> IMO it makes sense to come right after the netlink initialization.
>
> It is acceptable to have the thermal init at the postcore initcall. We
> only very recently moved from fs_initcall to core_initcall.
>
> Thanks to Arnd, who gave me a direction to look at.
Best regards

-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


