linux-mediatek.lists.infradead.org archive mirror
From: Enric Balletbo i Serra <enric.balletbo@collabora.com>
To: Matthias Brugger <matthias.bgg@gmail.com>,
	Michael Kao <michael.kao@mediatek.com>
Cc: roger.lu@mediatek.com, Hsin-Yi Wang <hsinyi@chromium.org>,
	"drinkcat@chromium.org" <drinkcat@chromium.org>,
	"moderated list:ARM/Mediatek SoC support"
	<linux-mediatek@lists.infradead.org>,
	renze@rnplus.nl
Subject: Re: [BUG] Cannot boot on MT8173 if Mediatek thermal is enabled
Date: Wed, 1 Jul 2020 13:07:43 +0200
Message-ID: <e72e26d1-ac9f-7506-9e1c-d6aeaa2fa374@collabora.com>
In-Reply-To: <1d682fad-fc50-9f40-4f8b-ac73a4f41f05@gmail.com>

Hi Michael,

On 28/5/20 16:20, Matthias Brugger wrote:
> 
> 
> On 28/05/2020 10:08, Enric Balletbo i Serra wrote:
>> Hi Michael,
>>
>> On 28/5/20 4:59, Michael Kao wrote:
>>> On Wed, 2020-05-20 at 18:12 +0200, Enric Balletbo i Serra wrote:
>>>> Hi Matthias et al,
>>>>
>>>> On 20/5/20 17:25, Enric Balletbo i Serra wrote:
>>>>>
>>>>>
>>>>> On 20/5/20 17:21, Matthias Brugger wrote:
>>>>>>
>>>>>>
>>>>>> On 20/05/2020 17:09, Enric Balletbo i Serra wrote:
>>>>>>> Dear all,
>>>>>>>
>>>>>>> I've been testing the Acer Chromebook R 13 (elm - MT8173) for a while. Today I
>>>>>>> enabled the Mediatek thermal driver (CONFIG_MTK_THERMAL=y) and started getting
>>>>>>> the hang shown in [1].
>>>>>>>
>>>>>>
>>>>>> Did you try to bisect to find out what broke it?
>>>>>>
>>>>>
>>>>> I don't even know if this ever worked; I was running/testing my kernels
>>>>> with CONFIG_MTK_THERMAL disabled. The log doesn't show many changes, so I
>>>>> suspect this issue has been there for a long time.
>>>>>
>>>>
>>>> So the commit that introduces the problem is:
>>>>
>>>> commit eb9aecd90d1a39601e91cd08b90d5fee51d321a6
>>>> Author: Michael Kao <michael.kao@mediatek.com>
>>>> Date:   Fri Feb 1 15:38:07 2019 +0800
>>>>
>>>>     thermal: mediatek: fix register index error
>>>>
>>>>     The index of msr and adcpnp should match the sensor
>>>>     which belongs to the selected bank in the for loop.
>>>>
>>>>     Fixes: b7cf0053738c ("thermal: Add Mediatek thermal driver for mt2701.")
>>>>     Signed-off-by: Michael Kao <michael.kao@mediatek.com>
>>>>     Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
>>>>
>>>>
>>>>>
>>>>>> Regards,
>>>>>> Matthias
>>>>>>
>>>>>>> The stack trace points to this function:
>>>>>>>
>>>>>>> static int mtk_thermal_bank_temperature(struct mtk_thermal_bank *bank)
>>>>>>>
>>>>>>> More precisely to this call:
>>>>>>>
>>>>>>> 		raw = readl(mt->thermal_base +
>>>>>>> 			    conf->msr[conf->bank_data[bank->id].sensors[i]]);
>>>>>>>
>>>>>>> This call is in a loop and ends up trying to access conf->msr[4]
>>>>>>> (conf->msr[MT8173_TSABB]), which doesn't exist according to the following array:
>>>>>>>
>>>>>>> static const int mt8173_msr[MT8173_NUM_SENSORS_PER_ZONE] = {
>>>>>>> 	TEMP_MSR0, TEMP_MSR1, TEMP_MSR2, TEMP_MSR3
>>>>>>> };
>>>>>>>
>>>>>>> I think the datasheet would help clarify what is happening here, but it is not
>>>>>>> public, so I can't really check. Anyway, it seems that either the mt8173_msr array
>>>>>>> is wrong, or mt8173_bank_data is wrong, or there is something else.
>>>>>>>
>>>>>>> Could anyone with access to that information or knowledge of this hardware take
>>>>>>> a look, please?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>  Enric
>>>>>>>
>>>>>>>
>>>>>>> [1]
>>>>>>> [    2.222488] Unable to handle kernel paging request at virtual address ffff8000125f5001
>>>>>>> [    2.230421] Mem abort info:
>>>>>>> [    2.233207]   ESR = 0x96000021
>>>>>>> [    2.236261]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>>> [    2.241571]   SET = 0, FnV = 0
>>>>>>> [    2.244623]   EA = 0, S1PTW = 0
>>>>>>> [    2.247762] Data abort info:
>>>>>>> [    2.250640]   ISV = 0, ISS = 0x00000021
>>>>>>> [    2.254473]   CM = 0, WnR = 0
>>>>>>> [    2.257544] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041850000
>>>>>>> [    2.264251] [ffff8000125f5001] pgd=000000013ffff003, pud=000000013fffe003, pmd=000000013fff9003, pte=006800001100b707
>>>>>>> [    2.274867] Internal error: Oops: 96000021 [#1] PREEMPT SMP
>>>>>>> [    2.280432] Modules linked in:
>>>>>>> [    2.283483] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc6+ #162
>>>>>>> [    2.289914] Hardware name: Google Elm (DT)
>>>>>>> [    2.294003] pstate: 20000005 (nzCv daif -PAN -UAO)
>>>>>>> [    2.298792] pc : mtk_read_temp+0xb8/0x1c8
>>>>>>> [    2.302793] lr : mtk_read_temp+0x7c/0x1c8
>>>>>>> [    2.306794] sp : ffff80001003b930
>>>>>>> [    2.310100] x29: ffff80001003b930 x28: 0000000000000000
>>>>>>> [    2.315404] x27: 0000000000000002 x26: ffff0000f9550b10
>>>>>>> [    2.320709] x25: ffff0000f9550a80 x24: 0000000000000090
>>>>>>> [    2.326014] x23: ffff80001003ba24 x22: 00000000610344c0
>>>>>>> [    2.331318] x21: 0000000000002710 x20: 00000000000001f4
>>>>>>> [    2.336622] x19: 0000000000030d40 x18: ffff800011742ec0
>>>>>>> [    2.341926] x17: 0000000000000001 x16: 0000000000000001
>>>>>>> [    2.347230] x15: ffffffffffffffff x14: ffffff0000000000
>>>>>>> [    2.352535] x13: ffffffffffffffff x12: 0000000000000028
>>>>>>> [    2.357839] x11: 0000000000000003 x10: ffff800011295ec8
>>>>>>> [    2.363143] x9 : 000000000000291b x8 : 0000000000000002
>>>>>>> [    2.368447] x7 : 00000000000000a8 x6 : 0000000000000004
>>>>>>> [    2.373751] x5 : 0000000000000000 x4 : ffff800011295cb0
>>>>>>> [    2.379055] x3 : 0000000000000002 x2 : ffff8000125f5001
>>>>>>> [    2.384359] x1 : 0000000000000001 x0 : ffff0000f9550a80
>>>>>>> [    2.389665] Call trace:
>>>>>>> [    2.392105]  mtk_read_temp+0xb8/0x1c8
>>>>>>> [    2.395760]  of_thermal_get_temp+0x2c/0x40
>>>>>>> [    2.399849]  thermal_zone_get_temp+0x78/0x160
>>>>>>> [    2.404198]  thermal_zone_device_update.part.0+0x3c/0x1f8
>>>>>>> [    2.409589]  thermal_zone_device_update+0x34/0x48
>>>>>>> [    2.414286]  of_thermal_set_mode+0x58/0x88
>>>>>>> [    2.418375]  thermal_zone_of_sensor_register+0x1a8/0x1d8
>>>>>>> [    2.423679]  devm_thermal_zone_of_sensor_register+0x64/0xb0
>>>>>>> [    2.429242]  mtk_thermal_probe+0x690/0x7d0
>>>>>>> [    2.433333]  platform_drv_probe+0x5c/0xb0
>>>>>>> [    2.437335]  really_probe+0xe4/0x448
>>>>>>> [    2.440901]  driver_probe_device+0xe8/0x140
>>>>>>> [    2.445077]  device_driver_attach+0x7c/0x88
>>>>>>> [    2.449252]  __driver_attach+0xac/0x178
>>>>>>> [    2.453082]  bus_for_each_dev+0x78/0xc8
>>>>>>> [    2.456909]  driver_attach+0x2c/0x38
>>>>>>> [    2.460476]  bus_add_driver+0x14c/0x230
>>>>>>> [    2.464304]  driver_register+0x6c/0x128
>>>>>>> [    2.468131]  __platform_driver_register+0x50/0x60
>>>>>>> [    2.472831]  mtk_thermal_driver_init+0x24/0x30
>>>>>>> [    2.477268]  do_one_initcall+0x50/0x298
>>>>>>> [    2.481098]  kernel_init_freeable+0x1ec/0x264
>>>>>>> [    2.485450]  kernel_init+0x1c/0x110
>>>>>>> [    2.488931]  ret_from_fork+0x10/0x1c
>>>>>>> [    2.492502] Code: f9401081 f9400402 b8a67821 8b010042 (b9400042)
>>>>>>> [    2.498599] ---[ end trace e43e3105ed27dc99 ]---
>>>>>>> [    2.503367] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>>>>>>> [    2.511020] SMP: stopping secondary CPUs
>>>>>>> [    2.514941] Kernel Offset: disabled
>>>>>>> [    2.518421] CPU features: 0x090002,25006005
>>>>>>> [    2.522595] Memory Limit: none
>>>>>>> [    2.525644] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
>>>>>>>
>>>
>>> Hi Enric,
>>> I will help fix this crash. If it is urgent, you can revert the patch
>>> locally for mt8173 in the meantime.
>>>
>>
>> Thanks, the reverted patch is what I am carrying ;-)
>>
> 
> As Enric mentioned, v5.8-rc1 will claim support for MT8173-based Chromebooks, so
> we should make sure that a fix gets in during the first RCs (ideally rc2). If
> I got that right, ideally we would have a fix in around 4 weeks.
> 
> Please let me know if you can't make it, and we can revert the commit during the
> rc phase, although I'd prefer a proper fix.
> 


Michael, did you have a chance to look into this?

> Enric what do you think?
> 

I plan to send a revert after rc4 if we can't find a fix before then.

Thanks,
 Enric

> Regards,
> Matthias
> 
>> It'd be nice to have this fixed for the next MR or during the upcoming release cycle,
>> which will probably start next week. Support for MT8173 Elm and Hana will land
>> during the next merge window, so those boards will be affected by this and will
>> break. Actually, there aren't many boards supported in mainline using MT8183
>> (only the EVB), so in the worst case I can send a revert or a partial revert of
>> the patch.
>>
>> Thanks,
>>  Enric
>>
>>> The last sensor in mt8173_bank_data[2] is MT8173_TSABB.
>>> Its index is 4, but there is no index 4 in mt8173_msr or
>>> mt8173_adcpnp.
>>> That is the root cause, for your reference.
>>>
>>> static const struct mtk_thermal_data mt8173_thermal_data = {
>>> 	.auxadc_channel = MT8173_TEMP_AUXADC_CHANNEL,
>>> 	.num_banks = MT8173_NUM_ZONES,
>>> 	.num_sensors = MT8173_NUM_SENSORS,
>>> 	.vts_index = mt8173_vts_index,
>>> 	.cali_val = MT8173_CALIBRATION,
>>> 	.num_controller = MT8173_NUM_CONTROLLER,
>>> 	.controller_offset = mt8173_tc_offset,
>>> 	.need_switch_bank = true,
>>> 	.bank_data = {
>>> 		{
>>> 			.num_sensors = 2,
>>> 			.sensors = mt8173_bank_data[0],
>>> 		}, {
>>> 			.num_sensors = 2,
>>> 			.sensors = mt8173_bank_data[1],
>>> 		}, {
>>> 			.num_sensors = 3,
>>> 			.sensors = mt8173_bank_data[2],
>>> 		}, {
>>> 			.num_sensors = 1,
>>> 			.sensors = mt8173_bank_data[3],
>>> 		},
>>> 	},
>>> 	.msr = mt8173_msr,
>>> 	.adcpnp = mt8173_adcpnp,
>>> 	.sensor_mux_values = mt8173_mux_values,
>>> };
>>>
>>>
>>>
>>> /* MT8173 thermal sensor data */
>>> static const int mt8173_bank_data[MT8173_NUM_ZONES][3] = {
>>> 	{ MT8173_TS2, MT8173_TS3 },
>>> 	{ MT8173_TS2, MT8173_TS4 },
>>> 	{ MT8173_TS1, MT8173_TS2, MT8173_TSABB },
>>> 	{ MT8173_TS2 },
>>> };
>>>
>>> static const int mt8173_msr[MT8173_NUM_SENSORS_PER_ZONE] = {
>>> 	TEMP_MSR0, TEMP_MSR1, TEMP_MSR2, TEMP_MSR3
>>> };
>>>
>>> static const int mt8173_adcpnp[MT8173_NUM_SENSORS_PER_ZONE] = {
>>> 	TEMP_ADCPNP0, TEMP_ADCPNP1, TEMP_ADCPNP2, TEMP_ADCPNP3
>>> };
>>>
>>> /* MT8173 thermal sensors */
>>> #define MT8173_TS1	0
>>> #define MT8173_TS2	1
>>> #define MT8173_TS3	2
>>> #define MT8173_TS4	3
>>> #define MT8173_TSABB	4
>>>
>>> Best Regards,
>>> Michael
>>>> _______________________________________________
>>>> Linux-mediatek mailing list
>>>> Linux-mediatek@lists.infradead.org
>>>> http://lists.infradead.org/mailman/listinfo/linux-mediatek
>>>


Thread overview: 8+ messages
2020-05-20 15:09 [BUG] Cannot boot on MT8173 if Mediatek thermal is enabled Enric Balletbo i Serra
2020-05-20 15:21 ` Matthias Brugger
2020-05-20 15:25   ` Enric Balletbo i Serra
2020-05-20 16:12     ` Enric Balletbo i Serra
2020-05-28  2:59       ` Michael Kao
2020-05-28  8:08         ` Enric Balletbo i Serra
2020-05-28 14:20           ` Matthias Brugger
2020-07-01 11:07             ` Enric Balletbo i Serra [this message]
