linux-mediatek.lists.infradead.org archive mirror
* [BUG] Cannot boot on MT8173 if Mediatek thermal is enabled
@ 2020-05-20 15:09 Enric Balletbo i Serra
  2020-05-20 15:21 ` Matthias Brugger
  0 siblings, 1 reply; 8+ messages in thread
From: Enric Balletbo i Serra @ 2020-05-20 15:09 UTC (permalink / raw)
  To: moderated list:ARM/Mediatek SoC support, Matthias Brugger
  Cc: renze, michael.kao, drinkcat, roger.lu, Hsin-Yi Wang

Dear all,

I've been testing the Acer Chromebook R 13 (elm - MT8173) for a while. Today I
enabled the Mediatek thermal driver (CONFIG_MTK_THERMAL=y) and I started to get
that hang [1]

The stack trace points to this function:

static int mtk_thermal_bank_temperature(struct mtk_thermal_bank *bank)

More precisely to this call:

		raw = readl(mt->thermal_base +
			    conf->msr[conf->bank_data[bank->id].sensors[i]]);

This call is in a loop and ends up trying to access conf->msr[4]
(conf->msr[MT8173_TSABB]), which doesn't exist as per the following array:

static const int mt8173_msr[MT8173_NUM_SENSORS_PER_ZONE] = {
	TEMP_MSR0, TEMP_MSR1, TEMP_MSR2, TEMP_MSR3
};

I think the datasheet would help to clarify what is happening, but it is not
public, so I can't really check. Anyway, it seems that either the mt8173_msr
array is wrong, or mt8173_bank_data is wrong, or something else is going on.
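
The out-of-range lookup can be illustrated with a minimal sketch. It assumes
the sensor ids 0..4 used by the driver (MT8173_TSABB being 4). The helper name
msr_offset_checked() and the register offset values are placeholders chosen
for illustration, not taken from the driver source:

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Sensor ids in the driver's numbering; TSABB is id 4. */
enum { TS1, TS2, TS3, TS4, TSABB };

/* Placeholder register offsets (assumed values, for illustration only). */
static const int msr_table[4] = { 0x90, 0x94, 0x98, 0xb8 };

/*
 * Hypothetical helper: return the register offset for a sensor id,
 * or -1 when the id does not fit the table.
 */
static int msr_offset_checked(int sensor_id)
{
	if (sensor_id < 0 || (size_t)sensor_id >= ARRAY_SIZE(msr_table))
		return -1;
	return msr_table[sensor_id];
}
```

With a four-entry table, msr_offset_checked(TSABB) returns -1, whereas an
unchecked lookup reads whatever happens to lie past the end of the array.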

Could anyone with the information or with knowledge of this hardware take a
look, please?

Thanks,
 Enric


[1]
[    2.222488] Unable to handle kernel paging request at virtual address ffff8000125f5001
[    2.230421] Mem abort info:
[    2.233207]   ESR = 0x96000021
[    2.236261]   EC = 0x25: DABT (current EL), IL = 32 bits
[    2.241571]   SET = 0, FnV = 0
[    2.244623]   EA = 0, S1PTW = 0
[    2.247762] Data abort info:
[    2.250640]   ISV = 0, ISS = 0x00000021
[    2.254473]   CM = 0, WnR = 0
[    2.257544] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041850000
[    2.264251] [ffff8000125f5001] pgd=000000013ffff003, pud=000000013fffe003, pmd=000000013fff9003, pte=006800001100b707
[    2.274867] Internal error: Oops: 96000021 [#1] PREEMPT SMP
[    2.280432] Modules linked in:
[    2.283483] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc6+ #162
[    2.289914] Hardware name: Google Elm (DT)
[    2.294003] pstate: 20000005 (nzCv daif -PAN -UAO)
[    2.298792] pc : mtk_read_temp+0xb8/0x1c8
[    2.302793] lr : mtk_read_temp+0x7c/0x1c8
[    2.306794] sp : ffff80001003b930
[    2.310100] x29: ffff80001003b930 x28: 0000000000000000
[    2.315404] x27: 0000000000000002 x26: ffff0000f9550b10
[    2.320709] x25: ffff0000f9550a80 x24: 0000000000000090
[    2.326014] x23: ffff80001003ba24 x22: 00000000610344c0
[    2.331318] x21: 0000000000002710 x20: 00000000000001f4
[    2.336622] x19: 0000000000030d40 x18: ffff800011742ec0
[    2.341926] x17: 0000000000000001 x16: 0000000000000001
[    2.347230] x15: ffffffffffffffff x14: ffffff0000000000
[    2.352535] x13: ffffffffffffffff x12: 0000000000000028
[    2.357839] x11: 0000000000000003 x10: ffff800011295ec8
[    2.363143] x9 : 000000000000291b x8 : 0000000000000002
[    2.368447] x7 : 00000000000000a8 x6 : 0000000000000004
[    2.373751] x5 : 0000000000000000 x4 : ffff800011295cb0
[    2.379055] x3 : 0000000000000002 x2 : ffff8000125f5001
[    2.384359] x1 : 0000000000000001 x0 : ffff0000f9550a80
[    2.389665] Call trace:
[    2.392105]  mtk_read_temp+0xb8/0x1c8
[    2.395760]  of_thermal_get_temp+0x2c/0x40
[    2.399849]  thermal_zone_get_temp+0x78/0x160
[    2.404198]  thermal_zone_device_update.part.0+0x3c/0x1f8
[    2.409589]  thermal_zone_device_update+0x34/0x48
[    2.414286]  of_thermal_set_mode+0x58/0x88
[    2.418375]  thermal_zone_of_sensor_register+0x1a8/0x1d8
[    2.423679]  devm_thermal_zone_of_sensor_register+0x64/0xb0
[    2.429242]  mtk_thermal_probe+0x690/0x7d0
[    2.433333]  platform_drv_probe+0x5c/0xb0
[    2.437335]  really_probe+0xe4/0x448
[    2.440901]  driver_probe_device+0xe8/0x140
[    2.445077]  device_driver_attach+0x7c/0x88
[    2.449252]  __driver_attach+0xac/0x178
[    2.453082]  bus_for_each_dev+0x78/0xc8
[    2.456909]  driver_attach+0x2c/0x38
[    2.460476]  bus_add_driver+0x14c/0x230
[    2.464304]  driver_register+0x6c/0x128
[    2.468131]  __platform_driver_register+0x50/0x60
[    2.472831]  mtk_thermal_driver_init+0x24/0x30
[    2.477268]  do_one_initcall+0x50/0x298
[    2.481098]  kernel_init_freeable+0x1ec/0x264
[    2.485450]  kernel_init+0x1c/0x110
[    2.488931]  ret_from_fork+0x10/0x1c
[    2.492502] Code: f9401081 f9400402 b8a67821 8b010042 (b9400042)
[    2.498599] ---[ end trace e43e3105ed27dc99 ]---
[    2.503367] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    2.511020] SMP: stopping secondary CPUs
[    2.514941] Kernel Offset: disabled
[    2.518421] CPU features: 0x090002,25006005
[    2.522595] Memory Limit: none
[    2.525644] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
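
As a reading aid for the log above (not a confirmed diagnosis), ESR 0x96000021
can be split into its fields following the Arm architecture's ESR_EL1 layout.
The fault status code in the low six bits, 0x21, is the encoding for an
alignment fault, which would fit the odd faulting address ffff8000125f5001
being used for a 32-bit readl(). The struct and function names below are
hypothetical:

```c
#include <stdint.h>

struct esr_fields {
	unsigned int ec;	/* exception class, bits [31:26] */
	unsigned int il;	/* instruction length bit, bit [25] */
	unsigned int iss;	/* instruction specific syndrome, bits [24:0] */
	unsigned int dfsc;	/* data fault status code, ISS bits [5:0] */
	unsigned int wnr;	/* write-not-read, ISS bit [6] */
};

/* Split a 32-bit ESR value into the fields printed in the oops. */
static struct esr_fields esr_decode(uint32_t esr)
{
	struct esr_fields f;

	f.ec = (esr >> 26) & 0x3f;
	f.il = (esr >> 25) & 0x1;
	f.iss = esr & 0x1ffffff;
	f.dfsc = f.iss & 0x3f;
	f.wnr = (f.iss >> 6) & 0x1;
	return f;
}

/* Return 1 if addr is naturally aligned for an access of size bytes
 * (size must be a power of two). */
static int is_aligned(uint64_t addr, unsigned int size)
{
	return (addr & (size - 1)) == 0;
}
```

Decoding 0x96000021 this way gives EC = 0x25 and ISS = 0x21, matching the
values in the log, and the address in x2 is not 4-byte aligned.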

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek


* Re: [BUG] Cannot boot on MT8173 if Mediatek thermal is enabled
  2020-05-20 15:09 [BUG] Cannot boot on MT8173 if Mediatek thermal is enabled Enric Balletbo i Serra
@ 2020-05-20 15:21 ` Matthias Brugger
  2020-05-20 15:25   ` Enric Balletbo i Serra
  0 siblings, 1 reply; 8+ messages in thread
From: Matthias Brugger @ 2020-05-20 15:21 UTC (permalink / raw)
  To: Enric Balletbo i Serra, moderated list:ARM/Mediatek SoC support
  Cc: renze, michael.kao, drinkcat, roger.lu, Hsin-Yi Wang



On 20/05/2020 17:09, Enric Balletbo i Serra wrote:
> Dear all,
> 
> I've been testing the Acer Chromebook R 13 (elm - MT8173) for a while. Today I
> enabled the Mediatek thermal driver (CONFIG_MTK_THERMAL=y) and I started to get
> that hang [1]
> 

Did you try to bisect to find out what broke it?

Regards,
Matthias


* Re: [BUG] Cannot boot on MT8173 if Mediatek thermal is enabled
  2020-05-20 15:21 ` Matthias Brugger
@ 2020-05-20 15:25   ` Enric Balletbo i Serra
  2020-05-20 16:12     ` Enric Balletbo i Serra
  0 siblings, 1 reply; 8+ messages in thread
From: Enric Balletbo i Serra @ 2020-05-20 15:25 UTC (permalink / raw)
  To: Matthias Brugger, moderated list:ARM/Mediatek SoC support
  Cc: renze, michael.kao, drinkcat, roger.lu, Hsin-Yi Wang



On 20/5/20 17:21, Matthias Brugger wrote:
> 
> 
> On 20/05/2020 17:09, Enric Balletbo i Serra wrote:
>> Dear all,
>>
>> I've been testing the Acer Chromebook R 13 (elm - MT8173) for a while. Today I
>> enabled the Mediatek thermal driver (CONFIG_MTK_THERMAL=y) and I started to get
>> that hang [1]
>>
> 
> Did you try to bisect to find out what broke it?
> 

I don't even know whether this ever worked; I was running and testing my
kernels with CONFIG_MTK_THERMAL disabled. Judging from the log, there don't
seem to have been many changes, so I suspect this issue has been there for a
long time.



* Re: [BUG] Cannot boot on MT8173 if Mediatek thermal is enabled
  2020-05-20 15:25   ` Enric Balletbo i Serra
@ 2020-05-20 16:12     ` Enric Balletbo i Serra
  2020-05-28  2:59       ` Michael Kao
  0 siblings, 1 reply; 8+ messages in thread
From: Enric Balletbo i Serra @ 2020-05-20 16:12 UTC (permalink / raw)
  To: Matthias Brugger, moderated list:ARM/Mediatek SoC support
  Cc: renze, michael.kao, drinkcat, roger.lu, Hsin-Yi Wang

Hi Matthias et al.,

On 20/5/20 17:25, Enric Balletbo i Serra wrote:
> 
> 
> On 20/5/20 17:21, Matthias Brugger wrote:
>>
>>
>> On 20/05/2020 17:09, Enric Balletbo i Serra wrote:
>>> Dear all,
>>>
>>> I've been testing the Acer Chromebook R 13 (elm - MT8173) for a while. Today I
>>> enabled the Mediatek thermal driver (CONFIG_MTK_THERMAL=y) and I started to get
>>> that hang [1]
>>>
>>
>> Did you try to bisect to find out what broke it?
>>
> 
> I don't even know if this worked at some point, I was running/testing my kernels
> with CONFIG_MTK_THERMAL disabled. From the log doesn't seem to have a lot of
> changes so I suspect this issue is there since long time.
> 

So the commit that introduces the problem is:

commit eb9aecd90d1a39601e91cd08b90d5fee51d321a6
Author: Michael Kao <michael.kao@mediatek.com>
Date:   Fri Feb 1 15:38:07 2019 +0800

    thermal: mediatek: fix register index error

    The index of msr and adcpnp should match the sensor
    which belongs to the selected bank in the for loop.

    Fixes: b7cf0053738c ("thermal: Add Mediatek thermal driver for mt2701.")
    Signed-off-by: Michael Kao <michael.kao@mediatek.com>
    Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
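
To illustrate why a lookup keyed on the sensor id can run off the end of a
four-entry table while a lookup keyed on the loop position cannot, here is a
small sketch. It assumes that the commit switched the msr/adcpnp index from
the loop counter to the sensor id, as its message suggests; the diff itself is
not quoted in this thread. The bank layout is the MT8173 one from the driver,
and count_oob() is a hypothetical helper:

```c
#include <stddef.h>

#define NUM_BANKS	4
#define MSR_ENTRIES	4	/* size of the per-zone msr/adcpnp tables */

/* Sensor ids in the driver's numbering; TSABB is id 4. */
enum { TS1, TS2, TS3, TS4, TSABB };

static const int bank_sensors[NUM_BANKS][3] = {
	{ TS2, TS3 },
	{ TS2, TS4 },
	{ TS1, TS2, TSABB },
	{ TS2 },
};

static const int bank_num_sensors[NUM_BANKS] = { 2, 2, 3, 1 };

/*
 * Count the lookups that would fall outside a table of MSR_ENTRIES
 * entries. by_sensor_id selects the indexing scheme: non-zero indexes
 * by sensor id, zero indexes by position within the bank.
 */
static int count_oob(int by_sensor_id)
{
	int bank, i, oob = 0;

	for (bank = 0; bank < NUM_BANKS; bank++) {
		for (i = 0; i < bank_num_sensors[bank]; i++) {
			int idx = by_sensor_id ? bank_sensors[bank][i] : i;

			if (idx >= MSR_ENTRIES)
				oob++;
		}
	}
	return oob;
}
```

Indexing by position never exceeds 2 here, while indexing by sensor id hits
index 4 once, for MT8173_TSABB in bank 2.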



* Re: [BUG] Cannot boot on MT8173 if Mediatek thermal is enabled
  2020-05-20 16:12     ` Enric Balletbo i Serra
@ 2020-05-28  2:59       ` Michael Kao
  2020-05-28  8:08         ` Enric Balletbo i Serra
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Kao @ 2020-05-28  2:59 UTC (permalink / raw)
  To: Enric Balletbo i Serra
  Cc: drinkcat, roger.lu, renze,
	moderated list:ARM/Mediatek SoC support, Hsin-Yi Wang,
	Matthias Brugger

On Wed, 2020-05-20 at 18:12 +0200, Enric Balletbo i Serra wrote:
> Hi Matthias et all,
> 
> On 20/5/20 17:25, Enric Balletbo i Serra wrote:
> > 
> > 
> > On 20/5/20 17:21, Matthias Brugger wrote:
> >>
> >>
> >> On 20/05/2020 17:09, Enric Balletbo i Serra wrote:
> >>> Dear all,
> >>>
> >>> I've been testing the Acer Chromebook R 13 (elm - MT8173) for a while. Today I
> >>> enabled the Mediatek thermal driver (CONFIG_MTK_THERMAL=y) and I started to get
> >>> that hang [1]
> >>>
> >>
> >> Did you try to bisect to find out what broke it?
> >>
> > 
> > I don't even know if this worked at some point, I was running/testing my kernels
> > with CONFIG_MTK_THERMAL disabled. From the log doesn't seem to have a lot of
> > changes so I suspect this issue is there since long time.
> > 
> 
> So the commit that introduces the problem is:
> 
> commit eb9aecd90d1a39601e91cd08b90d5fee51d321a6
> Author: Michael Kao <michael.kao@mediatek.com>
> Date:   Fri Feb 1 15:38:07 2019 +0800
> 
>     thermal: mediatek: fix register index error
> 
>     The index of msr and adcpnp should match the sensor
>     which belongs to the selected bank in the for loop.
> 
>     Fixes: b7cf0053738c ("thermal: Add Mediatek thermal driver for mt2701.")
>     Signed-off-by: Michael Kao <michael.kao@mediatek.com>
>     Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
> 

Hi Enric,

I will help to fix this crash. If it is urgent, you can revert the patch
locally for mt8173 in the meantime.

The last sensor in mt8173_bank_data[2] is MT8173_TSABB, whose index is 4,
but mt8173_msr and mt8173_adcpnp have no entry at index 4.
That is the root cause. The relevant definitions, for your reference:

static const struct mtk_thermal_data mt8173_thermal_data = {
	.auxadc_channel = MT8173_TEMP_AUXADC_CHANNEL,
	.num_banks = MT8173_NUM_ZONES,
	.num_sensors = MT8173_NUM_SENSORS,
	.vts_index = mt8173_vts_index,
	.cali_val = MT8173_CALIBRATION,
	.num_controller = MT8173_NUM_CONTROLLER,
	.controller_offset = mt8173_tc_offset,
	.need_switch_bank = true,
	.bank_data = {
		{
			.num_sensors = 2,
			.sensors = mt8173_bank_data[0],
		}, {
			.num_sensors = 2,
			.sensors = mt8173_bank_data[1],
		}, {
			.num_sensors = 3,
			.sensors = mt8173_bank_data[2],
		}, {
			.num_sensors = 1,
			.sensors = mt8173_bank_data[3],
		},
	},
	.msr = mt8173_msr,
	.adcpnp = mt8173_adcpnp,
	.sensor_mux_values = mt8173_mux_values,
};



/* MT8173 thermal sensor data */
static const int mt8173_bank_data[MT8173_NUM_ZONES][3] = {
	{ MT8173_TS2, MT8173_TS3 },
	{ MT8173_TS2, MT8173_TS4 },
	{ MT8173_TS1, MT8173_TS2, MT8173_TSABB },
	{ MT8173_TS2 },
};

static const int mt8173_msr[MT8173_NUM_SENSORS_PER_ZONE] = {
	TEMP_MSR0, TEMP_MSR1, TEMP_MSR2, TEMP_MSR3
};

static const int mt8173_adcpnp[MT8173_NUM_SENSORS_PER_ZONE] = {
	TEMP_ADCPNP0, TEMP_ADCPNP1, TEMP_ADCPNP2, TEMP_ADCPNP3
};

/* MT8173 thermal sensors */
#define MT8173_TS1	0
#define MT8173_TS2	1
#define MT8173_TS3	2
#define MT8173_TS4	3
#define MT8173_TSABB	4
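
A sketch of a sanity check that would catch this kind of mismatch, using the
sensor ids and bank layout quoted above. The struct bank_cfg type and
find_bad_sensor() are hypothetical names, not part of the driver, and this is
not meant as the upstream fix:

```c
#include <stddef.h>

/* Sensor ids as quoted above. */
#define MT8173_TS1	0
#define MT8173_TS2	1
#define MT8173_TS3	2
#define MT8173_TS4	3
#define MT8173_TSABB	4

#define NUM_BANKS	4
#define NUM_MSR		4	/* entries in mt8173_msr and mt8173_adcpnp */

/* Hypothetical, simplified per-bank description. */
struct bank_cfg {
	size_t num_sensors;
	int sensors[3];
};

static const struct bank_cfg mt8173_banks[NUM_BANKS] = {
	{ 2, { MT8173_TS2, MT8173_TS3 } },
	{ 2, { MT8173_TS2, MT8173_TS4 } },
	{ 3, { MT8173_TS1, MT8173_TS2, MT8173_TSABB } },
	{ 1, { MT8173_TS2 } },
};

/*
 * Return 0 if every sensor id fits a table of table_len entries.
 * Otherwise return -1 and report the first offending bank and slot.
 */
static int find_bad_sensor(const struct bank_cfg *banks, size_t num_banks,
			   size_t table_len, size_t *bad_bank, size_t *bad_slot)
{
	size_t b, i;

	for (b = 0; b < num_banks; b++) {
		for (i = 0; i < banks[b].num_sensors; i++) {
			int id = banks[b].sensors[i];

			if (id < 0 || (size_t)id >= table_len) {
				*bad_bank = b;
				*bad_slot = i;
				return -1;
			}
		}
	}
	return 0;
}
```

Called with mt8173_banks and a table length of NUM_MSR, the check reports
bank 2, slot 2, which is the MT8173_TSABB entry.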

Best Regards,
Michael

* Re: [BUG] Cannot boot on MT8173 if Mediatek thermal is enabled
  2020-05-28  2:59       ` Michael Kao
@ 2020-05-28  8:08         ` Enric Balletbo i Serra
  2020-05-28 14:20           ` Matthias Brugger
  0 siblings, 1 reply; 8+ messages in thread
From: Enric Balletbo i Serra @ 2020-05-28  8:08 UTC (permalink / raw)
  To: Michael Kao
  Cc: drinkcat, roger.lu, renze,
	moderated list:ARM/Mediatek SoC support, Hsin-Yi Wang,
	Matthias Brugger

Hi Michael,

On 28/5/20 4:59, Michael Kao wrote:
> On Wed, 2020-05-20 at 18:12 +0200, Enric Balletbo i Serra wrote:
>> Hi Matthias et all,
>>
>> On 20/5/20 17:25, Enric Balletbo i Serra wrote:
>>>
>>>
>>> On 20/5/20 17:21, Matthias Brugger wrote:
>>>>
>>>>
>>>> On 20/05/2020 17:09, Enric Balletbo i Serra wrote:
>>>>> Dear all,
>>>>>
>>>>> I've been testing the Acer Chromebook R 13 (elm - MT8173) for a while. Today I
>>>>> enabled the Mediatek thermal driver (CONFIG_MTK_THERMAL=y) and I started to get
>>>>> that hang [1]
>>>>>
>>>>
>>>> Did you try to bisect to find out what broke it?
>>>>
>>>
>>> I don't even know if this worked at some point, I was running/testing my kernels
>>> with CONFIG_MTK_THERMAL disabled. From the log doesn't seem to have a lot of
>>> changes so I suspect this issue is there since long time.
>>>
>>
>> So the commit that introduces the problem is:
>>
>> commit eb9aecd90d1a39601e91cd08b90d5fee51d321a6
>> Author: Michael Kao <michael.kao@mediatek.com>
>> Date:   Fri Feb 1 15:38:07 2019 +0800
>>
>>     thermal: mediatek: fix register index error
>>
>>     The index of msr and adcpnp should match the sensor
>>     which belongs to the selected bank in the for loop.
>>
>>     Fixes: b7cf0053738c ("thermal: Add Mediatek thermal driver for mt2701.")
>>     Signed-off-by: Michael Kao <michael.kao@mediatek.com>
>>     Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
>>
>>
>>>
>>>> Regards,
>>>> Matthias
>>>>
>>>>> The stacktrace points point to this function:
>>>>>
>>>>> static int mtk_thermal_bank_temperature(struct mtk_thermal_bank *bank)
>>>>>
>>>>> More precisely to this call:
>>>>>
>>>>> 		raw = readl(mt->thermal_base +
>>>>> 			    conf->msr[conf->bank_data[bank->id].sensors[i]]);
>>>>>
>>>>> this call, is in a loop and ends trying to access to conf->msr[4]
>>>>> (conf->msr[MT8173_TSABB]) which doesn't exist as per the following struct
>>>>>
>>>>> static const int mt8173_msr[MT8173_NUM_SENSORS_PER_ZONE] = {
>>>>> 	TEMP_MSR0, TEMP_MSR1, TEMP_MSR2, TEMP_MSR3
>>>>> };
>>>>>
>>>>> I think the datasheet will help here to clarify what is happening but is not
>>>>> public, so I can really check. Anyway seems that or the mt8173_msr struct is
>>>>> wrong or the mt8173_bank_data is wrong or there is something else.
>>>>>
>>>>> Could anyone with the information or with this hardwware knowledge take a look,
>>>>> please.
>>>>>
>>>>> Thanks,
>>>>>  Enric
>>>>>
>>>>>
>>>>> [1]
>>>>> [    2.222488] Unable to handle kernel paging request at virtual address
>>>>> ffff8000125f5001
>>>>> [    2.230421] Mem abort info:
>>>>> [    2.233207]   ESR = 0x96000021
>>>>> [    2.236261]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>>> [    2.241571]   SET = 0, FnV = 0
>>>>> [    2.244623]   EA = 0, S1PTW = 0
>>>>> [    2.247762] Data abort info:
>>>>> [    2.250640]   ISV = 0, ISS = 0x00000021
>>>>> [    2.254473]   CM = 0, WnR = 0
>>>>> [    2.257544] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041850000
>>>>> [    2.264251] [ffff8000125f5001] pgd=000000013ffff003, pud=000000013fffe003,
>>>>> pmd=000000013fff9003, pte=006800001100b707
>>>>> [    2.274867] Internal error: Oops: 96000021 [#1] PREEMPT SMP
>>>>> [    2.280432] Modules linked in:
>>>>> [    2.283483] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc6+ #162
>>>>> [    2.289914] Hardware name: Google Elm (DT)
>>>>> [    2.294003] pstate: 20000005 (nzCv daif -PAN -UAO)
>>>>> [    2.298792] pc : mtk_read_temp+0xb8/0x1c8
>>>>> [    2.302793] lr : mtk_read_temp+0x7c/0x1c8
>>>>> [    2.306794] sp : ffff80001003b930
>>>>> [    2.310100] x29: ffff80001003b930 x28: 0000000000000000
>>>>> [    2.315404] x27: 0000000000000002 x26: ffff0000f9550b10
>>>>> [    2.320709] x25: ffff0000f9550a80 x24: 0000000000000090
>>>>> [    2.326014] x23: ffff80001003ba24 x22: 00000000610344c0
>>>>> [    2.331318] x21: 0000000000002710 x20: 00000000000001f4
>>>>> [    2.336622] x19: 0000000000030d40 x18: ffff800011742ec0
>>>>> [    2.341926] x17: 0000000000000001 x16: 0000000000000001
>>>>> [    2.347230] x15: ffffffffffffffff x14: ffffff0000000000
>>>>> [    2.352535] x13: ffffffffffffffff x12: 0000000000000028
>>>>> [    2.357839] x11: 0000000000000003 x10: ffff800011295ec8
>>>>> [    2.363143] x9 : 000000000000291b x8 : 0000000000000002
>>>>> [    2.368447] x7 : 00000000000000a8 x6 : 0000000000000004
>>>>> [    2.373751] x5 : 0000000000000000 x4 : ffff800011295cb0
>>>>> [    2.379055] x3 : 0000000000000002 x2 : ffff8000125f5001
>>>>> [    2.384359] x1 : 0000000000000001 x0 : ffff0000f9550a80
>>>>> [    2.389665] Call trace:
>>>>> [    2.392105]  mtk_read_temp+0xb8/0x1c8
>>>>> [    2.395760]  of_thermal_get_temp+0x2c/0x40
>>>>> [    2.399849]  thermal_zone_get_temp+0x78/0x160
>>>>> [    2.404198]  thermal_zone_device_update.part.0+0x3c/0x1f8
>>>>> [    2.409589]  thermal_zone_device_update+0x34/0x48
>>>>> [    2.414286]  of_thermal_set_mode+0x58/0x88
>>>>> [    2.418375]  thermal_zone_of_sensor_register+0x1a8/0x1d8
>>>>> [    2.423679]  devm_thermal_zone_of_sensor_register+0x64/0xb0
>>>>> [    2.429242]  mtk_thermal_probe+0x690/0x7d0
>>>>> [    2.433333]  platform_drv_probe+0x5c/0xb0
>>>>> [    2.437335]  really_probe+0xe4/0x448
>>>>> [    2.440901]  driver_probe_device+0xe8/0x140
>>>>> [    2.445077]  device_driver_attach+0x7c/0x88
>>>>> [    2.449252]  __driver_attach+0xac/0x178
>>>>> [    2.453082]  bus_for_each_dev+0x78/0xc8
>>>>> [    2.456909]  driver_attach+0x2c/0x38
>>>>> [    2.460476]  bus_add_driver+0x14c/0x230
>>>>> [    2.464304]  driver_register+0x6c/0x128
>>>>> [    2.468131]  __platform_driver_register+0x50/0x60
>>>>> [    2.472831]  mtk_thermal_driver_init+0x24/0x30
>>>>> [    2.477268]  do_one_initcall+0x50/0x298
>>>>> [    2.481098]  kernel_init_freeable+0x1ec/0x264
>>>>> [    2.485450]  kernel_init+0x1c/0x110
>>>>> [    2.488931]  ret_from_fork+0x10/0x1c
>>>>> [    2.492502] Code: f9401081 f9400402 b8a67821 8b010042 (b9400042)
>>>>> [    2.498599] ---[ end trace e43e3105ed27dc99 ]---
>>>>> [    2.503367] Kernel panic - not syncing: Attempted to kill init!
>>>>> exitcode=0x0000000b
>>>>> [    2.511020] SMP: stopping secondary CPUs
>>>>> [    2.514941] Kernel Offset: disabled
>>>>> [    2.518421] CPU features: 0x090002,25006005
>>>>> [    2.522595] Memory Limit: none
>>>>> [    2.525644] ---[ end Kernel panic - not syncing: Attempted to kill init!
>>>>> exitcode=0x0000000b ]---
>>>>>
> 
>> Hi Enric,
> I will help fix this crash. If it is urgent, you can revert the patch
> locally for mt8173 in the meantime.
> 

Thanks, a revert of that patch is what I am carrying ;-)

It'd be nice to have this fixed for the next MR or during the upcoming release
cycle, which will probably start next week. Support for MT8173 Elm and Hana will
land during the next merge window, so those boards will be affected by this and
will break. Actually, there are not many boards supported in mainline using
MT8183 (only the EVB), so in the worst case I can send a revert or a partial
revert of the patch.

Thanks,
 Enric

> The last sensor in mt8173_bank_data[2] is MT8173_TSABB, whose index is 4.
> But mt8173_msr and mt8173_adcpnp each list only four entries (indices 0
> to 3), so there is no index 4 in either array.
> That is the root cause, for your reference.
> 
> static const struct mtk_thermal_data mt8173_thermal_data = {
> 	.auxadc_channel = MT8173_TEMP_AUXADC_CHANNEL,
> 	.num_banks = MT8173_NUM_ZONES,
> 	.num_sensors = MT8173_NUM_SENSORS,
> 	.vts_index = mt8173_vts_index,
> 	.cali_val = MT8173_CALIBRATION,
> 	.num_controller = MT8173_NUM_CONTROLLER,
> 	.controller_offset = mt8173_tc_offset,
> 	.need_switch_bank = true,
> 	.bank_data = {
> 		{
> 			.num_sensors = 2,
> 			.sensors = mt8173_bank_data[0],
> 		}, {
> 			.num_sensors = 2,
> 			.sensors = mt8173_bank_data[1],
> 		}, {
> 			.num_sensors = 3,
> 			.sensors = mt8173_bank_data[2],
> 		}, {
> 			.num_sensors = 1,
> 			.sensors = mt8173_bank_data[3],
> 		},
> 	},
> 	.msr = mt8173_msr,
> 	.adcpnp = mt8173_adcpnp,
> 	.sensor_mux_values = mt8173_mux_values,
> };
> 
> 
> 
> /* MT8173 thermal sensor data */
> static const int mt8173_bank_data[MT8173_NUM_ZONES][3] = {
> 	{ MT8173_TS2, MT8173_TS3 },
> 	{ MT8173_TS2, MT8173_TS4 },
> 	{ MT8173_TS1, MT8173_TS2, MT8173_TSABB },
> 	{ MT8173_TS2 },
> };
> 
> static const int mt8173_msr[MT8173_NUM_SENSORS_PER_ZONE] = {
> 	TEMP_MSR0, TEMP_MSR1, TEMP_MSR2, TEMP_MSR3
> };
> 
> static const int mt8173_adcpnp[MT8173_NUM_SENSORS_PER_ZONE] = {
> 	TEMP_ADCPNP0, TEMP_ADCPNP1, TEMP_ADCPNP2, TEMP_ADCPNP3
> };
> 
> /* MT8173 thermal sensors */
> #define MT8173_TS1	0
> #define MT8173_TS2	1
> #define MT8173_TS3	2
> #define MT8173_TS4	3
> #define MT8173_TSABB	4
> 
> Best Regards,
> Michael
> 



* Re: [BUG] Cannot boot on MT8173 if Mediatek thermal is enabled
  2020-05-28  8:08         ` Enric Balletbo i Serra
@ 2020-05-28 14:20           ` Matthias Brugger
  2020-07-01 11:07             ` Enric Balletbo i Serra
  0 siblings, 1 reply; 8+ messages in thread
From: Matthias Brugger @ 2020-05-28 14:20 UTC (permalink / raw)
  To: Enric Balletbo i Serra, Michael Kao
  Cc: roger.lu, Hsin-Yi Wang, drinkcat,
	moderated list:ARM/Mediatek SoC support, renze



On 28/05/2020 10:08, Enric Balletbo i Serra wrote:
> Hi Michael,
> 
> On 28/5/20 4:59, Michael Kao wrote:
>> On Wed, 2020-05-20 at 18:12 +0200, Enric Balletbo i Serra wrote:
>>> Hi Matthias et all,
>>>
>>> On 20/5/20 17:25, Enric Balletbo i Serra wrote:
>>>>
>>>>
>>>> On 20/5/20 17:21, Matthias Brugger wrote:
>>>>>
>>>>>
>>>>> On 20/05/2020 17:09, Enric Balletbo i Serra wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> I've been testing the Acer Chromebook R 13 (elm - MT8173) for a while. Today I
>>>>>> enabled the Mediatek thermal driver (CONFIG_MTK_THERMAL=y) and I started to get
>>>>>> that hang [1]
>>>>>>
>>>>>
>>>>> Did you try to bisect to find out what broke it?
>>>>>
>>>>
>>>> I don't even know if this worked at some point, I was running/testing my kernels
>>>> with CONFIG_MTK_THERMAL disabled. From the log doesn't seem to have a lot of
>>>> changes so I suspect this issue is there since long time.
>>>>
>>>
>>> So the commit that introduces the problem is:
>>>
>>> commit eb9aecd90d1a39601e91cd08b90d5fee51d321a6
>>> Author: Michael Kao <michael.kao@mediatek.com>
>>> Date:   Fri Feb 1 15:38:07 2019 +0800
>>>
>>>     thermal: mediatek: fix register index error
>>>
>>>     The index of msr and adcpnp should match the sensor
>>>     which belongs to the selected bank in the for loop.
>>>
>>>     Fixes: b7cf0053738c ("thermal: Add Mediatek thermal driver for mt2701.")
>>>     Signed-off-by: Michael Kao <michael.kao@mediatek.com>
>>>     Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
>>>
>>>
>>>>
>>>>> Regards,
>>>>> Matthias
>>>>>
>>>>>> The stacktrace points point to this function:
>>>>>>
>>>>>> static int mtk_thermal_bank_temperature(struct mtk_thermal_bank *bank)
>>>>>>
>>>>>> More precisely to this call:
>>>>>>
>>>>>> 		raw = readl(mt->thermal_base +
>>>>>> 			    conf->msr[conf->bank_data[bank->id].sensors[i]]);
>>>>>>
>>>>>> this call, is in a loop and ends trying to access to conf->msr[4]
>>>>>> (conf->msr[MT8173_TSABB]) which doesn't exist as per the following struct
>>>>>>
>>>>>> static const int mt8173_msr[MT8173_NUM_SENSORS_PER_ZONE] = {
>>>>>> 	TEMP_MSR0, TEMP_MSR1, TEMP_MSR2, TEMP_MSR3
>>>>>> };
>>>>>>
>>>>>> I think the datasheet will help here to clarify what is happening but is not
>>>>>> public, so I can really check. Anyway seems that or the mt8173_msr struct is
>>>>>> wrong or the mt8173_bank_data is wrong or there is something else.
>>>>>>
>>>>>> Could anyone with the information or with this hardwware knowledge take a look,
>>>>>> please.
>>>>>>
>>>>>> Thanks,
>>>>>>  Enric
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>> [    2.222488] Unable to handle kernel paging request at virtual address
>>>>>> ffff8000125f5001
>>>>>> [    2.230421] Mem abort info:
>>>>>> [    2.233207]   ESR = 0x96000021
>>>>>> [    2.236261]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>> [    2.241571]   SET = 0, FnV = 0
>>>>>> [    2.244623]   EA = 0, S1PTW = 0
>>>>>> [    2.247762] Data abort info:
>>>>>> [    2.250640]   ISV = 0, ISS = 0x00000021
>>>>>> [    2.254473]   CM = 0, WnR = 0
>>>>>> [    2.257544] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041850000
>>>>>> [    2.264251] [ffff8000125f5001] pgd=000000013ffff003, pud=000000013fffe003,
>>>>>> pmd=000000013fff9003, pte=006800001100b707
>>>>>> [    2.274867] Internal error: Oops: 96000021 [#1] PREEMPT SMP
>>>>>> [    2.280432] Modules linked in:
>>>>>> [    2.283483] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc6+ #162
>>>>>> [    2.289914] Hardware name: Google Elm (DT)
>>>>>> [    2.294003] pstate: 20000005 (nzCv daif -PAN -UAO)
>>>>>> [    2.298792] pc : mtk_read_temp+0xb8/0x1c8
>>>>>> [    2.302793] lr : mtk_read_temp+0x7c/0x1c8
>>>>>> [    2.306794] sp : ffff80001003b930
>>>>>> [    2.310100] x29: ffff80001003b930 x28: 0000000000000000
>>>>>> [    2.315404] x27: 0000000000000002 x26: ffff0000f9550b10
>>>>>> [    2.320709] x25: ffff0000f9550a80 x24: 0000000000000090
>>>>>> [    2.326014] x23: ffff80001003ba24 x22: 00000000610344c0
>>>>>> [    2.331318] x21: 0000000000002710 x20: 00000000000001f4
>>>>>> [    2.336622] x19: 0000000000030d40 x18: ffff800011742ec0
>>>>>> [    2.341926] x17: 0000000000000001 x16: 0000000000000001
>>>>>> [    2.347230] x15: ffffffffffffffff x14: ffffff0000000000
>>>>>> [    2.352535] x13: ffffffffffffffff x12: 0000000000000028
>>>>>> [    2.357839] x11: 0000000000000003 x10: ffff800011295ec8
>>>>>> [    2.363143] x9 : 000000000000291b x8 : 0000000000000002
>>>>>> [    2.368447] x7 : 00000000000000a8 x6 : 0000000000000004
>>>>>> [    2.373751] x5 : 0000000000000000 x4 : ffff800011295cb0
>>>>>> [    2.379055] x3 : 0000000000000002 x2 : ffff8000125f5001
>>>>>> [    2.384359] x1 : 0000000000000001 x0 : ffff0000f9550a80
>>>>>> [    2.389665] Call trace:
>>>>>> [    2.392105]  mtk_read_temp+0xb8/0x1c8
>>>>>> [    2.395760]  of_thermal_get_temp+0x2c/0x40
>>>>>> [    2.399849]  thermal_zone_get_temp+0x78/0x160
>>>>>> [    2.404198]  thermal_zone_device_update.part.0+0x3c/0x1f8
>>>>>> [    2.409589]  thermal_zone_device_update+0x34/0x48
>>>>>> [    2.414286]  of_thermal_set_mode+0x58/0x88
>>>>>> [    2.418375]  thermal_zone_of_sensor_register+0x1a8/0x1d8
>>>>>> [    2.423679]  devm_thermal_zone_of_sensor_register+0x64/0xb0
>>>>>> [    2.429242]  mtk_thermal_probe+0x690/0x7d0
>>>>>> [    2.433333]  platform_drv_probe+0x5c/0xb0
>>>>>> [    2.437335]  really_probe+0xe4/0x448
>>>>>> [    2.440901]  driver_probe_device+0xe8/0x140
>>>>>> [    2.445077]  device_driver_attach+0x7c/0x88
>>>>>> [    2.449252]  __driver_attach+0xac/0x178
>>>>>> [    2.453082]  bus_for_each_dev+0x78/0xc8
>>>>>> [    2.456909]  driver_attach+0x2c/0x38
>>>>>> [    2.460476]  bus_add_driver+0x14c/0x230
>>>>>> [    2.464304]  driver_register+0x6c/0x128
>>>>>> [    2.468131]  __platform_driver_register+0x50/0x60
>>>>>> [    2.472831]  mtk_thermal_driver_init+0x24/0x30
>>>>>> [    2.477268]  do_one_initcall+0x50/0x298
>>>>>> [    2.481098]  kernel_init_freeable+0x1ec/0x264
>>>>>> [    2.485450]  kernel_init+0x1c/0x110
>>>>>> [    2.488931]  ret_from_fork+0x10/0x1c
>>>>>> [    2.492502] Code: f9401081 f9400402 b8a67821 8b010042 (b9400042)
>>>>>> [    2.498599] ---[ end trace e43e3105ed27dc99 ]---
>>>>>> [    2.503367] Kernel panic - not syncing: Attempted to kill init!
>>>>>> exitcode=0x0000000b
>>>>>> [    2.511020] SMP: stopping secondary CPUs
>>>>>> [    2.514941] Kernel Offset: disabled
>>>>>> [    2.518421] CPU features: 0x090002,25006005
>>>>>> [    2.522595] Memory Limit: none
>>>>>> [    2.525644] ---[ end Kernel panic - not syncing: Attempted to kill init!
>>>>>> exitcode=0x0000000b ]---
>>>>>>
>>
>>> Hi Enric,
>> I will help fix this crash. If it is urgent, you can revert the patch
>> locally for mt8173 in the meantime.
>>
> 
> Thanks, a revert of that patch is what I am carrying ;-)
> 

As Enric mentioned, v5.8-rc1 will claim support for MT8173-based Chromebooks, so
we should make sure that a fix gets in during the first rcs (rc2 would be best).
If I got that right, it would be best to have a fix in around 4 weeks.

Please let me know if you can't make it, and we can revert the commit in the
rc phase. That said, I'd prefer a proper fix.

Enric, what do you think?

Regards,
Matthias

> It'd be nice to have this fixed for the next MR or during the upcoming release
> cycle, which will probably start next week. Support for MT8173 Elm and Hana will
> land during the next merge window, so those boards will be affected by this and
> will break. Actually, there are not many boards supported in mainline using
> MT8183 (only the EVB), so in the worst case I can send a revert or a partial
> revert of the patch.
> 
> Thanks,
>  Enric
> 
>> The last sensor in mt8173_bank_data[2] is MT8173_TSABB, whose index is 4.
>> But mt8173_msr and mt8173_adcpnp each list only four entries (indices 0
>> to 3), so there is no index 4 in either array.
>> That is the root cause, for your reference.
>>
>> static const struct mtk_thermal_data mt8173_thermal_data = {
>> 	.auxadc_channel = MT8173_TEMP_AUXADC_CHANNEL,
>> 	.num_banks = MT8173_NUM_ZONES,
>> 	.num_sensors = MT8173_NUM_SENSORS,
>> 	.vts_index = mt8173_vts_index,
>> 	.cali_val = MT8173_CALIBRATION,
>> 	.num_controller = MT8173_NUM_CONTROLLER,
>> 	.controller_offset = mt8173_tc_offset,
>> 	.need_switch_bank = true,
>> 	.bank_data = {
>> 		{
>> 			.num_sensors = 2,
>> 			.sensors = mt8173_bank_data[0],
>> 		}, {
>> 			.num_sensors = 2,
>> 			.sensors = mt8173_bank_data[1],
>> 		}, {
>> 			.num_sensors = 3,
>> 			.sensors = mt8173_bank_data[2],
>> 		}, {
>> 			.num_sensors = 1,
>> 			.sensors = mt8173_bank_data[3],
>> 		},
>> 	},
>> 	.msr = mt8173_msr,
>> 	.adcpnp = mt8173_adcpnp,
>> 	.sensor_mux_values = mt8173_mux_values,
>> };
>>
>>
>>
>> /* MT8173 thermal sensor data */
>> static const int mt8173_bank_data[MT8173_NUM_ZONES][3] = {
>> 	{ MT8173_TS2, MT8173_TS3 },
>> 	{ MT8173_TS2, MT8173_TS4 },
>> 	{ MT8173_TS1, MT8173_TS2, MT8173_TSABB },
>> 	{ MT8173_TS2 },
>> };
>>
>> static const int mt8173_msr[MT8173_NUM_SENSORS_PER_ZONE] = {
>> 	TEMP_MSR0, TEMP_MSR1, TEMP_MSR2, TEMP_MSR3
>> };
>>
>> static const int mt8173_adcpnp[MT8173_NUM_SENSORS_PER_ZONE] = {
>> 	TEMP_ADCPNP0, TEMP_ADCPNP1, TEMP_ADCPNP2, TEMP_ADCPNP3
>> };
>>
>> /* MT8173 thermal sensors */
>> #define MT8173_TS1	0
>> #define MT8173_TS2	1
>> #define MT8173_TS3	2
>> #define MT8173_TS4	3
>> #define MT8173_TSABB	4
>>
>> Best Regards,
>> Michael
>>



* Re: [BUG] Cannot boot on MT8173 if Mediatek thermal is enabled
  2020-05-28 14:20           ` Matthias Brugger
@ 2020-07-01 11:07             ` Enric Balletbo i Serra
  0 siblings, 0 replies; 8+ messages in thread
From: Enric Balletbo i Serra @ 2020-07-01 11:07 UTC (permalink / raw)
  To: Matthias Brugger, Michael Kao
  Cc: roger.lu, Hsin-Yi Wang, drinkcat,
	moderated list:ARM/Mediatek SoC support, renze

Hi Michael,

On 28/5/20 16:20, Matthias Brugger wrote:
> 
> 
> On 28/05/2020 10:08, Enric Balletbo i Serra wrote:
>> Hi Michael,
>>
>> On 28/5/20 4:59, Michael Kao wrote:
>>> On Wed, 2020-05-20 at 18:12 +0200, Enric Balletbo i Serra wrote:
>>>> Hi Matthias et all,
>>>>
>>>> On 20/5/20 17:25, Enric Balletbo i Serra wrote:
>>>>>
>>>>>
>>>>> On 20/5/20 17:21, Matthias Brugger wrote:
>>>>>>
>>>>>>
>>>>>> On 20/05/2020 17:09, Enric Balletbo i Serra wrote:
>>>>>>> Dear all,
>>>>>>>
>>>>>>> I've been testing the Acer Chromebook R 13 (elm - MT8173) for a while. Today I
>>>>>>> enabled the Mediatek thermal driver (CONFIG_MTK_THERMAL=y) and I started to get
>>>>>>> that hang [1]
>>>>>>>
>>>>>>
>>>>>> Did you try to bisect to find out what broke it?
>>>>>>
>>>>>
>>>>> I don't even know if this worked at some point, I was running/testing my kernels
>>>>> with CONFIG_MTK_THERMAL disabled. From the log doesn't seem to have a lot of
>>>>> changes so I suspect this issue is there since long time.
>>>>>
>>>>
>>>> So the commit that introduces the problem is:
>>>>
>>>> commit eb9aecd90d1a39601e91cd08b90d5fee51d321a6
>>>> Author: Michael Kao <michael.kao@mediatek.com>
>>>> Date:   Fri Feb 1 15:38:07 2019 +0800
>>>>
>>>>     thermal: mediatek: fix register index error
>>>>
>>>>     The index of msr and adcpnp should match the sensor
>>>>     which belongs to the selected bank in the for loop.
>>>>
>>>>     Fixes: b7cf0053738c ("thermal: Add Mediatek thermal driver for mt2701.")
>>>>     Signed-off-by: Michael Kao <michael.kao@mediatek.com>
>>>>     Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
>>>>
>>>>
>>>>>
>>>>>> Regards,
>>>>>> Matthias
>>>>>>
>>>>>>> The stacktrace points point to this function:
>>>>>>>
>>>>>>> static int mtk_thermal_bank_temperature(struct mtk_thermal_bank *bank)
>>>>>>>
>>>>>>> More precisely to this call:
>>>>>>>
>>>>>>> 		raw = readl(mt->thermal_base +
>>>>>>> 			    conf->msr[conf->bank_data[bank->id].sensors[i]]);
>>>>>>>
>>>>>>> this call, is in a loop and ends trying to access to conf->msr[4]
>>>>>>> (conf->msr[MT8173_TSABB]) which doesn't exist as per the following struct
>>>>>>>
>>>>>>> static const int mt8173_msr[MT8173_NUM_SENSORS_PER_ZONE] = {
>>>>>>> 	TEMP_MSR0, TEMP_MSR1, TEMP_MSR2, TEMP_MSR3
>>>>>>> };
>>>>>>>
>>>>>>> I think the datasheet will help here to clarify what is happening but is not
>>>>>>> public, so I can really check. Anyway seems that or the mt8173_msr struct is
>>>>>>> wrong or the mt8173_bank_data is wrong or there is something else.
>>>>>>>
>>>>>>> Could anyone with the information or with this hardwware knowledge take a look,
>>>>>>> please.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>  Enric
>>>>>>>
>>>>>>>
>>>>>>> [1]
>>>>>>> [    2.222488] Unable to handle kernel paging request at virtual address
>>>>>>> ffff8000125f5001
>>>>>>> [    2.230421] Mem abort info:
>>>>>>> [    2.233207]   ESR = 0x96000021
>>>>>>> [    2.236261]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>>> [    2.241571]   SET = 0, FnV = 0
>>>>>>> [    2.244623]   EA = 0, S1PTW = 0
>>>>>>> [    2.247762] Data abort info:
>>>>>>> [    2.250640]   ISV = 0, ISS = 0x00000021
>>>>>>> [    2.254473]   CM = 0, WnR = 0
>>>>>>> [    2.257544] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041850000
>>>>>>> [    2.264251] [ffff8000125f5001] pgd=000000013ffff003, pud=000000013fffe003,
>>>>>>> pmd=000000013fff9003, pte=006800001100b707
>>>>>>> [    2.274867] Internal error: Oops: 96000021 [#1] PREEMPT SMP
>>>>>>> [    2.280432] Modules linked in:
>>>>>>> [    2.283483] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc6+ #162
>>>>>>> [    2.289914] Hardware name: Google Elm (DT)
>>>>>>> [    2.294003] pstate: 20000005 (nzCv daif -PAN -UAO)
>>>>>>> [    2.298792] pc : mtk_read_temp+0xb8/0x1c8
>>>>>>> [    2.302793] lr : mtk_read_temp+0x7c/0x1c8
>>>>>>> [    2.306794] sp : ffff80001003b930
>>>>>>> [    2.310100] x29: ffff80001003b930 x28: 0000000000000000
>>>>>>> [    2.315404] x27: 0000000000000002 x26: ffff0000f9550b10
>>>>>>> [    2.320709] x25: ffff0000f9550a80 x24: 0000000000000090
>>>>>>> [    2.326014] x23: ffff80001003ba24 x22: 00000000610344c0
>>>>>>> [    2.331318] x21: 0000000000002710 x20: 00000000000001f4
>>>>>>> [    2.336622] x19: 0000000000030d40 x18: ffff800011742ec0
>>>>>>> [    2.341926] x17: 0000000000000001 x16: 0000000000000001
>>>>>>> [    2.347230] x15: ffffffffffffffff x14: ffffff0000000000
>>>>>>> [    2.352535] x13: ffffffffffffffff x12: 0000000000000028
>>>>>>> [    2.357839] x11: 0000000000000003 x10: ffff800011295ec8
>>>>>>> [    2.363143] x9 : 000000000000291b x8 : 0000000000000002
>>>>>>> [    2.368447] x7 : 00000000000000a8 x6 : 0000000000000004
>>>>>>> [    2.373751] x5 : 0000000000000000 x4 : ffff800011295cb0
>>>>>>> [    2.379055] x3 : 0000000000000002 x2 : ffff8000125f5001
>>>>>>> [    2.384359] x1 : 0000000000000001 x0 : ffff0000f9550a80
>>>>>>> [    2.389665] Call trace:
>>>>>>> [    2.392105]  mtk_read_temp+0xb8/0x1c8
>>>>>>> [    2.395760]  of_thermal_get_temp+0x2c/0x40
>>>>>>> [    2.399849]  thermal_zone_get_temp+0x78/0x160
>>>>>>> [    2.404198]  thermal_zone_device_update.part.0+0x3c/0x1f8
>>>>>>> [    2.409589]  thermal_zone_device_update+0x34/0x48
>>>>>>> [    2.414286]  of_thermal_set_mode+0x58/0x88
>>>>>>> [    2.418375]  thermal_zone_of_sensor_register+0x1a8/0x1d8
>>>>>>> [    2.423679]  devm_thermal_zone_of_sensor_register+0x64/0xb0
>>>>>>> [    2.429242]  mtk_thermal_probe+0x690/0x7d0
>>>>>>> [    2.433333]  platform_drv_probe+0x5c/0xb0
>>>>>>> [    2.437335]  really_probe+0xe4/0x448
>>>>>>> [    2.440901]  driver_probe_device+0xe8/0x140
>>>>>>> [    2.445077]  device_driver_attach+0x7c/0x88
>>>>>>> [    2.449252]  __driver_attach+0xac/0x178
>>>>>>> [    2.453082]  bus_for_each_dev+0x78/0xc8
>>>>>>> [    2.456909]  driver_attach+0x2c/0x38
>>>>>>> [    2.460476]  bus_add_driver+0x14c/0x230
>>>>>>> [    2.464304]  driver_register+0x6c/0x128
>>>>>>> [    2.468131]  __platform_driver_register+0x50/0x60
>>>>>>> [    2.472831]  mtk_thermal_driver_init+0x24/0x30
>>>>>>> [    2.477268]  do_one_initcall+0x50/0x298
>>>>>>> [    2.481098]  kernel_init_freeable+0x1ec/0x264
>>>>>>> [    2.485450]  kernel_init+0x1c/0x110
>>>>>>> [    2.488931]  ret_from_fork+0x10/0x1c
>>>>>>> [    2.492502] Code: f9401081 f9400402 b8a67821 8b010042 (b9400042)
>>>>>>> [    2.498599] ---[ end trace e43e3105ed27dc99 ]---
>>>>>>> [    2.503367] Kernel panic - not syncing: Attempted to kill init!
>>>>>>> exitcode=0x0000000b
>>>>>>> [    2.511020] SMP: stopping secondary CPUs
>>>>>>> [    2.514941] Kernel Offset: disabled
>>>>>>> [    2.518421] CPU features: 0x090002,25006005
>>>>>>> [    2.522595] Memory Limit: none
>>>>>>> [    2.525644] ---[ end Kernel panic - not syncing: Attempted to kill init!
>>>>>>> exitcode=0x0000000b ]---
>>>>>>>
>>>
>>>> Hi Enric,
>>> I will help fix this crash. If it is urgent, you can revert the patch
>>> locally for mt8173 in the meantime.
>>>
>>
>> Thanks, a revert of that patch is what I am carrying ;-)
>>
> 
> As Enric mentioned, v5.8-rc1 will claim support for MT8173-based Chromebooks, so
> we should make sure that a fix gets in during the first rcs (rc2 would be best).
> If I got that right, it would be best to have a fix in around 4 weeks.
> 
> Please let me know if you can't make it, and we can revert the commit in the
> rc phase. That said, I'd prefer a proper fix.
> 


Michael, did you have a chance to look into this?

> Enric, what do you think?
> 

I plan to send a revert after rc4 if we can't find a fix for it before then.

Thanks,
 Enric

> Regards,
> Matthias
> 
>> It'd be nice to have this fixed for the next MR or during the upcoming release
>> cycle, which will probably start next week. Support for MT8173 Elm and Hana will
>> land during the next merge window, so those boards will be affected by this and
>> will break. Actually, there are not many boards supported in mainline using
>> MT8183 (only the EVB), so in the worst case I can send a revert or a partial
>> revert of the patch.
>>
>> Thanks,
>>  Enric
>>
>>> The last sensor in mt8173_bank_data[2] is MT8173_TSABB, whose index is 4.
>>> But mt8173_msr and mt8173_adcpnp each list only four entries (indices 0
>>> to 3), so there is no index 4 in either array.
>>> That is the root cause, for your reference.
>>>
>>> static const struct mtk_thermal_data mt8173_thermal_data = {
>>> 	.auxadc_channel = MT8173_TEMP_AUXADC_CHANNEL,
>>> 	.num_banks = MT8173_NUM_ZONES,
>>> 	.num_sensors = MT8173_NUM_SENSORS,
>>> 	.vts_index = mt8173_vts_index,
>>> 	.cali_val = MT8173_CALIBRATION,
>>> 	.num_controller = MT8173_NUM_CONTROLLER,
>>> 	.controller_offset = mt8173_tc_offset,
>>> 	.need_switch_bank = true,
>>> 	.bank_data = {
>>> 		{
>>> 			.num_sensors = 2,
>>> 			.sensors = mt8173_bank_data[0],
>>> 		}, {
>>> 			.num_sensors = 2,
>>> 			.sensors = mt8173_bank_data[1],
>>> 		}, {
>>> 			.num_sensors = 3,
>>> 			.sensors = mt8173_bank_data[2],
>>> 		}, {
>>> 			.num_sensors = 1,
>>> 			.sensors = mt8173_bank_data[3],
>>> 		},
>>> 	},
>>> 	.msr = mt8173_msr,
>>> 	.adcpnp = mt8173_adcpnp,
>>> 	.sensor_mux_values = mt8173_mux_values,
>>> };
>>>
>>>
>>>
>>> /* MT8173 thermal sensor data */
>>> static const int mt8173_bank_data[MT8173_NUM_ZONES][3] = {
>>> 	{ MT8173_TS2, MT8173_TS3 },
>>> 	{ MT8173_TS2, MT8173_TS4 },
>>> 	{ MT8173_TS1, MT8173_TS2, MT8173_TSABB },
>>> 	{ MT8173_TS2 },
>>> };
>>>
>>> static const int mt8173_msr[MT8173_NUM_SENSORS_PER_ZONE] = {
>>> 	TEMP_MSR0, TEMP_MSR1, TEMP_MSR2, TEMP_MSR3
>>> };
>>>
>>> static const int mt8173_adcpnp[MT8173_NUM_SENSORS_PER_ZONE] = {
>>> 	TEMP_ADCPNP0, TEMP_ADCPNP1, TEMP_ADCPNP2, TEMP_ADCPNP3
>>> };
>>>
>>> /* MT8173 thermal sensors */
>>> #define MT8173_TS1	0
>>> #define MT8173_TS2	1
>>> #define MT8173_TS3	2
>>> #define MT8173_TS4	3
>>> #define MT8173_TSABB	4
>>>
>>> Best Regards,
>>> Michael
>>>



Thread overview: 8+ messages
2020-05-20 15:09 [BUG] Cannot boot on MT8173 if Mediatek thermal is enabled Enric Balletbo i Serra
2020-05-20 15:21 ` Matthias Brugger
2020-05-20 15:25   ` Enric Balletbo i Serra
2020-05-20 16:12     ` Enric Balletbo i Serra
2020-05-28  2:59       ` Michael Kao
2020-05-28  8:08         ` Enric Balletbo i Serra
2020-05-28 14:20           ` Matthias Brugger
2020-07-01 11:07             ` Enric Balletbo i Serra
