linux-kernel.vger.kernel.org archive mirror
* Re: loop nesting in alignment exception and machine check
       [not found] <D44062DC474617438D5181ADFE2B2C21016DE42A@dggemi529-mbs.china.huawei.com>
@ 2019-10-26 11:20 ` Christophe Leroy
       [not found]   ` <D44062DC474617438D5181ADFE2B2C21016E9EAA@dggemi529-mbs.china.huawei.com>
  0 siblings, 1 reply; 3+ messages in thread
From: Christophe Leroy @ 2019-10-26 11:20 UTC (permalink / raw)
  To: Wangshaobo (bobo)
  Cc: linux-arch, alistair, chengjian (D),
	Xiexiuqi, linux-kernel, oss, paulus, Libin (Huawei),
	agust, linuxppc-dev

Hi,

On 26/10/2019 at 09:23, Wangshaobo (bobo) wrote:
> Hi,
> 
> I have run into a nested loop that occurs when an alignment exception
> is raised during a machine check. It is triggered as follows:
>
> problem:
>
> machine check or critical interrupt ->…->kbox_write[for recording
> last words] -> memcpy(irremap_addr, src,size):_GLOBAL(memcpy)…
>
> When we enter memcpy, the instruction ‘dcbz r11,r6’ causes an alignment
> exception. In this situation r11 holds the ioremap address, which
> leads to the alignment exception,

You can't use memcpy() on anything other than memory.

For an ioremapped area, you have to use memcpy_toio().

Christophe
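
For illustration, a minimal userspace sketch of the difference; it is not
kernel code. On powerpc the optimised memcpy() may use dcbz to clear
destination cache lines before filling them, and dcbz on a caching-inhibited
mapping, such as one normally returned by ioremap(), typically raises an
alignment exception. memcpy_toio() is expected to avoid such cache
instructions and to perform ordinary stores. The name model_memcpy_toio()
and the plain array standing in for the device window are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for memcpy_toio(): the copy is done
 * with plain byte-sized volatile stores only, so the compiler cannot
 * replace it with an optimised block copy.
 *
 * In the kernel the real call would be
 *     memcpy_toio(dst, src, count);      (declared via <linux/io.h>)
 * where dst is the __iomem pointer returned by ioremap().
 */
void model_memcpy_toio(volatile void *dst, const void *src, size_t n)
{
	volatile uint8_t *d = dst;
	const uint8_t *s = src;

	assert(dst != NULL && src != NULL);

	/* One ordinary store per byte; no cache-management instructions. */
	while (n--)
		*d++ = *s++;
}
```

In a kbox_write()-style path, the idea would be to replace the memcpy() into
the ioremapped buffer with memcpy_toio() and leave memcpy() for normal RAM.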

> 
> then the instruction cannot complete successfully, as we are still in
> the machine check. In the end, a new machine check is triggered inside
> the interrupt handler function, and a nested loop begins.
>
> analysis:
>
> We have analysed this a lot but still cannot come up with a reasonable
> explanation. Normally, an alignment exception triggered in machine check
> context can still be recorded in the Kbox after the alignment exception
> has been handled by its handler function. But how can a machine check
> be triggered inside the handler function at all? We printed the relevant
> registers on first entry to the machine check handler and to the
> alignment exception handler:
> 
>           MSR:0x2      MSR:0x0
> 
>           SRR1:0x2      SRR1:0x21002
> 
>           But the manual says SRR1 should be set to MSR (0x2), so why is
> it different?
>
>           Then a branch in the handler function copies SRR1 to MSR, which
> enables MSR[ME] and MSR[CE], and the system collapses.
> 
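As an aside, the SRR1 value quoted above can be decoded with a few lines of
C. This is only a sketch, assuming that the bit positions Linux names in
arch/powerpc/include/asm/reg.h apply to this core (MSR_CE = bit 17,
MSR_ME = bit 12, MSR_RI = bit 1); the helper names are made up for the
example. Under that assumption 0x21002 is exactly CE | ME | RI, which is
consistent with the statement that copying SRR1 into MSR enables MSR[ME]
and MSR[CE].

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as named in Linux's arch/powerpc/include/asm/reg.h
 * (assumed to apply to the core discussed in this thread). */
#define MSR_CE (UINT32_C(1) << 17)	/* critical interrupt enable */
#define MSR_ME (UINT32_C(1) << 12)	/* machine check enable */
#define MSR_RI (UINT32_C(1) << 1)	/* recoverable interrupt */

/* Compile-time check: the three bits together give the reported SRR1. */
static_assert((MSR_CE | MSR_ME | MSR_RI) == 0x21002,
	      "0x21002 should decompose into CE | ME | RI");

/* Hypothetical helper type: which of the three bits are set. */
struct msr_bits {
	int ce;
	int me;
	int ri;
	uint32_t other;	/* any bits not covered above */
};

struct msr_bits decode_msr(uint32_t value)
{
	struct msr_bits b;

	b.ce = (value & MSR_CE) != 0;
	b.me = (value & MSR_ME) != 0;
	b.ri = (value & MSR_RI) != 0;
	b.other = value & ~(MSR_CE | MSR_ME | MSR_RI);
	return b;
}
```

With these definitions, decode_msr(0x2) reports only RI, while
decode_msr(0x21002) reports CE, ME and RI and no other bits.
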
> Conclusion:
>
>           1)  Why can the alignment exception not be handled inside the
> machine check?
>
>           2)  Besides memcpy, can any other function cause the alignment
> exception?
>
> We can still reproduce it; the sequence is as follows:
>
>           Cpu dead lock->watch log->trigger
> fiq->kbox_write->memcpy->alignment exception->print last words.
>
>           But for problems like the one below, what the kbox printed is empty.
> 
> ------------------kbox restart:[   10.147594]----------------
> kbox verify fs magic fail
> kbox mem mabye destroyed, format it
> kbox: load OK
> lock-task: major[249] minor[0]
> -----start show_destroyed_kbox_mem_head----
> 00000000: 00000000 00000000 00000000 00000000  ................
> 00000010: 00000000 00000000 00000000 00000000  ................
> 00000020: 00000000 00000000 00000000 00000000  ................
> 00000030: 00000000 00000000 00000000 00000000  ................
> 00000040: 00000000 00000000 00000000 00000000  ................
> 00000050: 00000000 00000000 00000000 00000000  ................
> 00000060: 00000000 00000000 00000000 00000000  ................
> 00000070: 00000000 00000000 00000000 00000000  ................
> 00000080: 00000000 00000000 00000000 00000000  ................
> 00000090: 00000000 00000000 00000000 00000000  ................
> 


* Re: Re: loop nesting in alignment exception and machine check
       [not found]     ` <ef93fa2f-d98f-2e94-322e-0ae095626e75@c-s.fr>
@ 2019-11-01  1:57       ` Wangshaobo (bobo)
  2019-11-26  8:13         ` Christophe Leroy
  0 siblings, 1 reply; 3+ messages in thread
From: Wangshaobo (bobo) @ 2019-11-01  1:57 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linux-arch, alistair, chengjian (D),
	Xiexiuqi, linux-kernel, oss, paulus, Libin (Huawei),
	agust, linuxppc-dev

Hi, Christophe

	I am sorry for the slow reply; we ran into some unpredictable problems while reproducing the issue.
	
	I also want to ask: does this phenomenon (having to use memcpy_toio when copying to an ioremap address) occur only on powerpc? Does any other
arch have the same problem? We are trying to find out why this phenomenon happens. Our Linux kernel version is 4.4.
	
	Thanks very much.

-----Original Message-----
From: Christophe Leroy [mailto:christophe.leroy@c-s.fr]
Sent: 2019-10-31 19:13
To: Wangshaobo (bobo) <bobo.shaobowang@huawei.com>
Cc: chengjian (D) <cj.chengjian@huawei.com>; Libin (Huawei) <huawei.libin@huawei.com>; Xiexiuqi <xiexiuqi@huawei.com>; zhangyi (F) <yi.zhang@huawei.com>
Subject: Re: Re: loop nesting in alignment exception and machine check

Hi,

Did you try it? Does it work?

Christophe

On 28/10/2019 at 06:57, Wangshaobo (bobo) wrote:
> Hi, Christophe
> 
> Thank you for your quick reply. I will try to use memcpy_toio() instead of memcpy().
> 


* Re: Re: Re: loop nesting in alignment exception and machine check
  2019-11-01  1:57       ` Re: Re: " Wangshaobo (bobo)
@ 2019-11-26  8:13         ` Christophe Leroy
  0 siblings, 0 replies; 3+ messages in thread
From: Christophe Leroy @ 2019-11-26  8:13 UTC (permalink / raw)
  To: Wangshaobo (bobo)
  Cc: linux-arch, alistair, chengjian (D),
	Xiexiuqi, linux-kernel, oss, paulus, Libin (Huawei),
	agust, linuxppc-dev

Hi,

On 01/11/2019 at 02:57, Wangshaobo (bobo) wrote:
> Hi, Christophe
> 
> 	I am sorry for the slow reply; we ran into some unpredictable problems while reproducing the issue.
> 	
> 	I also want to ask: does this phenomenon (having to use memcpy_toio when copying to an ioremap address) occur only on powerpc? Does any other
> arch have the same problem? We are trying to find out why this phenomenon happens. Our Linux kernel version is 4.4.

It's not a problem ... it's a feature.

I have no idea whether the same kind of issue can happen on other 
arches, sorry.

Christophe

> 	
> 	thanks very much.
> 
