* loop nesting in alignment exception and machine check
@ 2019-10-26  7:23 ` Wangshaobo (bobo)
  0 siblings, 0 replies; 11+ messages in thread
From: Wangshaobo (bobo) @ 2019-10-26  7:23 UTC (permalink / raw)
  Cc: linux-arch, alistair, chengjian (D),
	Xiexiuqi, linux-kernel, oss, Wangshaobo (bobo),
	paulus, Libin (Huawei),
	agust, linuxppc-dev



Hi,
I have run into a nested loop of exceptions: an alignment exception raised while handling a machine check leads to a new machine check. The background is as follows.

Problem:
machine check or critical interrupt ->...->kbox_write [records the last words] -> memcpy(ioremap_addr, src, size):_GLOBAL(memcpy)...
On entering memcpy, the instruction 'dcbz r11,r6' causes an alignment exception. In this case r11 holds the ioremap address, which is what leads to the alignment exception.
The instruction cannot complete successfully because we are still in the machine check. In the end, a new machine check is triggered inside the interrupt handler function, and the nested loop begins.

Analysis:
We have analysed this at length but still cannot come up with a reasonable explanation. Normally, an alignment exception triggered in machine check context can still be recorded in the kbox after the alignment exception has been handled by its handler function.
But how can a machine check be triggered inside the handler function at all? We printed the relevant registers on first entry to the machine check handler and to the alignment exception handler, respectively:
         MSR:0x2      MSR:0x0
         SRR1:0x2      SRR1:0x21002
         But the manual says SRR1 should be set to the MSR value (0x2). Why did that happen?
         [inline image: image001.jpg]
         Then a branch in the handler function copies SRR1 to MSR, which enables MSR[ME] and MSR[CE], and the system collapses.
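
To make the register values above easier to read, here is a minimal decoding sketch. It assumes the Book-E/4xx style MSR layout used by Linux's powerpc headers, in which MSR[CE] is 0x20000 and MSR[ME] is 0x1000; the thread does not name the exact core, so the masks are an assumption, and the helper names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed masks (Book-E/4xx style layout, as in Linux's powerpc headers). */
#define MSR_CE 0x00020000u /* critical interrupt enable */
#define MSR_ME 0x00001000u /* machine check enable */

/* Returns the bits of v that are neither CE nor ME. */
uint32_t msr_other_bits(uint32_t v)
{
	return v & ~(MSR_CE | MSR_ME);
}

/* Prints one register value together with its CE/ME state. */
void decode_msr(const char *name, uint32_t v)
{
	printf("%-6s 0x%05x  CE=%d ME=%d other=0x%x\n", name, (unsigned)v,
	       (v & MSR_CE) != 0, (v & MSR_ME) != 0,
	       (unsigned)msr_other_bits(v));
}

/* Decodes the four values quoted in the message, column by column. */
void decode_reported_values(void)
{
	decode_msr("MSR", 0x2);      /* left column */
	decode_msr("SRR1", 0x2);     /* left column */
	decode_msr("MSR", 0x0);      /* right column */
	decode_msr("SRR1", 0x21002); /* right column */
}
```

Under these masks, 0x21002 has both CE and ME set plus bit 0x2, while 0x2 has neither. That matches the observation that copying SRR1 back into MSR re-enables MSR[ME] and MSR[CE].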

Questions:
         1)  Why can the alignment exception not be handled within a machine check?
         2)  Besides memcpy, can any other function cause the alignment exception?

We can still reproduce it. The sequence is as follows:
         CPU deadlock->watchdog->trigger FIQ->kbox_write->memcpy->alignment exception->print last words.
         But for problems like the one below, what the kbox printed is empty.
------------------kbox restart:[   10.147594]----------------
kbox verify fs magic fail
kbox mem mabye destroyed, format it
kbox: load OK
lock-task: major[249] minor[0]
-----start show_destroyed_kbox_mem_head----
00000000: 00000000 00000000 00000000 00000000  ................
00000010: 00000000 00000000 00000000 00000000  ................
00000020: 00000000 00000000 00000000 00000000  ................
00000030: 00000000 00000000 00000000 00000000  ................
00000040: 00000000 00000000 00000000 00000000  ................
00000050: 00000000 00000000 00000000 00000000  ................
00000060: 00000000 00000000 00000000 00000000  ................
00000070: 00000000 00000000 00000000 00000000  ................
00000080: 00000000 00000000 00000000 00000000  ................
00000090: 00000000 00000000 00000000 00000000  ................


[-- Attachment #2: image001.jpg --]
[-- Type: image/jpeg, Size: 11935 bytes --]


* Re: loop nesting in alignment exception and machine check
  2019-10-26  7:23 ` Wangshaobo (bobo)
@ 2019-10-26 11:20   ` Christophe Leroy
  -1 siblings, 0 replies; 11+ messages in thread
From: Christophe Leroy @ 2019-10-26 11:20 UTC (permalink / raw)
  To: Wangshaobo (bobo)
  Cc: linux-arch, alistair, chengjian (D),
	Xiexiuqi, linux-kernel, oss, paulus, Libin (Huawei),
	agust, linuxppc-dev

Hi,

On 26/10/2019 at 09:23, Wangshaobo (bobo) wrote:
> Hi,
> 
> I encountered a problem about a loop nesting occurred in manufacturing 
> the alignment exception in machine check, trigger background is :
> 
> problem:
> 
> machine checkout or critical interrupt ->…->kbox_write[for recording 
> last words] -> memcpy(irremap_addr, src,size):_GLOBAL(memcpy)…
> 
> when we enter memcpy,a command ‘dcbz r11,r6’ will cause a alignment 
> exception, in this situation,r11 loads the ioremap address,which leads 
> to the alignment exception,

You can't use memcpy() on anything other than memory.

For an ioremapped area, you have to use memcpy_toio().

Christophe
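
A minimal sketch of the suggested change, written as a userspace mock so that it compiles on its own. The shape of kbox_write() and its parameters are hypothetical, since the real function is not shown in the thread. The memcpy_toio() defined here is only a stand-in: in kernel code it comes from <linux/io.h> and the destination would be a `void __iomem *`. The relevant point is that an ioremap() mapping is normally cache-inhibited, and on many PowerPC cores dcbz on such storage raises an alignment exception, so the copy has to use plain stores.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's memcpy_toio(): it copies with plain
 * byte stores and never uses cache-block instructions such as dcbz.
 */
void memcpy_toio(volatile void *dst, const void *src, size_t n)
{
	volatile uint8_t *d = (volatile uint8_t *)dst;
	const uint8_t *s = src;

	while (n--)
		*d++ = *s++;
}

/*
 * Hypothetical kbox_write(): copies len bytes into the kbox region at
 * offset, clamped to the region size, and returns the bytes written.
 */
size_t kbox_write(volatile void *kbox_base, size_t kbox_size,
		  size_t offset, const void *src, size_t len)
{
	if (offset >= kbox_size)
		return 0;
	if (len > kbox_size - offset)
		len = kbox_size - offset; /* clamp to the region */

	/* was: memcpy(kbox_base + offset, src, len); */
	memcpy_toio((volatile uint8_t *)kbox_base + offset, src, len);
	return len;
}
```

In the kernel, only the call site changes: the memcpy() into the ioremapped kbox area becomes memcpy_toio() with the same argument order.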

> 
> then the command can not be process successfully,as we still in machine 
> check.at the end ,it triggers a new irq machine check in irq handler 
> function,a loop nesting begins.
> 
> analysis:
> 
> We have analysed a lot,but it still can not come to a reasonable 
> description,in common,the alignment triggered in machine check context 
> can still be collected into the Kbox
> 
> after alignment exception be handled by handler function, but how does 
> the machine checkout can be triggered in the handler fucntion for any 
> causes? We print relevant registers
> 
> as follow when first enter machine check and alignment exception handler 
> function:
> 
>           MSR:0x2      MSR:0x0
> 
>           SRR1:0x2      SRR1:0x21002
> 
>           But the manual says SRR1 should be set to MSR(0x2),why that 
> happened ?
> 
>           Then a branch in handler function copy the SRR1 to MSR,this 
> enble MSR[ME] and MSR[CE],system collapses.
> 
> Conclusion:
> 
>           1)  why the alignment exception can not be handled in machine 
> check ?
> 
>           2)  besides memcpy,any other function can cause the alignment 
> exception ?
> 
> We still recurrent it, the line as follows:
> 
>           Cpu dead lock->watch log->trigger 
> fiq->kbox_write->memcpy->alignment exception->print last words.
> 
>           but for those problems as below,what the kbox printed is empty.
> 
> ------------------kbox restart:[   10.147594]----------------
> 
> kbox verify fs magic fail
> 
> kbox mem mabye destroyed, format it
> 
> kbox: load OK
> 
> lock-task: major[249] minor[0]
> 
> -----start show_destroyed_kbox_mem_head----
> 
> 00000000: 00000000 00000000 00000000 00000000  ................
> 
> 00000010: 00000000 00000000 00000000 00000000  ................
> 
> 00000020: 00000000 00000000 00000000 00000000  ................
> 
> 00000030: 00000000 00000000 00000000 00000000  ................
> 
> 00000040: 00000000 00000000 00000000 00000000  ................
> 
> 00000050: 00000000 00000000 00000000 00000000  ................
> 
> 00000060: 00000000 00000000 00000000 00000000  ................
> 
> 00000070: 00000000 00000000 00000000 00000000  ................
> 
> 00000080: 00000000 00000000 00000000 00000000  ................
> 
> 00000090: 00000000 00000000 00000000 00000000  ................
> 


* Re: Re: loop nesting in alignment exception and machine check
       [not found]     ` <ef93fa2f-d98f-2e94-322e-0ae095626e75@c-s.fr>
@ 2019-11-01  1:57         ` Wangshaobo (bobo)
  2019-11-14  3:46       ` Wangshaobo (bobo)
  1 sibling, 0 replies; 11+ messages in thread
From: Wangshaobo (bobo) @ 2019-11-01  1:57 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linux-arch, alistair, chengjian (D),
	Xiexiuqi, linux-kernel, oss, paulus, Libin (Huawei),
	agust, linuxppc-dev

Hi Christophe,

	I am sorry for the slow reply; we ran into some unexpected problems while trying to reproduce the issue.
	
	I would also like to ask: does this behaviour (having to use memcpy_toio() when copying to an ioremapped address) occur only on powerpc, 
or do other architectures have the same problem? We are trying to understand why it happens. Our Linux kernel version is 4.4.
	
	Thanks very much.

-----Original Message-----
From: Christophe Leroy [mailto:christophe.leroy@c-s.fr]
Sent: 2019-10-31 19:13
To: Wangshaobo (bobo) <bobo.shaobowang@huawei.com>
Cc: chengjian (D) <cj.chengjian@huawei.com>; Libin (Huawei) <huawei.libin@huawei.com>; Xiexiuqi <xiexiuqi@huawei.com>; zhangyi (F) <yi.zhang@huawei.com>
Subject: Re: Re: loop nesting in alignment exception and machine check

Hi,

Did you try? Does it work?

Christophe

On 28/10/2019 at 06:57, Wangshaobo (bobo) wrote:
> Hi,Christophe
> 
> Thank you for your quick reply. I will try to use memcpy_toio() instead of memcpy().
> 
> -----Original Message-----
> From: Christophe Leroy [mailto:christophe.leroy@c-s.fr]
> Sent: 2019-10-26 19:20
> To: Wangshaobo (bobo) <bobo.shaobowang@huawei.com>
> Cc: linux-arch@vger.kernel.org; alistair@popple.id.au; chengjian (D)
> <cj.chengjian@huawei.com>; Xiexiuqi <xiexiuqi@huawei.com>;
> linux-kernel@vger.kernel.org; oss@buserror.net; paulus@samba.org;
> Libin (Huawei) <huawei.libin@huawei.com>; agust@denx.de;
> linuxppc-dev@lists.ozlabs.org
> Subject: Re: loop nesting in alignment exception and machine check
> 
> Hi,
> 
> On 26/10/2019 at 09:23, Wangshaobo (bobo) wrote:
>> Hi,
>>
>> I encountered a problem about a loop nesting occurred in 
>> manufacturing the alignment exception in machine check, trigger background is :
>>
>> problem:
>>
>> machine checkout or critical interrupt ->…->kbox_write[for recording 
>> last words] -> memcpy(irremap_addr, src,size):_GLOBAL(memcpy)…
>>
>> when we enter memcpy,a command ‘dcbz r11,r6’ will cause a alignment 
>> exception, in this situation,r11 loads the ioremap address,which 
>> leads to the alignment exception,
> 
> You can't use memcpy() on something else than memory.
> 
> For an ioremapped area, you have to use memcpy_toio()
> 
> Christophe
> 
>>
>> then the command can not be process successfully,as we still in 
>> machine check.at the end ,it triggers a new irq machine check in irq 
>> handler function,a loop nesting begins.
>>
>> analysis:
>>
>> We have analysed a lot,but it still can not come to a reasonable 
>> description,in common,the alignment triggered in machine check 
>> context can still be collected into the Kbox
>>
>> after alignment exception be handled by handler function, but how 
>> does the machine checkout can be triggered in the handler fucntion 
>> for any causes? We print relevant registers
>>
>> as follow when first enter machine check and alignment exception 
>> handler
>> function:
>>
>>            MSR:0x2      MSR:0x0
>>
>>            SRR1:0x2      SRR1:0x21002
>>
>>            But the manual says SRR1 should be set to MSR(0x2),why 
>> that happened ?
>>
>>            Then a branch in handler function copy the SRR1 to 
>> MSR,this enble MSR[ME] and MSR[CE],system collapses.
>>
>> Conclusion:
>>
>>            1)  why the alignment exception can not be handled in 
>> machine check ?
>>
>>            2)  besides memcpy,any other function can cause the 
>> alignment exception ?
>>
>> We still recurrent it, the line as follows:
>>
>>            Cpu dead lock->watch log->trigger
>> fiq->kbox_write->memcpy->alignment exception->print last words.
>>
>>            but for those problems as below,what the kbox printed is empty.
>>
>> ------------------kbox restart:[   10.147594]----------------
>>
>> kbox verify fs magic fail
>>
>> kbox mem mabye destroyed, format it
>>
>> kbox: load OK
>>
>> lock-task: major[249] minor[0]
>>
>> -----start show_destroyed_kbox_mem_head----
>>
>> 00000000: 00000000 00000000 00000000 00000000  ................
>>
>> 00000010: 00000000 00000000 00000000 00000000  ................
>>
>> 00000020: 00000000 00000000 00000000 00000000  ................
>>
>> 00000030: 00000000 00000000 00000000 00000000  ................
>>
>> 00000040: 00000000 00000000 00000000 00000000  ................
>>
>> 00000050: 00000000 00000000 00000000 00000000  ................
>>
>> 00000060: 00000000 00000000 00000000 00000000  ................
>>
>> 00000070: 00000000 00000000 00000000 00000000  ................
>>
>> 00000080: 00000000 00000000 00000000 00000000  ................
>>
>> 00000090: 00000000 00000000 00000000 00000000  ................
>>


* Re: Re: loop nesting in alignment exception and machine check
       [not found]     ` <ef93fa2f-d98f-2e94-322e-0ae095626e75@c-s.fr>
  2019-11-01  1:57         ` Wangshaobo (bobo)
@ 2019-11-14  3:46       ` Wangshaobo (bobo)
  2019-11-26  8:16         ` Christophe Leroy
  1 sibling, 1 reply; 11+ messages in thread
From: Wangshaobo (bobo) @ 2019-11-14  3:46 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linux-arch, chengjian (D), Libin (Huawei), Xiexiuqi, zhangyi (F),
	Liuwenliang (Abbott Liu)

Hi Christophe,
	Our tests confirm that the problem is fixed when we use memcpy_toio() instead of memcpy(). In practice, everything worked until patch 
0b05e2d671c40cfb57e66e4e402320d6e056b2f8, which turned cacheable_memcpy into memcpy, was adopted. It speeds up memcpy() but introduces hidden trouble: 
our products have used memcpy() throughout for a long time, across ongoing maintenance, and it is now a big problem for us to check where that use is correct and where it is wrong with respect to cacheable_memcpy and memcpy_toio().
	So I would also like to ask:
	how can we reliably and uniformly close the gap caused by these changes to memcpy() as we maintain our kernel versions? If you have any tips, please tell me.
	Thanks, Shaobo Wang

-----Original Message-----
From: Christophe Leroy [mailto:christophe.leroy@c-s.fr]
Sent: 2019-10-31 19:13
To: Wangshaobo (bobo) <bobo.shaobowang@huawei.com>
Cc: chengjian (D) <cj.chengjian@huawei.com>; Libin (Huawei) <huawei.libin@huawei.com>; Xiexiuqi <xiexiuqi@huawei.com>; zhangyi (F) <yi.zhang@huawei.com>
Subject: Re: Re: loop nesting in alignment exception and machine check

Hi,

Did you try? Does it work?

Christophe

On 28/10/2019 at 06:57, Wangshaobo (bobo) wrote:
> Hi,Christophe
> 
> Thank you for your quick reply. I will try to use memcpy_toio() instead of memcpy().
> 
> -----Original Message-----
> From: Christophe Leroy [mailto:christophe.leroy@c-s.fr]
> Sent: 2019-10-26 19:20
> To: Wangshaobo (bobo) <bobo.shaobowang@huawei.com>
> Cc: linux-arch@vger.kernel.org; alistair@popple.id.au; chengjian (D)
> <cj.chengjian@huawei.com>; Xiexiuqi <xiexiuqi@huawei.com>;
> linux-kernel@vger.kernel.org; oss@buserror.net; paulus@samba.org;
> Libin (Huawei) <huawei.libin@huawei.com>; agust@denx.de;
> linuxppc-dev@lists.ozlabs.org
> Subject: Re: loop nesting in alignment exception and machine check
> 
> Hi,
> 
> On 26/10/2019 at 09:23, Wangshaobo (bobo) wrote:
>> Hi,
>>
>> I encountered a problem about a loop nesting occurred in 
>> manufacturing the alignment exception in machine check, trigger background is :
>>
>> problem:
>>
>> machine checkout or critical interrupt ->…->kbox_write[for recording 
>> last words] -> memcpy(irremap_addr, src,size):_GLOBAL(memcpy)…
>>
>> when we enter memcpy,a command ‘dcbz r11,r6’ will cause a alignment 
>> exception, in this situation,r11 loads the ioremap address,which 
>> leads to the alignment exception,
> 
> You can't use memcpy() on something else than memory.
> 
> For an ioremapped area, you have to use memcpy_toio()
> 
> Christophe
> 
>>
>> then the command can not be process successfully,as we still in 
>> machine check.at the end ,it triggers a new irq machine check in irq 
>> handler function,a loop nesting begins.
>>
>> analysis:
>>
>> We have analysed a lot,but it still can not come to a reasonable 
>> description,in common,the alignment triggered in machine check 
>> context can still be collected into the Kbox
>>
>> after alignment exception be handled by handler function, but how 
>> does the machine checkout can be triggered in the handler fucntion 
>> for any causes? We print relevant registers
>>
>> as follow when first enter machine check and alignment exception 
>> handler
>> function:
>>
>>            MSR:0x2      MSR:0x0
>>
>>            SRR1:0x2      SRR1:0x21002
>>
>>            But the manual says SRR1 should be set to MSR(0x2),why 
>> that happened ?
>>
>>            Then a branch in handler function copy the SRR1 to 
>> MSR,this enble MSR[ME] and MSR[CE],system collapses.
>>
>> Conclusion:
>>
>>            1)  why the alignment exception can not be handled in 
>> machine check ?
>>
>>            2)  besides memcpy,any other function can cause the 
>> alignment exception ?
>>
>> We still recurrent it, the line as follows:
>>
>>            Cpu dead lock->watch log->trigger
>> fiq->kbox_write->memcpy->alignment exception->print last words.
>>
>>            but for those problems as below,what the kbox printed is empty.
>>
>> ------------------kbox restart:[   10.147594]----------------
>>
>> kbox verify fs magic fail
>>
>> kbox mem mabye destroyed, format it
>>
>> kbox: load OK
>>
>> lock-task: major[249] minor[0]
>>
>> -----start show_destroyed_kbox_mem_head----
>>
>> 00000000: 00000000 00000000 00000000 00000000  ................
>>
>> 00000010: 00000000 00000000 00000000 00000000  ................
>>
>> 00000020: 00000000 00000000 00000000 00000000  ................
>>
>> 00000030: 00000000 00000000 00000000 00000000  ................
>>
>> 00000040: 00000000 00000000 00000000 00000000  ................
>>
>> 00000050: 00000000 00000000 00000000 00000000  ................
>>
>> 00000060: 00000000 00000000 00000000 00000000  ................
>>
>> 00000070: 00000000 00000000 00000000 00000000  ................
>>
>> 00000080: 00000000 00000000 00000000 00000000  ................
>>
>> 00000090: 00000000 00000000 00000000 00000000  ................
>>


* Re: Re: Re: loop nesting in alignment exception and machine check
  2019-11-01  1:57         ` Wangshaobo (bobo)
@ 2019-11-26  8:13           ` Christophe Leroy
  -1 siblings, 0 replies; 11+ messages in thread
From: Christophe Leroy @ 2019-11-26  8:13 UTC (permalink / raw)
  To: Wangshaobo (bobo)
  Cc: linux-arch, alistair, chengjian (D),
	Xiexiuqi, linux-kernel, oss, paulus, Libin (Huawei),
	agust, linuxppc-dev

Hi,

On 01/11/2019 at 02:57, Wangshaobo (bobo) wrote:
> Hi, Christophe
> 
> 	I am sorry that we are in some troubles for some unpredictable problems when we replay and haven't given you a quick reply.
> 	
> 	I also want to ask does the phenomeon(use memcpy_toio when copy ioremap_address) only occurs in powerpc ? does any other
> arch also has the same problem ? we are in persuit of asking why this phenomenon happened. Our linux kernel version is 4.4.

It's not a problem ... it's a feature.

I have no idea whether the same kind of issue can happen on other 
arches, sorry.

Christophe

> 	
> 	thanks very much.
> 
> -----Original Message-----
> From: Christophe Leroy [mailto:christophe.leroy@c-s.fr]
> Sent: 2019-10-31 19:13
> To: Wangshaobo (bobo) <bobo.shaobowang@huawei.com>
> Cc: chengjian (D) <cj.chengjian@huawei.com>; Libin (Huawei) <huawei.libin@huawei.com>; Xiexiuqi <xiexiuqi@huawei.com>; zhangyi (F) <yi.zhang@huawei.com>
> Subject: Re: Re: loop nesting in alignment exception and machine check
> 
> Hi,
> 
> Did you try? Does it work?
> 
> Christophe
> 
> On 28/10/2019 at 06:57, Wangshaobo (bobo) wrote:
>> Hi,Christophe
>>
>> Thank you for your quick reply. I will try to use memcpy_toio() instead of memcpy().
>>
>> -----Original Message-----
>> From: Christophe Leroy [mailto:christophe.leroy@c-s.fr]
>> Sent: 2019-10-26 19:20
>> To: Wangshaobo (bobo) <bobo.shaobowang@huawei.com>
>> Cc: linux-arch@vger.kernel.org; alistair@popple.id.au; chengjian (D)
>> <cj.chengjian@huawei.com>; Xiexiuqi <xiexiuqi@huawei.com>;
>> linux-kernel@vger.kernel.org; oss@buserror.net; paulus@samba.org;
>> Libin (Huawei) <huawei.libin@huawei.com>; agust@denx.de;
>> linuxppc-dev@lists.ozlabs.org
>> Subject: Re: loop nesting in alignment exception and machine check
>>
>> Hi,
>>
>> On 26/10/2019 at 09:23, Wangshaobo (bobo) wrote:
>>> Hi,
>>>
>>> I encountered a problem about a loop nesting occurred in
>>> manufacturing the alignment exception in machine check, trigger background is :
>>>
>>> problem:
>>>
>>> machine checkout or critical interrupt ->…->kbox_write[for recording
>>> last words] -> memcpy(irremap_addr, src,size):_GLOBAL(memcpy)…
>>>
>>> when we enter memcpy,a command ‘dcbz r11,r6’ will cause a alignment
>>> exception, in this situation,r11 loads the ioremap address,which
>>> leads to the alignment exception,
>>
>> You can't use memcpy() on something else than memory.
>>
>> For an ioremapped area, you have to use memcpy_toio()
>>
>> Christophe
>>
>>>
>>> then the command can not be process successfully,as we still in
>>> machine check.at the end ,it triggers a new irq machine check in irq
>>> handler function,a loop nesting begins.
>>>
>>> analysis:
>>>
>>> We have analysed this at length but still cannot reach a reasonable
>>> explanation. Normally, an alignment exception triggered in machine
>>> check context can still be recorded in the Kbox after the alignment
>>> exception has been handled by its handler function. But how can a
>>> machine check be triggered inside that handler function at all? We
>>> printed the relevant registers on first entry to the machine check
>>> handler and to the alignment exception handler function:
>>>
>>>             MSR:0x2      MSR:0x0
>>>
>>>             SRR1:0x2      SRR1:0x21002
>>>
>>>             But the manual says SRR1 should be set to MSR (0x2). Why
>>> did that happen?
>>>
>>>             Then a branch in the handler function copies SRR1 to
>>> MSR, which enables MSR[ME] and MSR[CE], and the system collapses.
>>>
>>> Conclusion:
>>>
>>>             1)  Why can the alignment exception not be handled in
>>> machine check context?
>>>
>>>             2)  Besides memcpy, can any other function cause the
>>> alignment exception?
>>>
>>> We can still reproduce it; the sequence is as follows:
>>>
>>>             CPU deadlock->watch log->trigger
>>> fiq->kbox_write->memcpy->alignment exception->print last words.
>>>
>>>             But for problems like the one below, what the kbox printed is empty.
>>>
>>> ------------------kbox restart:[   10.147594]----------------
>>>
>>> kbox verify fs magic fail
>>>
>>> kbox mem mabye destroyed, format it
>>>
>>> kbox: load OK
>>>
>>> lock-task: major[249] minor[0]
>>>
>>> -----start show_destroyed_kbox_mem_head----
>>>
>>> 00000000: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000010: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000020: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000030: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000040: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000050: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000060: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000070: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000080: 00000000 00000000 00000000 00000000  ................
>>>
>>> 00000090: 00000000 00000000 00000000 00000000  ................
>>>


* Re: Re: Re: loop nesting in alignment exception and machine check
@ 2019-11-26  8:13           ` Christophe Leroy
  0 siblings, 0 replies; 11+ messages in thread
From: Christophe Leroy @ 2019-11-26  8:13 UTC (permalink / raw)
  To: Wangshaobo (bobo)
  Cc: linux-arch, chengjian (D),
	Xiexiuqi, alistair, linux-kernel, oss, paulus, Libin (Huawei),
	agust, linuxppc-dev

Hi,

On 01/11/2019 at 02:57, Wangshaobo (bobo) wrote:
> Hi, Christophe
> 
> 	I am sorry for the slow reply; we ran into some unexpected problems while reproducing the issue.
> 	
> 	I would also like to ask whether this behaviour (having to use memcpy_toio() when copying to an ioremapped address) occurs only on powerpc.
> Do other architectures have the same problem? We are trying to understand why it happens. Our Linux kernel version is 4.4.

It's not a problem ... it's a feature.

I have no idea whether the same kind of issue can happen on other 
arches, sorry.

Christophe
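
As a rough illustration of why the accessor matters: the report in this thread is that the powerpc memcpy() issues dcbz on the destination, and that this raises an alignment exception when the destination is an ioremapped address. A copy routine meant for I/O memory avoids such cache-line tricks and simply stores through a volatile pointer. The userspace sketch below uses a hypothetical helper name, sketch_copy_toio(); it is not the kernel's memcpy_toio() implementation, only the general idea under that assumption.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for an I/O copy helper (assumption:
 * the name and signature are illustrative only). It copies byte by byte
 * through a volatile pointer, so every store is emitted as written and
 * no cache-line zeroing shortcut is applied to the destination.
 */
static void sketch_copy_toio(volatile void *dst, const void *src, size_t n)
{
	volatile uint8_t *d = dst;
	const uint8_t *s = src;

	while (n--)
		*d++ = *s++;
}
```

In a real driver the destination would be the pointer returned by ioremap() and the call would be memcpy_toio(); the sketch only shows the shape of a copy that makes no assumption about cacheability.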

> 	
> 	thanks very much.
> 


* Re: Re: Re: loop nesting in alignment exception and machine check
  2019-11-14  3:46       ` Wangshaobo (bobo)
@ 2019-11-26  8:16         ` Christophe Leroy
  2019-11-26 12:25           ` Re: " Wangshaobo (bobo)
  0 siblings, 1 reply; 11+ messages in thread
From: Christophe Leroy @ 2019-11-26  8:16 UTC (permalink / raw)
  To: Wangshaobo (bobo)
  Cc: linux-arch, chengjian (D), Libin (Huawei), Xiexiuqi, zhangyi (F),
	Liuwenliang (Abbott Liu)



On 14/11/2019 at 04:46, Wangshaobo (bobo) wrote:
> Hi Christophe,
> 	Our testing confirms that the problem is fixed when we use memcpy_toio() instead of memcpy(). In our experience, everything worked until
> patch 0b05e2d671c40cfb57e66e4e402320d6e056b2f8 was adopted, in which cacheable_memcpy became memcpy. That patch speeds up memcpy but introduces
> hidden trouble. Our products have used memcpy widely over a long period of maintenance, and it is now a big job for us to check which uses are
> correct and which are wrong with respect to cacheable_memcpy and memcpy_toio.
> 	So I would also like to ask:
> 	how can we reliably and uniformly close the gap left by those memcpy changes across the versions we maintain? If you have any tips, please tell me.
> 	Thanks, Shaobo Wang

All accesses to I/O memory should use io accessors. Direct access to io 
memory is unsafe by definition.

Incorrect accesses to I/O memory can be detected with 'sparse' tool. For 
that, you just have to build the kernel with 'make vmlinux C=2' and 
you'll get notified for unsafe accesses to IO memory.

Christophe
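
To make the sparse suggestion concrete, here is a minimal sketch of how an address-space annotation lets the checker tell I/O pointers from ordinary ones. The macro definitions are an assumption modelled on older kernel headers, and sketch_writeb() and sketch_log_write() are hypothetical names used only for illustration. With a normal compiler the annotations expand to nothing; under sparse (which defines __CHECKER__), dereferencing an annotated pointer directly or passing it to plain memcpy() would be expected to produce a warning.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sparse defines __CHECKER__ while it runs; a normal compiler does not,
 * so the annotations vanish in real builds. The attribute spelling is an
 * assumption modelled on older kernel headers.
 */
#ifdef __CHECKER__
# define __iomem __attribute__((noderef, address_space(2)))
# define __force __attribute__((force))
#else
# define __iomem
# define __force
#endif

/* Hypothetical accessor: the one place allowed to drop the annotation. */
static void sketch_writeb(uint8_t val, volatile void __iomem *addr)
{
	*(volatile uint8_t __force *)addr = val;
}

/* Hypothetical log writer: takes an annotated pointer, uses the accessor. */
static void sketch_log_write(void __iomem *dst, const uint8_t *src, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		sketch_writeb(src[i], (uint8_t __iomem *)dst + i);
	/* Calling memcpy(dst, src, n) here is what sparse is meant to flag. */
}
```

In the kernel itself the annotation and the accessors already exist, so the practical step is the one given above: build with C=2 and work through the reported call sites.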

> 


* Re: Re: Re: loop nesting in alignment exception and machine check
  2019-11-26  8:16         ` Christophe Leroy
@ 2019-11-26 12:25           ` Wangshaobo (bobo)
  0 siblings, 0 replies; 11+ messages in thread
From: Wangshaobo (bobo) @ 2019-11-26 12:25 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linux-arch, chengjian (D), Libin (Huawei), Xiexiuqi, zhangyi (F),
	Liuwenliang (Abbott Liu)

Thanks for your reply, Christophe.

I will use the 'sparse' tool to check for unsafe I/O memory accesses; it sounds powerful.

Thanks again!
-----Original Message-----
From: Christophe Leroy [mailto:christophe.leroy@c-s.fr]
Sent: 26 November 2019 16:16
To: Wangshaobo (bobo) <bobo.shaobowang@huawei.com>
Cc: linux-arch@vger.kernel.org; chengjian (D) <cj.chengjian@huawei.com>; Libin (Huawei) <huawei.libin@huawei.com>; Xiexiuqi <xiexiuqi@huawei.com>; zhangyi (F) <yi.zhang@huawei.com>; Liuwenliang (Abbott Liu) <liuwenliang@huawei.com>
Subject: Re: Re: Re: loop nesting in alignment exception and machine check



On 14/11/2019 at 04:46, Wangshaobo (bobo) wrote:
> Hi Christophe,
> 	Our testing confirms that the problem is fixed when we use memcpy_toio() instead of memcpy().
> In our experience, everything worked until patch 0b05e2d671c40cfb57e66e4e402320d6e056b2f8 was
> adopted, in which cacheable_memcpy became memcpy. That patch speeds up memcpy but introduces hidden trouble. Our products have used memcpy widely over a long period of maintenance, and it is now a big job for us to check which uses are correct and which are wrong with respect to cacheable_memcpy and memcpy_toio.
> 	So I would also like to ask:
> 	how can we reliably and uniformly close the gap left by those memcpy changes across the versions we maintain? If you have any tips, please tell me.
> 	Thanks, Shaobo Wang

All accesses to I/O memory should use io accessors. Direct access to io memory is unsafe by definition.

Incorrect accesses to I/O memory can be detected with 'sparse' tool. For that, you just have to build the kernel with 'make vmlinux C=2' and you'll get notified for unsafe accesses to IO memory.

Christophe

> 


end of thread, other threads:[~2019-11-26 12:25 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
2019-10-26  7:23 loop nesting in alignment exception and machine check Wangshaobo (bobo)
2019-10-26  7:23 ` Wangshaobo (bobo)
2019-10-26 11:20 ` Christophe Leroy
2019-10-26 11:20   ` Christophe Leroy
     [not found]   ` <D44062DC474617438D5181ADFE2B2C21016E9EAA@dggemi529-mbs.china.huawei.com>
     [not found]     ` <ef93fa2f-d98f-2e94-322e-0ae095626e75@c-s.fr>
2019-11-01  1:57       ` Re: Re: " Wangshaobo (bobo)
2019-11-01  1:57         ` Wangshaobo (bobo)
2019-11-26  8:13         ` Christophe Leroy
2019-11-26  8:13           ` Christophe Leroy
2019-11-14  3:46       ` Wangshaobo (bobo)
2019-11-26  8:16         ` Christophe Leroy
2019-11-26 12:25           ` Re: " Wangshaobo (bobo)
