* Wedge400 (AST2520) OpenBMC stuck at reboot
From: Tao Ren @ 2022-09-21 22:08 UTC
  To: openbmc; +Cc: taoren

Hi there,

Recently I noticed a few Wedge400 (AST2520A2) units stuck after the "reboot"
command. It's hard to reproduce (affecting ~1 out of 1,000 units), but
once it happens, I have to power cycle the chassis to recover OpenBMC.

I checked aspeed_wdt.c and manually played with the watchdog registers, but
everything looks normal to me. Has anyone hit a similar error before?
Any suggestions on which area I should look into?

Below are the last few lines of logs before OpenBMC hangs:

bmc-oob login:
INIT: Sending processes configured via /etc/inittab the TERM signal
Stopping OpenBSD Secure Shell server: sshdstopped /usr/sbin/sshd (pid 7397 1189)
Stopping ntpd: done
stopping rsyslogd ... done
Stopping random number generator daemon.
Deconfiguring network interfaces... done.
Sending all processes the TERM signal...
rackmond[1747]: Got request exit[  528.383133] watchdog: watchdog0: watchdog did not stop!
Sending all processes the KILL signal...
Unmounting remote filesystems...
Deactivating swap...
Unmounting local filesystems...
Rebooting... [  529.725009] reboot: Restarting system
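
For context on the "watchdog did not stop!" line in the log above: as far
as I know, the kernel watchdog core prints it when /dev/watchdogN is closed
without the "magic close" character 'V', and it then leaves the timer
running on purpose. Below is a minimal sketch of a clean stop, assuming the
driver supports magic close and "nowayout" is not set. The function name
magic_close is hypothetical and not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: write the magic character 'V' to a watchdog device
# node and close it, which asks the watchdog core to stop the timer.
# Assumption: the driver supports magic close and "nowayout" is not enabled.
magic_close() {
    dev="${1:-/dev/watchdog0}"
    # The redirection opens the node, printf writes 'V', and the node is
    # closed again when the command finishes.
    printf 'V' > "$dev"
}

# Example (run on the BMC as root):
#   magic_close /dev/watchdog0
```

If another daemon still holds the device node open, the open is expected to
fail, so this only helps once that daemon has exited.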


Cheers,

Tao

* Re: Wedge400 (AST2520) OpenBMC stuck at reboot
From: Lei Yu @ 2022-09-22  2:10 UTC
  To: Tao Ren; +Cc: taoren, openbmc

We hit a similar but different issue where the BMC gets stuck.
It occurs when running a host DC cycle test, and when the issue occurs:
1. The BMC hangs, and the ASPEED heartbeat is off.
2. If wdt2 is enabled, wdt2 fires and the ASPEED chip resets and
reboots into the second flash.
3. If wdt2 is disabled, the BMC just hangs and we have to power
cycle the chassis.

We could not find the root cause, but it's likely related to a patch:
https://lore.kernel.org/openbmc/20201221223225.14723-2-jae.hyun.yoo@linux.intel.com/
If we revert the patch, the issue can no longer be reproduced.




-- 
BRs,
Lei YU

* Re: Wedge400 (AST2520) OpenBMC stuck at reboot
From: Tao Ren @ 2022-09-22  6:21 UTC
  To: Lei Yu; +Cc: taoren, openbmc

Hi Lei,

Thank you for the quick response! The symptom is quite similar to my
Wedge400 problem, but CONFIG_VIDEO_ASPEED is not enabled in my kconfig,
so it might be caused by a different component in my environment.


Cheers,

Tao


* Re: Wedge400 (AST2520) OpenBMC stuck at reboot
From: Tao Ren @ 2022-09-22 22:54 UTC
  To: Konstantin Klubnichkin, ryan_chen; +Cc: openbmc

Hi Konstantin,

Thanks for sharing. The watchdog control logic in the script is
similar to aspeed_wdt_restart(), but the good part is that the system is
still reachable if the watchdog cannot reset the system successfully.

Hi Ryan,

Have you ever seen the problem in your environment? Looks like it is
affecting multiple ASPEED platforms. Any suggestions?

BTW, I'm running Linux 5.15 in Wedge400 AST2520A2 OpenBMC.


Cheers,

Tao

On Thu, Sep 22, 2022 at 10:59:12AM +0300, Konstantin Klubnichkin wrote:
> Hello!
>
> I've faced this issue.
> Finally my solution was to modify shutdown script:
>
> ======================================================
> # tcsattr(tty, TIOCDRAIN, mode) to drain tty messages to console
> test -t 1 && stty cooked 0<&1
>
> echo "Syncing..."
> sync || :
> sync || :
> sync || :
>
> echo "Stopping WDTs..."
> rev=$(ast_getrev || :)
> if [ "$rev" = "G5" ]; then
>         devmem 0x1e78500c 32 0 || :
>         devmem 0x1e78502c 32 0 || :
>         devmem 0x1e78504c 32 0 || :
> fi
> if [ "$rev" = "G6" ]; then
>         devmem 0x1e78500c 32 0 || :
>         devmem 0x1e78504c 32 0 || :
>         devmem 0x1e78508c 32 0 || :
>         devmem 0x1e7850cc 32 0 || :
>         devmem 0x1e78510c 32 0 || :
>         devmem 0x1e78514c 32 0 || :
>         devmem 0x1e78518c 32 0 || :
>         devmem 0x1e7851cc 32 0 || :
> fi
>
> sled_hb_hb || :
>
> echo "Setting up WDT1 for ARM reboot"
> # Set timeout to 5 seconds
> devmem 0x1e785004 32 0x4c4b40 || :
> # Load counter reload value to counter register
> devmem 0x1e785008 32 0x4755 || :
> # Enable WDT1, reset ARM core only, use first flash (AST2500 only),
> # disable interrupt,  use 1MHz clock (AST2500 only)
> devmem 0x1e78500c 32 0x53 || :
>
> echo -n "WDT1CR " || :
> devmem 0x1e78500c || :
>
> echo "Last heart beats following..."
>
> while true; do
>         echo "KNOCK knock..."
>         sleep 1
> done
>
> echo "WARNING!!!! ZOMBIE ATTACK!!!"
>
> # Execute the command systemd told us to ...
> if test -d /oldroot  && test "$1"
> then
>         if test "$1" = kexec
>         then
>                 $1 -f -e
>         else
>                 $1 -f
>         fi
> fi
> ======================================================
>
> --
> Best regards,
> Konstantin Klubnichkin,
> lead firmware engineer,
> server hardware R&D group,
> Yandex Moscow office.
> tel: +7-903-510-33-33
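
The raw values written by the quoted script can be sanity-checked against
its own comments. The sketch below decodes the timeout value 0x4c4b40 and
the control value 0x53. The bit positions are an assumption based on my
reading of the WDT_CTRL_* definitions in aspeed_wdt.c, and the function
names are hypothetical; please verify against the datasheet for your SoC.

```shell
#!/bin/sh
# Sanity-check the raw values written by the quoted shutdown script.
# Assumption: bit positions follow the WDT_CTRL_* definitions in
# aspeed_wdt.c; they are not confirmed anywhere in this thread.

# Timeout in seconds for a reload value, assuming the 1 MHz clock source.
wdt_timeout_seconds() {
    echo $(( $1 / 1000000 ))
}

# Print the decoded fields of a WDT control register value.
wdt_decode_ctrl() {
    ctrl=$(( $1 ))
    echo "enable=$(( ctrl & 1 ))"
    echo "reset_system=$(( (ctrl >> 1) & 1 ))"
    echo "interrupt=$(( (ctrl >> 2) & 1 ))"
    echo "clk_1mhz=$(( (ctrl >> 4) & 1 ))"
    # Assumed encoding: 0 = SoC, 1 = full chip, 2 = ARM CPU only
    echo "reset_mode=$(( (ctrl >> 5) & 3 ))"
    echo "boot_secondary=$(( (ctrl >> 7) & 1 ))"
}

wdt_timeout_seconds 0x4c4b40
wdt_decode_ctrl 0x53
```

Under these assumptions the decoded fields line up with the script's
comments: 5 seconds at 1 MHz, watchdog enabled, ARM-only reset, interrupt
off, and boot from the first flash.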

* RE: Wedge400 (AST2520) OpenBMC stuck at reboot
From: Chin-Ting Kuo @ 2022-09-26  7:28 UTC
  To: Tao Ren, Konstantin Klubnichkin, Ryan Chen; +Cc: Ron Chen, openbmc

Hi Tao,

This problem cannot be reproduced on our AST2500 EVB with our kernel-5.15 SDK image.
We ran a three-day reboot stress test, about 1,980 reboots in total.
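
As a rough way to read a result like this: if the hang were a random event
with a fixed per-reboot probability p (an assumption; the ~1 out of 1,000
figure earlier in the thread counts units, not reboots), the chance of
seeing no hang in n reboots would be (1 - p)^n. A minimal sketch, with the
function name p_no_failure being hypothetical:

```shell
#!/bin/sh
# Probability of observing zero failures in n independent trials, each
# failing with probability p. This is an illustrative model only; the
# thread does not establish a per-reboot failure rate.
p_no_failure() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", exp(n * log(1 - p)) }'
}

# Assumed p = 0.001 per reboot, n = 1980 reboots
p_no_failure 0.001 1980
```

With those assumed numbers the result is roughly 14%, so a clean run of
this length would not by itself rule out a failure that rare.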




Thanks.

Best Wishes,
Chin-Ting


* Re: Wedge400 (AST2520) OpenBMC stuck at reboot
From: Tao Ren @ 2022-09-27  7:04 UTC
  To: Chin-Ting Kuo; +Cc: Ron Chen, openbmc, Ryan Chen, Konstantin Klubnichkin

Hi Chin-Ting,

Thank you for spending time on this problem!

Could you please share the git repo/URL of the kernel-5.15 SDK you are
referring to? Do you see any critical SDK kernel patches that are not
upstreamed yet, but could potentially help solve the reboot hang
issue?

BTW, below is my kernel tree, which is derived from Joel's kernel tree,
dev-5.15 branch:
https://github.com/facebook/openbmc-linux/tree/dev-5.15


Cheers,

Tao

