All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Paterson <Chris.Paterson2@renesas.com>
To: "Bhola, Bikram" <Bikram_Bhola@mentor.com>, Pavel Machek <pavel@denx.de>
Cc: "cip-dev@lists.cip-project.org" <cip-dev@lists.cip-project.org>,
	Jan Kiszka <jan.kiszka@siemens.com>
Subject: RE: Prompt timeouts on ipc227e board -- randomness related?
Date: Tue, 5 Oct 2021 13:41:08 +0000	[thread overview]
Message-ID: <OSZPR01MB68770F80AE64400871800F6EB7AF9@OSZPR01MB6877.jpnprd01.prod.outlook.com> (raw)
In-Reply-To: <a34a0fcb84f449ef83e2843bec5a6d02@SVR-IES-MBX-03.mgc.mentorg.com>

Hello Bikram,

> From: Bhola, Bikram <Bikram_Bhola@mentor.com>
> Sent: 30 September 2021 12:19
> 
> Hi Chris,
> 
> We investigated the failure job and looks like before getting login prompt job
> timeout is happening . In the job definition  file - job timeout is mentioned
> 15mins and  sometimes due to slow network issue, it takes more time while
> downloading, untar  and deploying image. So we are seeing timeout during
> login prompt or in some cases in earlier stages also.  The work in progress to
> double up the network bandwidth within a few weeks, which will reduce the
> occurrence of this type of issues.

Thank you for your investigation.
I've have increased the timeout as you have suggested:
https://gitlab.com/cip-project/cip-testing/linux-cip-ci/-/merge_requests/49

One additional thing I've noticed, the default x86 character delay during boot is 500ms, which seems a long time inbetween each character sent to the platform
https://lava.ciplatform.org/scheduler/device/x86-simatic-ipc227e-01/devicedict#defline5

Has a lower value for boot_character_delay ever been tried?

Kind regards, Chris

> 
> Time being, with an increased job timeout to 20mins, failure is not observed.
> We tested 10 times to be working fine.
> Example :
> https://lava.ciplatform.org/scheduler/device/x86-simatic-ipc227e-01
> 
> 
> changes in the job definition file
> https://lava.ciplatform.org/scheduler/job/444336/definition 
> Current implementation
> ------------------------
> timeouts:
>   job:
>     minutes: 15
> 
> Need to Modify
> -----------------------------------
> timeouts:
>   job:
>     minutes: 20
> 
> 
> Regards,
> Bikram
> 
> -----Original Message-----
> From: Chris Paterson <Chris.Paterson2@renesas.com>
> Sent: 28 September 2021 15:38
> To: Pavel Machek <pavel@denx.de>; Bhola, Bikram
> <Bikram_Bhola@mentor.com>
> Cc: cip-dev@lists.cip-project.org; Jan Kiszka <jan.kiszka@siemens.com>
> Subject: RE: Prompt timeouts on ipc227e board -- randomness related?
> 
> Hello Pavel,
> 
> > From: Pavel Machek <pavel@denx.de>
> > Sent: 25 September 2021 21:06
> >
> > Hi!
> >
> > It is not first time I see this failure:
> 
> Thank you for reporting the issue.
> 
> Bikram is going to take a look for us (thank you).
> 
> Kind regards, Chris
> 
> >
> >
> https://jpn01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flava.c
> iplatform.org%2Fscheduler%2Fjob%2F444336&amp;data=04%7C01%7CChris.
> Paterson2%40renesas.com%7Cacb37b995a6c41d2090808d98404189e%7C53d
> 82571da1947e49cb4625a166a4a2a%7C0%7C0%7C637685975391318359%7CUn
> known%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6
> Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=yIPWAcblC17x6PpT3TSiGT
> QWiiinQiSuu9a3HRg4u3Q%3D&amp;reserved=0
> >
> >
> > [[0;32m  OK  [0m] Started Login Service.
> > [[0m[0;31m*     [0m] (1 of 2) A start job is running for…ate sshd host keys
> (7s /
> > no limit)[K[[0;1;31m*[0m[0;31m*    [0m] (1 of 2) A start job is running
> for…ate
> > sshd host keys (8s / no limit)[K[[0;31m*[0;1;31m*[0m[0;31m*   [0m] (1 of 2)
> > A start job is running for…ate sshd host keys (9s / no limit)[K[
> > [0;31m*[0;1;31m*[0m[0;31m*  [0m] (2 of 2) A start job is running
> for…evices-
> > eth0.device (8s / 1min 30s)[   19.855328] systemd[1]: apt-daily-
> > upgrade.timer: Adding 3min 2.027476s random time.
> > [   19.864207] systemd[1]: apt-daily.timer: Adding 1h 54min 15.794344s
> > random time.
> > [   21.406490] systemd[1]: apt-daily-upgrade.timer: Adding 55min
> 47.041488s
> > random time.
> > [   21.415357] systemd[1]: apt-daily.timer: Adding 11h 48min 4.457495s
> > random time.
> > [   22.049807] systemd[1]: apt-daily-upgrade.timer: Adding 3min 54.125406s
> > random time.
> > [   22.058500] systemd[1]: apt-daily.timer: Adding 8h 34min 47.388595s
> > random time.
> > [   22.511646] systemd[1]: apt-daily-upgrade.timer: Adding 25min
> 13.015405s
> > random time.
> > [   22.520510] systemd[1]: apt-daily.timer: Adding 11h 58min 24.212170s
> > random time.
> > [K[[0;32m  OK  [0m] Started Regenerate sshd host keys.
> > wait for prompt timed out
> > end: 2.3.4.1 login-action (duration 00:00:24) [common]
> > case: login-action
> > case_id: 9417066
> > definition: lava
> > duration: 23.98
> >
> > Any idea what is going on there? Is it just a test problem, or do we
> > have kernel regression that only happens sometimes?
> >
> > Best regards,
> > 								Pavel
> > --
> > DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany


WARNING: multiple messages have this Message-ID (diff)
From: "Chris Paterson" <chris.paterson2@renesas.com>
To: "Bhola, Bikram" <Bikram_Bhola@mentor.com>, Pavel Machek <pavel@denx.de>
Cc: "cip-dev@lists.cip-project.org" <cip-dev@lists.cip-project.org>,
	Jan Kiszka <jan.kiszka@siemens.com>
Subject: Re: [cip-dev] Prompt timeouts on ipc227e board -- randomness related?
Date: Tue, 5 Oct 2021 13:41:08 +0000	[thread overview]
Message-ID: <OSZPR01MB68770F80AE64400871800F6EB7AF9@OSZPR01MB6877.jpnprd01.prod.outlook.com> (raw)
Message-ID: <20211005134108.tFhfqrNiUqrI7HWYHOPYEsEDb1ZTzf-6vreScGmLp_Q@z> (raw)
In-Reply-To: <a34a0fcb84f449ef83e2843bec5a6d02@SVR-IES-MBX-03.mgc.mentorg.com>

[-- Attachment #1: Type: text/plain, Size: 4653 bytes --]

Hello Bikram,

> From: Bhola, Bikram <Bikram_Bhola@mentor.com>
> Sent: 30 September 2021 12:19
> 
> Hi Chris,
> 
> We investigated the failure job and looks like before getting login prompt job
> timeout is happening . In the job definition  file - job timeout is mentioned
> 15mins and  sometimes due to slow network issue, it takes more time while
> downloading, untar  and deploying image. So we are seeing timeout during
> login prompt or in some cases in earlier stages also.  The work in progress to
> double up the network bandwidth within a few weeks, which will reduce the
> occurrence of this type of issues.

Thank you for your investigation.
I've have increased the timeout as you have suggested:
https://gitlab.com/cip-project/cip-testing/linux-cip-ci/-/merge_requests/49

One additional thing I've noticed, the default x86 character delay during boot is 500ms, which seems a long time inbetween each character sent to the platform
https://lava.ciplatform.org/scheduler/device/x86-simatic-ipc227e-01/devicedict#defline5

Has a lower value for boot_character_delay ever been tried?

Kind regards, Chris

> 
> Time being, with an increased job timeout to 20mins, failure is not observed.
> We tested 10 times to be working fine.
> Example :
> https://lava.ciplatform.org/scheduler/device/x86-simatic-ipc227e-01
> 
> 
> changes in the job definition file
> https://lava.ciplatform.org/scheduler/job/444336/definition 
> Current implementation
> ------------------------
> timeouts:
>   job:
>     minutes: 15
> 
> Need to Modify
> -----------------------------------
> timeouts:
>   job:
>     minutes: 20
> 
> 
> Regards,
> Bikram
> 
> -----Original Message-----
> From: Chris Paterson <Chris.Paterson2@renesas.com>
> Sent: 28 September 2021 15:38
> To: Pavel Machek <pavel@denx.de>; Bhola, Bikram
> <Bikram_Bhola@mentor.com>
> Cc: cip-dev@lists.cip-project.org; Jan Kiszka <jan.kiszka@siemens.com>
> Subject: RE: Prompt timeouts on ipc227e board -- randomness related?
> 
> Hello Pavel,
> 
> > From: Pavel Machek <pavel@denx.de>
> > Sent: 25 September 2021 21:06
> >
> > Hi!
> >
> > It is not first time I see this failure:
> 
> Thank you for reporting the issue.
> 
> Bikram is going to take a look for us (thank you).
> 
> Kind regards, Chris
> 
> >
> >
> https://jpn01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flava.c
> iplatform.org%2Fscheduler%2Fjob%2F444336&amp;data=04%7C01%7CChris.
> Paterson2%40renesas.com%7Cacb37b995a6c41d2090808d98404189e%7C53d
> 82571da1947e49cb4625a166a4a2a%7C0%7C0%7C637685975391318359%7CUn
> known%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6
> Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=yIPWAcblC17x6PpT3TSiGT
> QWiiinQiSuu9a3HRg4u3Q%3D&amp;reserved=0
> >
> >
> > [[0;32m  OK  [0m] Started Login Service.
> > [[0m[0;31m*     [0m] (1 of 2) A start job is running for…ate sshd host keys
> (7s /
> > no limit)[K[[0;1;31m*[0m[0;31m*    [0m] (1 of 2) A start job is running
> for…ate
> > sshd host keys (8s / no limit)[K[[0;31m*[0;1;31m*[0m[0;31m*   [0m] (1 of 2)
> > A start job is running for…ate sshd host keys (9s / no limit)[K[
> > [0;31m*[0;1;31m*[0m[0;31m*  [0m] (2 of 2) A start job is running
> for…evices-
> > eth0.device (8s / 1min 30s)[   19.855328] systemd[1]: apt-daily-
> > upgrade.timer: Adding 3min 2.027476s random time.
> > [   19.864207] systemd[1]: apt-daily.timer: Adding 1h 54min 15.794344s
> > random time.
> > [   21.406490] systemd[1]: apt-daily-upgrade.timer: Adding 55min
> 47.041488s
> > random time.
> > [   21.415357] systemd[1]: apt-daily.timer: Adding 11h 48min 4.457495s
> > random time.
> > [   22.049807] systemd[1]: apt-daily-upgrade.timer: Adding 3min 54.125406s
> > random time.
> > [   22.058500] systemd[1]: apt-daily.timer: Adding 8h 34min 47.388595s
> > random time.
> > [   22.511646] systemd[1]: apt-daily-upgrade.timer: Adding 25min
> 13.015405s
> > random time.
> > [   22.520510] systemd[1]: apt-daily.timer: Adding 11h 58min 24.212170s
> > random time.
> > [K[[0;32m  OK  [0m] Started Regenerate sshd host keys.
> > wait for prompt timed out
> > end: 2.3.4.1 login-action (duration 00:00:24) [common]
> > case: login-action
> > case_id: 9417066
> > definition: lava
> > duration: 23.98
> >
> > Any idea what is going on there? Is it just a test problem, or do we
> > have kernel regression that only happens sometimes?
> >
> > Best regards,
> > 								Pavel
> > --
> > DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
> > HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

[-- Attachment #2: Type: text/plain, Size: 429 bytes --]


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#6785): https://lists.cip-project.org/g/cip-dev/message/6785
Mute This Topic: https://lists.cip-project.org/mt/85867572/4520388
Group Owner: cip-dev+owner@lists.cip-project.org
Unsubscribe: https://lists.cip-project.org/g/cip-dev/leave/10495289/4520388/727948398/xyzzy [cip-dev@archiver.kernel.org]
-=-=-=-=-=-=-=-=-=-=-=-


  parent reply	other threads:[~2021-10-05 13:41 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-25 20:06 Prompt timeouts on ipc227e board -- randomness related? Pavel Machek
2021-09-25 20:06 ` [cip-dev] " Pavel Machek
2021-09-28 10:08 ` Chris Paterson
2021-09-28 10:08   ` [cip-dev] " Chris Paterson
     [not found]   ` <a34a0fcb84f449ef83e2843bec5a6d02@SVR-IES-MBX-03.mgc.mentorg.com>
2021-10-05 13:41     ` Chris Paterson [this message]
2021-10-05 13:41       ` Chris Paterson
2021-10-05 17:47       ` Pavel Machek
2021-10-05 17:47         ` [cip-dev] " Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=OSZPR01MB68770F80AE64400871800F6EB7AF9@OSZPR01MB6877.jpnprd01.prod.outlook.com \
    --to=chris.paterson2@renesas.com \
    --cc=Bikram_Bhola@mentor.com \
    --cc=cip-dev@lists.cip-project.org \
    --cc=jan.kiszka@siemens.com \
    --cc=pavel@denx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.