* [PATCH 1/2] gitlab: Move the n900 test into its own section
@ 2021-01-31  5:17 Simon Glass
  2021-01-31  5:17 ` [PATCH 2/2] buildman: Support single-threaded operation Simon Glass
  2021-01-31 12:15 ` [PATCH 1/2] gitlab: Move the n900 test into its own section Pali Rohár
  0 siblings, 2 replies; 14+ messages in thread
From: Simon Glass @ 2021-01-31  5:17 UTC (permalink / raw)
  To: u-boot

This test is not reliable. Quite often (20%?) it makes the build fail and
a retry succeeds. Move it to the end so it does not cause the whole build
to abort.

Signed-off-by: Simon Glass <sjg@chromium.org>
---

 .gitlab-ci.yml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 4b0680887b5..d920b4b0192 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -5,10 +5,12 @@
 image: trini/u-boot-gitlab-ci-runner:bionic-20200807-02Sep2020
 
 # We run some tests in different order, to catch some failures quicker.
+# 'probation' is for tests that are flaky (pass and fail on the same commit)
 stages:
   - testsuites
   - test.py
   - world build
+  - probation
 
 .buildman_and_testpy_template: &buildman_and_testpy_dfn
   tags: [ 'all' ]
@@ -174,7 +176,7 @@ Run binman, buildman, dtoc, Kconfig and patman testsuites:
 
 Run tests for Nokia RX-51 (aka N900):
   tags: [ 'all' ]
-  stage: testsuites
+  stage: probation
   script:
     - ./tools/buildman/buildman --fetch-arch arm;
       export PATH=~/.buildman-toolchains/gcc-9.2.0-nolibc/arm-linux-gnueabi/bin/:$PATH;
-- 
2.30.0.365.g02bc693789-goog

* [PATCH 2/2] buildman: Support single-threaded operation
  2021-01-31  5:17 [PATCH 1/2] gitlab: Move the n900 test into its own section Simon Glass
@ 2021-01-31  5:17 ` Simon Glass
  2021-03-05  2:02   ` Tom Rini
  2021-01-31 12:15 ` [PATCH 1/2] gitlab: Move the n900 test into its own section Pali Rohár
  1 sibling, 1 reply; 14+ messages in thread
From: Simon Glass @ 2021-01-31  5:17 UTC (permalink / raw)
  To: u-boot

At present even if only a single thread is in use, buildman still uses
threading.

For some debugging it is helpful to do everything in the main process.
Allow -T0 to support this.

Signed-off-by: Simon Glass <sjg@chromium.org>
---

 tools/buildman/README           |  5 +++
 tools/buildman/builder.py       | 60 +++++++++++++++++++++------------
 tools/buildman/builderthread.py | 16 +++++++--
 tools/buildman/cmdline.py       |  3 +-
 tools/buildman/control.py       |  2 +-
 tools/buildman/test.py          | 12 +++++--
 6 files changed, 68 insertions(+), 30 deletions(-)

diff --git a/tools/buildman/README b/tools/buildman/README
index b7442a95e56..600794790a0 100644
--- a/tools/buildman/README
+++ b/tools/buildman/README
@@ -1128,6 +1128,11 @@ If there are both warnings and errors, errors win, so buildman returns 100.
 The -y option is provided (for use with -s) to ignore the bountiful device-tree
 warnings. Similarly, -Y tells buildman to ignore the migration warnings.
 
+Sometimes you might get an error in a thread that is not handled by buildman,
+perhaps due to a failure of a tool that it calls. You might see the output, but
+then buildman hangs. Failing to handle any eventuality is a bug in buildman and
+should be reported. But you can use -T0 to disable threading and hopefully
+figure out the root cause of the build failure.
 
 Build summary
 =============
diff --git a/tools/buildman/builder.py b/tools/buildman/builder.py
index c93946842a3..3e3f76a908a 100644
--- a/tools/buildman/builder.py
+++ b/tools/buildman/builder.py
@@ -197,6 +197,8 @@ class Builder:
             last _timestamp_count builds. Each is a datetime object.
         _timestamp_count: Number of timestamps to keep in our list.
         _working_dir: Base working directory containing all threads
+        _single_builder: BuilderThread object for the single builder, if
+            threading is not being used
     """
     class Outcome:
         """Records a build outcome for a single make invocation
@@ -309,19 +311,24 @@ class Builder:
         self._re_migration_warning = re.compile(r'^={21} WARNING ={22}\n.*\n=+\n',
                                                 re.MULTILINE | re.DOTALL)
 
-        self.queue = queue.Queue()
-        self.out_queue = queue.Queue()
-        for i in range(self.num_threads):
-            t = builderthread.BuilderThread(self, i, mrproper,
-                    per_board_out_dir)
+        if self.num_threads:
+            self._single_builder = None
+            self.queue = queue.Queue()
+            self.out_queue = queue.Queue()
+            for i in range(self.num_threads):
+                t = builderthread.BuilderThread(self, i, mrproper,
+                        per_board_out_dir)
+                t.setDaemon(True)
+                t.start()
+                self.threads.append(t)
+
+            t = builderthread.ResultThread(self)
             t.setDaemon(True)
             t.start()
             self.threads.append(t)
-
-        t = builderthread.ResultThread(self)
-        t.setDaemon(True)
-        t.start()
-        self.threads.append(t)
+        else:
+            self._single_builder = builderthread.BuilderThread(
+                self, -1, mrproper, per_board_out_dir)
 
         ignore_lines = ['(make.*Waiting for unfinished)', '(Segmentation fault)']
         self.re_make_err = re.compile('|'.join(ignore_lines))
@@ -1531,11 +1538,12 @@ class Builder:
         """Get the directory path to the working dir for a thread.
 
         Args:
-            thread_num: Number of thread to check.
+            thread_num: Number of thread to check (-1 for main process, which
+                is treated as 0)
         """
         if self.work_in_output:
             return self._working_dir
-        return os.path.join(self._working_dir, '%02d' % thread_num)
+        return os.path.join(self._working_dir, '%02d' % max(thread_num, 0))
 
     def _PrepareThread(self, thread_num, setup_git):
         """Prepare the working directory for a thread.
@@ -1594,7 +1602,9 @@ class Builder:
         if git-worktree is available, or clones the repo if it isn't.
 
         Args:
-            max_threads: Maximum number of threads we expect to need.
+            max_threads: Maximum number of threads we expect to need. If 0 then
+                1 is set up, since the main process still needs somewhere to
+                work
             setup_git: True to set up a git worktree or a git clone
         """
         builderthread.Mkdir(self._working_dir)
@@ -1608,7 +1618,9 @@ class Builder:
                 gitutil.PruneWorktrees(src_dir)
             else:
                 setup_git = 'clone'
-        for thread in range(max_threads):
+
+        # Always do at least one thread
+        for thread in range(max(max_threads, 1)):
             self._PrepareThread(thread, setup_git)
 
     def _GetOutputSpaceRemovals(self):
@@ -1686,16 +1698,20 @@ class Builder:
             job.keep_outputs = keep_outputs
             job.work_in_output = self.work_in_output
             job.step = self._step
-            self.queue.put(job)
+            if self.num_threads:
+                self.queue.put(job)
+            else:
+                results = self._single_builder.RunJob(job)
 
-        term = threading.Thread(target=self.queue.join)
-        term.setDaemon(True)
-        term.start()
-        while term.isAlive():
-            term.join(100)
+        if self.num_threads:
+            term = threading.Thread(target=self.queue.join)
+            term.setDaemon(True)
+            term.start()
+            while term.isAlive():
+                term.join(100)
 
-        # Wait until we have processed all output
-        self.out_queue.join()
+            # Wait until we have processed all output
+            self.out_queue.join()
         Print()
 
         msg = 'Completed: %d total built' % self.count
diff --git a/tools/buildman/builderthread.py b/tools/buildman/builderthread.py
index d6648685823..6c6dbd78725 100644
--- a/tools/buildman/builderthread.py
+++ b/tools/buildman/builderthread.py
@@ -89,7 +89,8 @@ class BuilderThread(threading.Thread):
     Members:
         builder: The builder which contains information we might need
         thread_num: Our thread number (0-n-1), used to decide on a
-                temporary directory
+                temporary directory. If this is -1 then there are no threads
+                and we are the (only) main process
     """
     def __init__(self, builder, thread_num, mrproper, per_board_out_dir):
         """Set up a new builder thread"""
@@ -445,6 +446,9 @@ class BuilderThread(threading.Thread):
 
         Args:
             job: Job to build
+
+        Returns:
+            List of Result objects
         """
         brd = job.board
         work_dir = self.builder.GetThreadDir(self.thread_num)
@@ -508,7 +512,10 @@ class BuilderThread(threading.Thread):
 
                 # We have the build results, so output the result
                 self._WriteResult(result, job.keep_outputs, job.work_in_output)
-                self.builder.out_queue.put(result)
+                if self.thread_num != -1:
+                    self.builder.out_queue.put(result)
+                else:
+                    self.builder.ProcessResult(result)
         else:
             # Just build the currently checked-out build
             result, request_config = self.RunCommit(None, brd, work_dir, True,
@@ -517,7 +524,10 @@ class BuilderThread(threading.Thread):
                         work_in_output=job.work_in_output)
             result.commit_upto = 0
             self._WriteResult(result, job.keep_outputs, job.work_in_output)
-            self.builder.out_queue.put(result)
+            if self.thread_num != -1:
+                self.builder.out_queue.put(result)
+            else:
+                self.builder.ProcessResult(result)
 
     def run(self):
         """Our thread's run function
diff --git a/tools/buildman/cmdline.py b/tools/buildman/cmdline.py
index 680c072d662..274b5ac3f45 100644
--- a/tools/buildman/cmdline.py
+++ b/tools/buildman/cmdline.py
@@ -97,7 +97,8 @@ def ParseArgs():
     parser.add_option('-t', '--test', action='store_true', dest='test',
                       default=False, help='run tests')
     parser.add_option('-T', '--threads', type='int',
-          default=None, help='Number of builder threads to use')
+          default=None,
+          help='Number of builder threads to use (0=single-thread)')
     parser.add_option('-u', '--show_unknown', action='store_true',
           default=False, help='Show boards with unknown build result')
     parser.add_option('-U', '--show-environment', action='store_true',
diff --git a/tools/buildman/control.py b/tools/buildman/control.py
index fe874b8165b..a7675701466 100644
--- a/tools/buildman/control.py
+++ b/tools/buildman/control.py
@@ -294,7 +294,7 @@ def DoBuildman(options, args, toolchains=None, make_func=None, boards=None,
 
     # By default we have one thread per CPU. But if there are not enough jobs
     # we can have fewer threads and use a high '-j' value for make.
-    if not options.threads:
+    if options.threads is None:
         options.threads = min(multiprocessing.cpu_count(), len(selected))
     if not options.jobs:
         options.jobs = max(1, (multiprocessing.cpu_count() +
diff --git a/tools/buildman/test.py b/tools/buildman/test.py
index 1a259d54ab0..b9c65c0d326 100644
--- a/tools/buildman/test.py
+++ b/tools/buildman/test.py
@@ -187,7 +187,7 @@ class TestBuild(unittest.TestCase):
             expect += col.Color(expected_colour, ' %s' % board)
         self.assertEqual(text, expect)
 
-    def _SetupTest(self, echo_lines=False, **kwdisplay_args):
+    def _SetupTest(self, echo_lines=False, threads=1, **kwdisplay_args):
         """Set up the test by running a build and summary
 
         Args:
@@ -199,8 +199,8 @@ class TestBuild(unittest.TestCase):
         Returns:
             Iterator containing the output lines, each a PrintLine() object
         """
-        build = builder.Builder(self.toolchains, self.base_dir, None, 1, 2,
-                                checkout=False, show_unknown=False)
+        build = builder.Builder(self.toolchains, self.base_dir, None, threads,
+                                2, checkout=False, show_unknown=False)
         build.do_make = self.Make
         board_selected = self.boards.GetSelectedDict()
 
@@ -438,6 +438,12 @@ class TestBuild(unittest.TestCase):
                                 filter_migration_warnings=True)
         self._CheckOutput(lines, filter_migration_warnings=True)
 
+    def testSingleThread(self):
+        """Test operation without threading"""
+        lines = self._SetupTest(show_errors=True, threads=0)
+        self._CheckOutput(lines, list_error_boards=False,
+                          filter_dtb_warnings=False)
+
     def _testGit(self):
         """Test basic builder operation by building a branch"""
         options = Options()
-- 
2.30.0.365.g02bc693789-goog
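
For readers skimming the patch, the control flow it introduces can be sketched
in isolation. This is a minimal, self-contained illustration and not buildman
code: resolve_threads(), run_jobs() and the build callback are hypothetical
names. It assumes only what the diff shows, namely that an unset thread count
(None) falls back to a default while an explicit 0 means "do the work in the
main process".

```python
import queue
import threading


def resolve_threads(requested, cpu_count, num_jobs):
    """Pick a thread count; None means 'not given', 0 means 'no threads'."""
    # 'if not requested' would treat an explicit 0 like None, which is why
    # the check has to be 'is None' once 0 becomes a valid value.
    if requested is None:
        return min(cpu_count, num_jobs)
    return requested


def run_jobs(jobs, num_threads, build):
    """Run build(job) for each job, in worker threads or in the caller."""
    results = []
    if not num_threads:
        # Single-threaded case: everything runs in the calling thread, so an
        # unhandled exception propagates with an ordinary traceback.
        for job in jobs:
            results.append(build(job))
        return results

    todo = queue.Queue()
    lock = threading.Lock()

    def worker():
        while True:
            job = todo.get()
            try:
                result = build(job)
                with lock:
                    results.append(result)
            finally:
                # Always mark the job done so todo.join() cannot hang
                todo.task_done()

    for _ in range(num_threads):
        threading.Thread(target=worker, daemon=True).start()
    for job in jobs:
        todo.put(job)
    todo.join()
    return results
```

With num_threads=0 the results come back in job order; with worker threads the
order depends on scheduling, so callers should not rely on it.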

* [PATCH 1/2] gitlab: Move the n900 test into its own section
  2021-01-31  5:17 [PATCH 1/2] gitlab: Move the n900 test into its own section Simon Glass
  2021-01-31  5:17 ` [PATCH 2/2] buildman: Support single-threaded operation Simon Glass
@ 2021-01-31 12:15 ` Pali Rohár
  2021-01-31 13:49   ` Tom Rini
  1 sibling, 1 reply; 14+ messages in thread
From: Pali Rohár @ 2021-01-31 12:15 UTC (permalink / raw)
  To: u-boot

On Saturday 30 January 2021 22:17:45 Simon Glass wrote:
> This test is not reliable. Quite often (20%?) it makes the build fail and
> a retry succeeds.

This test should work. Are there any logs with issues?

> Move it to the end so it does not cause the whole build to abort.
> 
> Signed-off-by: Simon Glass <sjg@chromium.org>
> ---
> 
>  .gitlab-ci.yml | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
> index 4b0680887b5..d920b4b0192 100644
> --- a/.gitlab-ci.yml
> +++ b/.gitlab-ci.yml
> @@ -5,10 +5,12 @@
>  image: trini/u-boot-gitlab-ci-runner:bionic-20200807-02Sep2020
>  
>  # We run some tests in different order, to catch some failures quicker.
> +# 'probation' is for tests that are flaky (pass and fail on the same commit)
>  stages:
>    - testsuites
>    - test.py
>    - world build
> +  - probation
>  
>  .buildman_and_testpy_template: &buildman_and_testpy_dfn
>    tags: [ 'all' ]
> @@ -174,7 +176,7 @@ Run binman, buildman, dtoc, Kconfig and patman testsuites:
>  
>  Run tests for Nokia RX-51 (aka N900):
>    tags: [ 'all' ]
> -  stage: testsuites
> +  stage: probation
>    script:
>      - ./tools/buildman/buildman --fetch-arch arm;
>        export PATH=~/.buildman-toolchains/gcc-9.2.0-nolibc/arm-linux-gnueabi/bin/:$PATH;
> -- 
> 2.30.0.365.g02bc693789-goog
> 

* [PATCH 1/2] gitlab: Move the n900 test into its own section
  2021-01-31 12:15 ` [PATCH 1/2] gitlab: Move the n900 test into its own section Pali Rohár
@ 2021-01-31 13:49   ` Tom Rini
  2021-01-31 15:04     ` Pali Rohár
  2021-02-01  2:54     ` Heinrich Schuchardt
  0 siblings, 2 replies; 14+ messages in thread
From: Tom Rini @ 2021-01-31 13:49 UTC (permalink / raw)
  To: u-boot

On Sun, Jan 31, 2021 at 01:15:20PM +0100, Pali Rohár wrote:
> On Saturday 30 January 2021 22:17:45 Simon Glass wrote:
> > This test is not reliable. Quite often (20%?) it makes the build fail and
> > a retry succeeds.
> 
> This test should work. Are there any logs with issues?

I don't see it failing any more often than other tests do, due to
network connectivity issues.  That may be helped by, now that we've
dropped Travis, having the container be pre-populated with more of the
downloaded files and pre-building the special QEMU.

-- 
Tom

* [PATCH 1/2] gitlab: Move the n900 test into its own section
  2021-01-31 13:49   ` Tom Rini
@ 2021-01-31 15:04     ` Pali Rohár
  2021-01-31 15:43       ` Simon Glass
  2021-02-01  2:54     ` Heinrich Schuchardt
  1 sibling, 1 reply; 14+ messages in thread
From: Pali Rohár @ 2021-01-31 15:04 UTC (permalink / raw)
  To: u-boot

On Sunday 31 January 2021 08:49:20 Tom Rini wrote:
> On Sun, Jan 31, 2021 at 01:15:20PM +0100, Pali Rohár wrote:
> > On Saturday 30 January 2021 22:17:45 Simon Glass wrote:
> > > This test is not reliable. Quite often (20%?) it makes the build fail and
> > > a retry succeeds.
> > 
> > This test should work. Are there any logs with issues?
> 
> I don't see it failing any more often than other tests do, due to
> network connectivity issues.  That may be helped by, now that we've
> dropped Travis, having the container be pre-populated with more of the
> downloaded files and pre-building the special QEMU.

If there are just network issue problems then pre-downloading required
files into cache / container should resolve them.

* [PATCH 1/2] gitlab: Move the n900 test into its own section
  2021-01-31 15:04     ` Pali Rohár
@ 2021-01-31 15:43       ` Simon Glass
  2021-01-31 15:51         ` Pali Rohár
  0 siblings, 1 reply; 14+ messages in thread
From: Simon Glass @ 2021-01-31 15:43 UTC (permalink / raw)
  To: u-boot

Hi Pali,

On Sun, 31 Jan 2021 at 08:04, Pali Rohár <pali@kernel.org> wrote:
>
> On Sunday 31 January 2021 08:49:20 Tom Rini wrote:
> > On Sun, Jan 31, 2021 at 01:15:20PM +0100, Pali Rohár wrote:
> > > On Saturday 30 January 2021 22:17:45 Simon Glass wrote:
> > > > This test is not reliable. Quite often (20%?) it makes the build fail and
> > > > a retry succeeds.
> > >
> > > This test should work. Are there any logs with issues?
> >
> > I don't see it failing any more often than other tests do, due to
> > network connectivity issues.  That may be helped by, now that we've
> > dropped Travis, having the container be pre-populated with more of the
> > downloaded files and pre-building the special QEMU.
>
> If there are just network issue problems then pre-downloading required
> files into cache / container should resolve them.

The flake issues I see are like this:

https://gitlab.denx.de/u-boot/custodians/u-boot-dm/-/jobs/202441

I am not sure of the cause, but it would be good to fix it!

Re the network issues, I have a persistent DNS problem with my
network. I am really not sure of the root cause but sometimes it will
fail to find a host, then succeed 5 seconds later. I spent some time
on it a few weeks ago but will try again.

Regards,
Simon

* [PATCH 1/2] gitlab: Move the n900 test into its own section
  2021-01-31 15:43       ` Simon Glass
@ 2021-01-31 15:51         ` Pali Rohár
  2021-01-31 16:51           ` Simon Glass
  0 siblings, 1 reply; 14+ messages in thread
From: Pali Rohár @ 2021-01-31 15:51 UTC (permalink / raw)
  To: u-boot

On Sunday 31 January 2021 08:43:19 Simon Glass wrote:
> Hi Pali,
> 
> On Sun, 31 Jan 2021 at 08:04, Pali Rohár <pali@kernel.org> wrote:
> >
> > On Sunday 31 January 2021 08:49:20 Tom Rini wrote:
> > > On Sun, Jan 31, 2021 at 01:15:20PM +0100, Pali Rohár wrote:
> > > > On Saturday 30 January 2021 22:17:45 Simon Glass wrote:
> > > > > This test is not reliable. Quite often (20%?) it makes the build fail and
> > > > > a retry succeeds.
> > > >
> > > > This test should work. Are there any logs with issues?
> > >
> > > I don't see it failing any more often than other tests do, due to
> > > network connectivity issues.  That may be helped by, now that we've
> > > dropped Travis, having the container be pre-populated with more of the
> > > downloaded files and pre-building the special QEMU.
> >
> > If there are just network issue problems then pre-downloading required
> > files into cache / container should resolve them.
> 
> The flake issues I see are like this:
> 
> https://gitlab.denx.de/u-boot/custodians/u-boot-dm/-/jobs/202441
> 
> I am not sure of the cause, but it would be good to fix it!

Hello Simon! This is not a network issue problem but rather some U-Boot
regression in mmc code. Second test failed with error:

    "Failed to boot kernel from eMMC"

Other tests succeed:

    "Kernel was successfully booted from RAM"
    "Kernel was successfully booted from OneNAND"

So problem is really with second boot attempt from eMMC. U-Boot log is
also available in output (as second run):

    Check if pads/pull-ups of bus are properly configured
    Timed out in wait_for_event: status=0000
    ...
    Timed out in wait_for_event: status=0000
    Check if pads/pull-ups of bus are properly configured
    Timed out in wait_for_event: status=0000
    Check if pads/pull-ups of bus are properly configured
    Timed out in wait_for_event: status=0000
    Check if pads/pull-ups of bus are properly configured
    test/nokia_rx51_test.sh: line 233:  5946 Killed                  ./qemu-system-arm -M n900 -mtdblock mtd_emmc.img -sd emmc_emmc.img -serial /dev/stdout -display none > qemu_emmc.log

After 300s was qemu killed and test marked as failure.

So this is valid failure and regression in u-boot emmc code. So it would
be needed to identify which commit caused it and revert it...
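
The failure mode described above (a hung QEMU run killed by a time limit and
then counted as a test failure) follows a common watchdog pattern. The real
test is the shell script test/nokia_rx51_test.sh, and how it enforces its
limit is not shown in this thread, so the following is only a Python sketch of
the same idea; run_with_timeout() is a hypothetical helper, not part of U-Boot.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return True if it finished in time, False if it was killed."""
    try:
        subprocess.run(cmd, timeout=timeout_s, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising this exception
        return False
    return True


# Stand-in for a hung emulator: a child process that sleeps past the limit
hung = [sys.executable, '-c', 'import time; time.sleep(5)']
print(run_with_timeout(hung, 0.5))
```

A hang like the repeated wait_for_event timeouts in the log would show up here
as a False return rather than as an error exit code from the child.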

> Re the network issues, I have a persistent DNS problem with my
> network. I am really not sure of the root cause but sometimes it will
> fail to find a host, then succeed 5 seconds later. I spent some time
> on it a few weeks ago but will try again.
> 
> Regards,
> Simon

* [PATCH 1/2] gitlab: Move the n900 test into its own section
  2021-01-31 15:51         ` Pali Rohár
@ 2021-01-31 16:51           ` Simon Glass
  2021-01-31 17:05             ` Pali Rohár
  0 siblings, 1 reply; 14+ messages in thread
From: Simon Glass @ 2021-01-31 16:51 UTC (permalink / raw)
  To: u-boot

Hi Pali,

On Sun, 31 Jan 2021 at 08:52, Pali Rohár <pali@kernel.org> wrote:
>
> On Sunday 31 January 2021 08:43:19 Simon Glass wrote:
> > Hi Pali,
> >
> > On Sun, 31 Jan 2021 at 08:04, Pali Rohár <pali@kernel.org> wrote:
> > >
> > > On Sunday 31 January 2021 08:49:20 Tom Rini wrote:
> > > > On Sun, Jan 31, 2021 at 01:15:20PM +0100, Pali Rohár wrote:
> > > > > On Saturday 30 January 2021 22:17:45 Simon Glass wrote:
> > > > > > This test is not reliable. Quite often (20%?) it makes the build fail and
> > > > > > a retry succeeds.
> > > > >
> > > > > This test should work. Are there any logs with issues?
> > > >
> > > > I don't see it failing any more often than other tests do, due to
> > > > network connectivity issues.  That may be helped by, now that we've
> > > > dropped Travis, having the container be pre-populated with more of the
> > > > downloaded files and pre-building the special QEMU.
> > >
> > > If there are just network issue problems then pre-downloading required
> > > files into cache / container should resolve them.
> >
> > The flake issues I see are like this:
> >
> > https://gitlab.denx.de/u-boot/custodians/u-boot-dm/-/jobs/202441
> >
> > I am not sure of the cause, but it would be good to fix it!
>
> Hello Simon! This is not a network issue problem but rather some U-Boot
> regression in mmc code. Second test failed with error:
>
>     "Failed to boot kernel from eMMC"
>
> Other tests succeed:
>
>     "Kernel was successfully booted from RAM"
>     "Kernel was successfully booted from OneNAND"
>
> So problem is really with second boot attempt from eMMC. U-Boot log is
> also available in output (as second run):
>
>     Check if pads/pull-ups of bus are properly configured
>     Timed out in wait_for_event: status=0000
>     ...
>     Timed out in wait_for_event: status=0000
>     Check if pads/pull-ups of bus are properly configured
>     Timed out in wait_for_event: status=0000
>     Check if pads/pull-ups of bus are properly configured
>     Timed out in wait_for_event: status=0000
>     Check if pads/pull-ups of bus are properly configured
>     test/nokia_rx51_test.sh: line 233:  5946 Killed                  ./qemu-system-arm -M n900 -mtdblock mtd_emmc.img -sd emmc_emmc.img -serial /dev/stdout -display none > qemu_emmc.log
>
> After 300s was qemu killed and test marked as failure.
>
> So this is valid failure and regression in u-boot emmc code. So it would
> be needed to identify which commit caused it and revert it...

The problem is that it is intermittent. Can you repeat it?

>
> > Re the network issues, I have a persistent DNS problem with my
> > network. I am really not sure of the root cause but sometimes it will
> > fail to find a host, then succeed 5 seconds later. I spent some time
> > on it a few weeks ago but will try again.

Regards,
Simon

* [PATCH 1/2] gitlab: Move the n900 test into its own section
  2021-01-31 16:51           ` Simon Glass
@ 2021-01-31 17:05             ` Pali Rohár
  2021-01-31 17:10               ` Simon Glass
  0 siblings, 1 reply; 14+ messages in thread
From: Pali Rohár @ 2021-01-31 17:05 UTC (permalink / raw)
  To: u-boot

On Sunday 31 January 2021 09:51:44 Simon Glass wrote:
> Hi Pali,
> 
> On Sun, 31 Jan 2021 at 08:52, Pali Rohár <pali@kernel.org> wrote:
> >
> > On Sunday 31 January 2021 08:43:19 Simon Glass wrote:
> > > Hi Pali,
> > >
> > > On Sun, 31 Jan 2021 at 08:04, Pali Rohár <pali@kernel.org> wrote:
> > > >
> > > > On Sunday 31 January 2021 08:49:20 Tom Rini wrote:
> > > > > On Sun, Jan 31, 2021 at 01:15:20PM +0100, Pali Rohár wrote:
> > > > > > On Saturday 30 January 2021 22:17:45 Simon Glass wrote:
> > > > > > > This test is not reliable. Quite often (20%?) it makes the build fail and
> > > > > > > a retry succeeds.
> > > > > >
> > > > > > This test should work. Are there any logs with issues?
> > > > >
> > > > > I don't see it failing any more often than other tests do, due to
> > > > > network connectivity issues.  That may be helped by, now that we've
> > > > > dropped Travis, having the container be pre-populated with more of the
> > > > > downloaded files and pre-building the special QEMU.
> > > >
> > > > If there are just network issue problems then pre-downloading required
> > > > files into cache / container should resolve them.
> > >
> > > The flake issues I see are like this:
> > >
> > > https://gitlab.denx.de/u-boot/custodians/u-boot-dm/-/jobs/202441
> > >
> > > I am not sure of the cause, but it would be good to fix it!
> >
> > Hello Simon! This is not a network issue problem but rather some U-Boot
> > regression in mmc code. Second test failed with error:
> >
> >     "Failed to boot kernel from eMMC"
> >
> > Other tests succeed:
> >
> >     "Kernel was successfully booted from RAM"
> >     "Kernel was successfully booted from OneNAND"
> >
> > So problem is really with second boot attempt from eMMC. U-Boot log is
> > also available in output (as second run):
> >
> >     Check if pads/pull-ups of bus are properly configured
> >     Timed out in wait_for_event: status=0000
> >     ...
> >     Timed out in wait_for_event: status=0000
> >     Check if pads/pull-ups of bus are properly configured
> >     Timed out in wait_for_event: status=0000
> >     Check if pads/pull-ups of bus are properly configured
> >     Timed out in wait_for_event: status=0000
> >     Check if pads/pull-ups of bus are properly configured
> >     test/nokia_rx51_test.sh: line 233:  5946 Killed                  ./qemu-system-arm -M n900 -mtdblock mtd_emmc.img -sd emmc_emmc.img -serial /dev/stdout -display none > qemu_emmc.log
> >
> > After 300s was qemu killed and test marked as failure.
> >
> > So this is valid failure and regression in u-boot emmc code. So it would
> > be needed to identify which commit caused it and revert it...
> 
> The problem is that it is intermittent. Can you repeat it?

So when you run this test more times from same sources / git commit,
this error appears only sometimes?

This particular issue I have not seen in qemu yet when I run tests on my
local machine. So I cannot reproduce it.

I saw similar errors, but only on real device (not in qemu) and they
were visible always (not sometimes). And for all my known problems I
have sent patches to mailing list. including i2c, mmc and usb. Some of
them are still waiting for review & merge...

===

I know only one error which is not fixed yet and happens "only
sometimes" which I was not able to debug yet. Probably if u-boot binary
has particular size then it completely crashes (and with same binary it
can be reproduced for every run). But recompiling u-boot binary resolves
this issue and sometimes even without modifying source code. So I
suspect that time&date string (which changes for every recompilation)
must have some effect (maybe some +-1 padding?). Adding new random 100
characters into env variables seems to fix it.

> >
> > > Re the network issues, I have a persistent DNS problem with my
> > > network. I am really not sure of the root cause but sometimes it will
> > > fail to find a host, then succeed 5 seconds later. I spent some time
> > > on it a few weeks ago but will try again.
> 
> Regards,
> Simon

* [PATCH 1/2] gitlab: Move the n900 test into its own section
  2021-01-31 17:05             ` Pali Rohár
@ 2021-01-31 17:10               ` Simon Glass
  2021-01-31 17:31                 ` Pali Rohár
  0 siblings, 1 reply; 14+ messages in thread
From: Simon Glass @ 2021-01-31 17:10 UTC (permalink / raw)
  To: u-boot

Hi Pali,

On Sun, 31 Jan 2021 at 10:05, Pali Rohár <pali@kernel.org> wrote:
>
> On Sunday 31 January 2021 09:51:44 Simon Glass wrote:
> > Hi Pali,
> >
> > On Sun, 31 Jan 2021 at 08:52, Pali Rohár <pali@kernel.org> wrote:
> > >
> > > On Sunday 31 January 2021 08:43:19 Simon Glass wrote:
> > > > Hi Pali,
> > > >
> > > > On Sun, 31 Jan 2021 at 08:04, Pali Rohár <pali@kernel.org> wrote:
> > > > >
> > > > > On Sunday 31 January 2021 08:49:20 Tom Rini wrote:
> > > > > > On Sun, Jan 31, 2021 at 01:15:20PM +0100, Pali Rohár wrote:
> > > > > > > On Saturday 30 January 2021 22:17:45 Simon Glass wrote:
> > > > > > > > This test is not reliable. Quite often (20%?) it makes the build fail and
> > > > > > > > a retry succeeds.
> > > > > > >
> > > > > > > This test should work. Are there any logs with issues?
> > > > > >
> > > > > > I don't see it failing any more often than other tests do, due to
> > > > > > network connectivity issues.  That may be helped by, now that we've
> > > > > > dropped Travis, having the container be pre-populated with more of the
> > > > > > downloaded files and pre-building the special QEMU.
> > > > >
> > > > > If there are just network issue problems then pre-downloading required
> > > > > files into cache / container should resolve them.
> > > >
> > > > The flake issues I see are like this:
> > > >
> > > > https://gitlab.denx.de/u-boot/custodians/u-boot-dm/-/jobs/202441
> > > >
> > > > I am not sure of the cause, but it would be good to fix it!
> > >
> > > Hello Simon! This is not a network issue problem but rather some U-Boot
> > > regression in mmc code. Second test failed with error:
> > >
> > >     "Failed to boot kernel from eMMC"
> > >
> > > Other tests succeed:
> > >
> > >     "Kernel was successfully booted from RAM"
> > >     "Kernel was successfully booted from OneNAND"
> > >
> > > So problem is really with second boot attempt from eMMC. U-Boot log is
> > > also available in output (as second run):
> > >
> > >     Check if pads/pull-ups of bus are properly configured
> > >     Timed out in wait_for_event: status=0000
> > >     ...
> > >     Timed out in wait_for_event: status=0000
> > >     Check if pads/pull-ups of bus are properly configured
> > >     Timed out in wait_for_event: status=0000
> > >     Check if pads/pull-ups of bus are properly configured
> > >     Timed out in wait_for_event: status=0000
> > >     Check if pads/pull-ups of bus are properly configured
> > >     test/nokia_rx51_test.sh: line 233:  5946 Killed                  ./qemu-system-arm -M n900 -mtdblock mtd_emmc.img -sd emmc_emmc.img -serial /dev/stdout -display none > qemu_emmc.log
> > >
> > > After 300s was qemu killed and test marked as failure.
> > >
> > > So this is valid failure and regression in u-boot emmc code. So it would
> > > be needed to identify which commit caused it and revert it...
> >
> > The problem is that it is intermittent. Can you repeat it?
>
> So when you run this test more times from same sources / git commit,
> this error appears only sometimes?

Perhaps 1 time in 5 or 10? Every time I click 'retry' in gitlab it
tries again and passes.

>
> This particular issue I have not seen in qemu yet when I run tests on my
> local machine. So I cannot reproduce it.
>
> I saw similar errors, but only on real device (not in qemu) and they
> were visible always (not sometimes). And for all my known problems I
> have sent patches to mailing list. including i2c, mmc and usb. Some of
> them are still waiting for review & merge...

So perhaps it has been fixed, but not yet merged?

>
> ===
>
> I know only one error which is not fixed yet and happens "only
> sometimes" which I was not able to debug yet. Probably if u-boot binary
> has particular size then it completely crashes (and with same binary it
> can be reproduced for every run). But recompiling u-boot binary resolves
> this issue and sometimes even without modifying source code. So I
> suspect that time&date string (which changes for every recompilation)
> must have some effect (maybe some +-1 padding?). Adding new random 100
> characters into env variables seems to fix it.

That's not good.

Re the analysis, that seems a bit of a stretch. While the time/date
changes, its length doesn't normally change.

Uninitialised values can have any behaviour. I assume this is in U-Boot
proper, not SPL? You could check that BSS variables are not used
before relocation, perhaps?

>
> > >
> > > > Re the network issues, I have a persistent DNS problem with my
> > > > network. I am really not sure of the root cause but sometimes it will
> > > > fail to find a host, then succeed 5 seconds later. I spent some time
> > > > on it a few weeks ago but will try again.
Regards,
Simon


* [PATCH 1/2] gitlab: Move the n900 test into its own section
  2021-01-31 17:10               ` Simon Glass
@ 2021-01-31 17:31                 ` Pali Rohár
  0 siblings, 0 replies; 14+ messages in thread
From: Pali Rohár @ 2021-01-31 17:31 UTC (permalink / raw)
  To: u-boot

On Sunday 31 January 2021 10:10:56 Simon Glass wrote:
> Hi Pali,
> 
> On Sun, 31 Jan 2021 at 10:05, Pali Rohár <pali@kernel.org> wrote:
> >
> > On Sunday 31 January 2021 09:51:44 Simon Glass wrote:
> > > Hi Pali,
> > >
> > > On Sun, 31 Jan 2021 at 08:52, Pali Rohár <pali@kernel.org> wrote:
> > > >
> > > > On Sunday 31 January 2021 08:43:19 Simon Glass wrote:
> > > > > Hi Pali,
> > > > >
> > > > > On Sun, 31 Jan 2021 at 08:04, Pali Rohár <pali@kernel.org> wrote:
> > > > > >
> > > > > > On Sunday 31 January 2021 08:49:20 Tom Rini wrote:
> > > > > > > On Sun, Jan 31, 2021 at 01:15:20PM +0100, Pali Rohár wrote:
> > > > > > > > On Saturday 30 January 2021 22:17:45 Simon Glass wrote:
> > > > > > > > > This test is not reliable. Quite often (20%?) it makes the build fail and
> > > > > > > > > a retry succeeds.
> > > > > > > >
> > > > > > > > This test should work. Are there any logs with issues?
> > > > > > >
> > > > > > > I don't see it failing any more often than other tests do, due to
> > > > > > > network connectivity issues.  That may be helped by, now that we've
> > > > > > > dropped Travis, having the container be pre-populated with more of the
> > > > > > > downloaded files and pre-building the special QEMU.
> > > > > >
> > > > > > If there are just network issue problems then pre-downloading required
> > > > > > files into cache / container should resolve them.
> > > > >
> > > > > The flake issues I see are like this:
> > > > >
> > > > > https://gitlab.denx.de/u-boot/custodians/u-boot-dm/-/jobs/202441
> > > > >
> > > > > I am not sure of the cause, but it would be good to fix it!
> > > >
> > > > Hello Simon! This is not a network issue problem but rather some U-Boot
> > > > regression in mmc code. Second test failed with error:
> > > >
> > > >     "Failed to boot kernel from eMMC"
> > > >
> > > > Other tests succeed:
> > > >
> > > >     "Kernel was successfully booted from RAM"
> > > >     "Kernel was successfully booted from OneNAND"
> > > >
> > > > So problem is really with second boot attempt from eMMC. U-Boot log is
> > > > also available in output (as second run):
> > > >
> > > >     Check if pads/pull-ups of bus are properly configured
> > > >     Timed out in wait_for_event: status=0000
> > > >     ...
> > > >     Timed out in wait_for_event: status=0000
> > > >     Check if pads/pull-ups of bus are properly configured
> > > >     Timed out in wait_for_event: status=0000
> > > >     Check if pads/pull-ups of bus are properly configured
> > > >     Timed out in wait_for_event: status=0000
> > > >     Check if pads/pull-ups of bus are properly configured
> > > >     test/nokia_rx51_test.sh: line 233:  5946 Killed                  ./qemu-system-arm -M n900 -mtdblock mtd_emmc.img -sd emmc_emmc.img -serial /dev/stdout -display none > qemu_emmc.log
> > > >
> > > > After 300s was qemu killed and test marked as failure.
> > > >
> > > > So this is valid failure and regression in u-boot emmc code. So it would
> > > > be needed to identify which commit caused it and revert it...
> > >
> > > The problem is that it is intermittent. Can you repeat it?
> >
> > So when you run this test more times from same sources / git commit,
> > this error appears only sometimes?
> 
> Perhaps 1 time in 5 or 10? Every time I click 'retry' in gitlab it
> tries again and passes.

It would be interesting to know whether the problem is with the compiled
binary (and rebuilding fixes it) or in the qemu runtime part (the same
compiled binary sometimes passes and sometimes fails).

But as I have not seen this issue, I do not know what is happening here.

> >
> > This particular issue I have not seen in qemu yet when I run tests on my
> > local machine. So I cannot reproduce it.
> >
> > I saw similar errors, but only on real device (not in qemu) and they
> > were visible always (not sometimes). And for all my known problems I
> > have sent patches to mailing list. including i2c, mmc and usb. Some of
> > them are still waiting for review & merge...
> 
> So perhaps it has been fixed, but not yet merged?

Yea, this is possible.

> >
> > ===
> >
> > I know only one error which is not fixed yet and happens "only
> > sometimes" which I was not able to debug yet. Probably if u-boot binary
> > has particular size then it completely crashes (and with same binary it
> > can be reproduced for every run). But recompiling u-boot binary resolves
> > this issue and sometimes even without modifying source code. So I
> > suspect that time&date string (which changes for every recompilation)
> > must have some effect (maybe some +-1 padding?). Adding new random 100
> > characters into env variables seems to fix it.
> 
> That's not good.
> 
> Re the analysis, that seems a bit of a stretch. While the time/date
> changes, its length doesn't normally change.
> 
> Uninitialised values can have any behaviour. I assume this is in U-Boot
> proper, not SPL? You could check that BSS variables are not used
> before relocation, perhaps?

This is the U-Boot binary. N900 does not use SPL at all. The U-Boot binary
is loaded and executed by the (proprietary) Nokia loader directly in RAM
and it does almost all HW initialization.

And it is even more strange. If a build produces a binary which does not
work on a real device, it always crashes on real devices. But the same
binary works fine in qemu (so there is no way to debug it). And if I start
qemu in debug mode, ready for attaching gdb to look at this issue, it
somehow disappears... A total heisenbug. I have no idea if the bug is in
the u-boot code or in gcc (because recompiling with a different gcc version
and different flags also hides it)...

I caught this issue in qemu with gdb attached only once. This is a
capture of my terminal; I do not have anything more. U-Boot crashed with a
division by zero error because htab->size was zero:


(gdb) bt
#0  __aeabi_uidivmod () at arch/arm/lib/lib1funcs.S:325
#1  0x8002f054 in hsearch_r (item=..., action=ENV_FIND, retval=0x8fd12cec, htab=0x80041348, flag=0) at lib/hashtable.c:313
#2  0x80011f68 in env_get (name=0x8fd19830 "switchmmc") at cmd/nvedit.c:677
#3  env_get (name=0x8fd19830 "switchmmc") at cmd/nvedit.c:668
#4  0x800187e0 in do_run (cmdtp=<optimized out>, flag=<optimized out>, argc=2, argv=<optimized out>) at common/cli.c:142
#5  0x8fe042cc in ?? ()
#6  0x8fe042cc in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) up
#1  0x8002f054 in hsearch_r (item=..., action=ENV_FIND, retval=0x8fd12cec, htab=0x80041348, flag=0) at lib/hashtable.c:313
313             hval %= htab->size;
(gdb) print htab
$6 = (struct hsearch_data *) 0x80041348
(gdb) print *htab
$7 = {table = 0x0, size = 0, filled = 0, change_ok = 0x8002ab20 <env_flags_validate>}
(gdb) info registers
r0             0x43eab0e3       1139454179
r1             0x0      0
r2             0x73     115
r3             0x8fd19830       -1882089424
r4             0x80041348       -2147216568
r5             0x8fd19830       -1882089424
r6             0x2      2
r7             0x0      0
r8             0x8fd19830       -1882089424
r9             0x8fd12ee0       -1882116384
r10            0x0      0
r11            0x8fe0419c       -1881128548
r12            0x8fd12cb8       -1882116936
sp             0x8fd12ca0       0x8fd12ca0
lr             0x8002f054       -2147291052
pc             0x8002f054       0x8002f054 <hsearch_r+56>
cpsr           0x600001d3       1610613203

	/*
	 * First hash function:
	 * simply take the modul but prevent zero.
	 */
	hval %= htab->size;
	if (hval == 0)
		++hval;


I spent more time on it but was not able to debug it further. And now I
do not have time to look at it again. To me this issue does not
make sense at all. And because a workaround exists (recompile the binary,
possibly padding it with a dummy env variable) I stopped the investigation.

But I think you must be seeing something different, as in my case this
issue causes a U-Boot crash prior to starting bootmenu and the boot
procedure...

> >
> > > >
> > > > > Re the network issues, I have a persistent DNS problem with my
> > > > > network. I am really not sure of the root cause but sometimes it will
> > > > > fail to find a host, then succeed 5 seconds later. I spent some time
> > > > > on it a few weeks ago but will try again.
> Regards,
> Simon


* [PATCH 1/2] gitlab: Move the n900 test into its own section
  2021-01-31 13:49   ` Tom Rini
  2021-01-31 15:04     ` Pali Rohár
@ 2021-02-01  2:54     ` Heinrich Schuchardt
  2021-02-01  4:01       ` Tom Rini
  1 sibling, 1 reply; 14+ messages in thread
From: Heinrich Schuchardt @ 2021-02-01  2:54 UTC (permalink / raw)
  To: u-boot

On 1/31/21 2:49 PM, Tom Rini wrote:
> On Sun, Jan 31, 2021 at 01:15:20PM +0100, Pali Rohár wrote:
>> On Saturday 30 January 2021 22:17:45 Simon Glass wrote:
>>> This test is not reliable. Quite often (20%?) it makes the build fail and
>>> a retry succeeds.
>>
>> This test should work. Are there any logs with issues?
>
> I don't see it failing any more often than other tests do, due to
> network connectivity issues.  That may be helped by, now that we've
> dropped Travis, having the container be pre-populated with more of the
> downloaded files and pre-building the special QEMU.
>

Hello Tom,

That is what my patch

Dockerfile: compile QEMU for Nokia N900 emulation
https://patchwork.ozlabs.org/project/uboot/patch/20200713171046.230013-1-xypron.glpk at gmx.de/

is about.

You marked it as "changes requested" but it is unclear to me which
changes you request.

Best regards

Heinrich


* [PATCH 1/2] gitlab: Move the n900 test into its own section
  2021-02-01  2:54     ` Heinrich Schuchardt
@ 2021-02-01  4:01       ` Tom Rini
  0 siblings, 0 replies; 14+ messages in thread
From: Tom Rini @ 2021-02-01  4:01 UTC (permalink / raw)
  To: u-boot

On Mon, Feb 01, 2021 at 03:54:39AM +0100, Heinrich Schuchardt wrote:
> On 1/31/21 2:49 PM, Tom Rini wrote:
> > On Sun, Jan 31, 2021 at 01:15:20PM +0100, Pali Rohár wrote:
> > > On Saturday 30 January 2021 22:17:45 Simon Glass wrote:
> > > > This test is not reliable. Quite often (20%?) it makes the build fail and
> > > > a retry succeeds.
> > > 
> > > This test should work. Are there any logs with issues?
> > 
> > I don't see it failing any more often than other tests do, due to
> > network connectivity issues.  That may be helped by, now that we've
> > dropped Travis, having the container be pre-populated with more of the
> > downloaded files and pre-building the special QEMU.
> > 
> 
> Hello Tom,
> 
> That is what my patch
> 
> Dockerfile: compile QEMU for Nokia N900 emulation
> https://patchwork.ozlabs.org/project/uboot/patch/20200713171046.230013-1-xypron.glpk at gmx.de/
> 
> is about.
> 
> You marked it as "changes requested" but it is unclear to me which
> changes you request.

As I said in the most recent reply there, we need to update the test
script to use it, and that in turn may also mean changing what the
Dockerfile change is doing.

-- 
Tom


* [PATCH 2/2] buildman: Support single-threaded operation
  2021-01-31  5:17 ` [PATCH 2/2] buildman: Support single-threaded operation Simon Glass
@ 2021-03-05  2:02   ` Tom Rini
  0 siblings, 0 replies; 14+ messages in thread
From: Tom Rini @ 2021-03-05  2:02 UTC (permalink / raw)
  To: u-boot

On Sat, Jan 30, 2021 at 10:17:46PM -0700, Simon Glass wrote:

> At present even if only a single thread is in use, buildman still uses
> threading.
> 
> For some debugging it is helpful to do everything in the main process.
> Allow -T0 to support this.
> 
> Signed-off-by: Simon Glass <sjg@chromium.org>

Applied to u-boot/master, thanks!

-- 
Tom


Thread overview: 14+ messages
2021-01-31  5:17 [PATCH 1/2] gitlab: Move the n900 test into its own section Simon Glass
2021-01-31  5:17 ` [PATCH 2/2] buildman: Support single-threaded operation Simon Glass
2021-03-05  2:02   ` Tom Rini
2021-01-31 12:15 ` [PATCH 1/2] gitlab: Move the n900 test into its own section Pali Rohár
2021-01-31 13:49   ` Tom Rini
2021-01-31 15:04     ` Pali Rohár
2021-01-31 15:43       ` Simon Glass
2021-01-31 15:51         ` Pali Rohár
2021-01-31 16:51           ` Simon Glass
2021-01-31 17:05             ` Pali Rohár
2021-01-31 17:10               ` Simon Glass
2021-01-31 17:31                 ` Pali Rohár
2021-02-01  2:54     ` Heinrich Schuchardt
2021-02-01  4:01       ` Tom Rini
