All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] qemurunner: Try to ensure mmap'd libs are paged in
@ 2021-06-10 22:41 Richard Purdie
  2021-06-10 22:41 ` [PATCH 2/2] qemurunner: Increase startup timeout 120 -> 300 Richard Purdie
  0 siblings, 1 reply; 2+ messages in thread
From: Richard Purdie @ 2021-06-10 22:41 UTC (permalink / raw)
  To: openembedded-core

We've seeing issues where IO load appears to cause strange failures due to timeouts
within qemu. One theory for these is that it is is hitting hard page faults
at in-opportune moments which cause timing problems within the VM.

This patch is a bit of a hack which tries to ensure the data is paged in
at a point when we know we can take the time delays (waiting for the QMP
start signal). Whilst this isn't ideal, it does seem to improve things on
the autobuilder and shouldn't harm anything.

The code figures out which files to read my looking at the mmap'd files
the process has open from /proc. On Centos7 systems these files are not
user readable, if that is the case we just skip them.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 meta/lib/oeqa/utils/qemurunner.py | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/meta/lib/oeqa/utils/qemurunner.py b/meta/lib/oeqa/utils/qemurunner.py
index 0032f6ed8dd..24af2fb20bb 100644
--- a/meta/lib/oeqa/utils/qemurunner.py
+++ b/meta/lib/oeqa/utils/qemurunner.py
@@ -342,7 +342,24 @@ class QemuRunner:
         finally:
             os.chdir(origpath)
 
-        # Release the qemu porcess to continue running
+        # We worry that mmap'd libraries may cause page faults which hang the qemu VM for periods
+        # causing failures. Before we "start" qemu, read through it's mapped files to try and 
+        # ensure we don't hit page faults later
+        mapdir = "/proc/" + str(self.qemupid) + "/map_files/"
+        try:
+            for f in os.listdir(mapdir):
+                linktarget = os.readlink(os.path.join(mapdir, f))
+                if not linktarget.startswith("/") or linktarget.startswith("/dev") or "deleted" in linktarget:
+                    continue
+                with open(linktarget, "rb") as readf:
+                    data = True
+                    while data:
+                        data = readf.read(4096)
+        # Centos7 doesn't allow us to read /map_files/
+        except PermissionError:
+            pass
+
+        # Release the qemu process to continue running
         self.run_monitor('cont')
 
         # We are alive: qemu is running
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* [PATCH 2/2] qemurunner: Increase startup timeout 120 -> 300
  2021-06-10 22:41 [PATCH 1/2] qemurunner: Try to ensure mmap'd libs are paged in Richard Purdie
@ 2021-06-10 22:41 ` Richard Purdie
  0 siblings, 0 replies; 2+ messages in thread
From: Richard Purdie @ 2021-06-10 22:41 UTC (permalink / raw)
  To: openembedded-core

We now spend time copying the VM image into a tmpfs and with IO load on the
system, the time + the boot time of the VM can take longer than 120s. Increase
the timeout to match the added overhead of copying the image file.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 meta/lib/oeqa/utils/qemurunner.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meta/lib/oeqa/utils/qemurunner.py b/meta/lib/oeqa/utils/qemurunner.py
index 24af2fb20bb..5dc1a136e3b 100644
--- a/meta/lib/oeqa/utils/qemurunner.py
+++ b/meta/lib/oeqa/utils/qemurunner.py
@@ -65,7 +65,7 @@ class QemuRunner:
         self.boot_patterns = boot_patterns
         self.tmpfsdir = tmpfsdir
 
-        self.runqemutime = 120
+        self.runqemutime = 300
         if not workdir:
             workdir = os.getcwd()
         self.qemu_pidfile = workdir + '/pidfile_' + str(os.getpid())
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-06-10 22:41 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-10 22:41 [PATCH 1/2] qemurunner: Try to ensure mmap'd libs are paged in Richard Purdie
2021-06-10 22:41 ` [PATCH 2/2] qemurunner: Increase startup timeout 120 -> 300 Richard Purdie

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.