* [bitbake][dunfell][1.46][PATCH 0/6] Patch review
@ 2020-06-30  3:08 Steve Sakoman
  2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 1/6] taskdata: Improve handling of regex in ASSUME_PROVIDED Steve Sakoman
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Steve Sakoman @ 2020-06-30  3:08 UTC (permalink / raw)
  To: bitbake-devel

Please review this next set of patches for 1.46 (dunfell) and have comments back
by end of day Wednesday.

The following changes since commit 83296870bede70e31bdf6e73683bcc30681023fc:

  tests/fetch: Switch from git.infradead.org to a YP mirror (2020-06-22 20:53:17 +0100)

are available in the Git repository at:

  git://git.openembedded.org/bitbake-contrib stable/1.46-nut
  http://cgit.openembedded.org/bitbake-contrib/log/?h=stable/1.46-nut

Joshua Watt (2):
  hashserv: Chunkify large messages
  siggen: Fix error when hash equivalence has an exception

Richard Purdie (3):
  taskdata: Improve handling of regex in ASSUME_PROVIDED
  runqueue: Avoid unpickle errors in rare cases
  msg: Avoid issues where paths have relative components

akuster (1):
  test/fetch: change to better svn source

 lib/bb/msg.py            |   2 +-
 lib/bb/runqueue.py       |   9 +++-
 lib/bb/siggen.py         |   1 +
 lib/bb/taskdata.py       |   9 +++-
 lib/bb/tests/fetch.py    |   2 +-
 lib/hashserv/__init__.py |  22 ++++++++
 lib/hashserv/client.py   |  43 +++++++++++++---
 lib/hashserv/server.py   | 105 +++++++++++++++++++++++++++------------
 lib/hashserv/tests.py    |  23 +++++++++
 9 files changed, 169 insertions(+), 47 deletions(-)

-- 
2.17.1



* [bitbake][dunfell][1.46][PATCH 1/6] taskdata: Improve handling of regex in ASSUME_PROVIDED
  2020-06-30  3:08 [bitbake][dunfell][1.46][PATCH 0/6] Patch review Steve Sakoman
@ 2020-06-30  3:08 ` Steve Sakoman
  2020-06-30 13:27   ` [bitbake-devel] " Paul Barker
  2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 2/6] test/fetch: change to better svn source Steve Sakoman
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Steve Sakoman @ 2020-06-30  3:08 UTC (permalink / raw)
  To: bitbake-devel

From: Richard Purdie <richard.purdie@linuxfoundation.org>

ASSUME_PROVIDED can take regexes; however, the current way of handling
them in the code is suboptimal. It means that you can add something like:

DEPENDS += "texinfo-nativejunk-that-does-not-exist"

and if texinfo-native is in ASSUME_PROVIDED, no error will occur.

Update the code to only treat something as a regex if a start or end
anchor character is present (which wouldn't be valid in a recipe name).
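
As a rough standalone sketch of the intended behaviour (the assertions below
are illustrative and not part of the patch), the reworked helper only falls
back to regex matching for anchored entries:

    import re

    def re_match_strings(target, strings):
        # Only entries anchored with ^ or $ are treated as regular
        # expressions; everything else must match exactly.
        for name in strings:
            if name.startswith("^") or name.endswith("$"):
                if re.match(name, target):
                    return True
            elif name == target:
                return True
        return False

    # A prefix no longer swallows look-alike recipe names:
    assert not re_match_strings("texinfo-nativejunk-that-does-not-exist",
                                ["texinfo-native"])
    # Exact names and anchored regexes still match:
    assert re_match_strings("texinfo-native", ["texinfo-native"])
    assert re_match_strings("texinfo-native", ["^texinfo-.*"])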

[YOCTO #13893]

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 3d72e23109990970fbb1086923277af752168b4a)
Signed-off-by: Steve Sakoman <steve@sakoman.com>
---
 lib/bb/taskdata.py | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/lib/bb/taskdata.py b/lib/bb/taskdata.py
index d13a1249..ffbaf362 100644
--- a/lib/bb/taskdata.py
+++ b/lib/bb/taskdata.py
@@ -21,8 +21,13 @@ def re_match_strings(target, strings):
     Whether or not the string 'target' matches
     any one string of the strings which can be regular expression string
     """
-    return any(name == target or re.match(name, target)
-               for name in strings)
+    for name in strings:
+        if name.startswith("^") or name.endswith("$"):
+            if re.match(name, target):
+                return True
+        elif name == target:
+            return True
+    return False
 
 class TaskEntry:
     def __init__(self):
-- 
2.17.1



* [bitbake][dunfell][1.46][PATCH 2/6] test/fetch: change to better svn source
  2020-06-30  3:08 [bitbake][dunfell][1.46][PATCH 0/6] Patch review Steve Sakoman
  2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 1/6] taskdata: Improve handling of regex in ASSUME_PROVIDED Steve Sakoman
@ 2020-06-30  3:08 ` Steve Sakoman
  2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 3/6] hashserv: Chunkify large messages Steve Sakoman
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Steve Sakoman @ 2020-06-30  3:08 UTC (permalink / raw)
  To: bitbake-devel

From: akuster <akuster808@gmail.com>

fixes:
svn: warning: W175002: Unexpected HTTP status 504 'Gateway Timeout' on '/openembedded/bitbake/!svn/vcc/default'
svn: E205011: Failure occurred processing one or more externals definitions

picked pcre2

[Yocto #13948]

Signed-off-by: Armin Kuster <akuster808@gmail.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 1483d17108da02f5d615e83403d5fd6288ca957c)
Signed-off-by: Steve Sakoman <steve@sakoman.com>
---
 lib/bb/tests/fetch.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/bb/tests/fetch.py b/lib/bb/tests/fetch.py
index 4697ef59..29c96b2b 100644
--- a/lib/bb/tests/fetch.py
+++ b/lib/bb/tests/fetch.py
@@ -1031,7 +1031,7 @@ class SVNTest(FetcherTest):
 
         bb.process.run("svn co %s svnfetch_co" % self.repo_url, cwd=self.tempdir)
         # Github will emulate SVN.  Use this to check if we're downloding...
-        bb.process.run("svn propset svn:externals 'bitbake http://github.com/openembedded/bitbake' .",
+        bb.process.run("svn propset svn:externals 'bitbake svn://vcs.pcre.org/pcre2/code' .",
                        cwd=os.path.join(self.tempdir, 'svnfetch_co', 'trunk'))
         bb.process.run("svn commit --non-interactive -m 'Add external'",
                        cwd=os.path.join(self.tempdir, 'svnfetch_co', 'trunk'))
-- 
2.17.1



* [bitbake][dunfell][1.46][PATCH 3/6] hashserv: Chunkify large messages
  2020-06-30  3:08 [bitbake][dunfell][1.46][PATCH 0/6] Patch review Steve Sakoman
  2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 1/6] taskdata: Improve handling of regex in ASSUME_PROVIDED Steve Sakoman
  2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 2/6] test/fetch: change to better svn source Steve Sakoman
@ 2020-06-30  3:08 ` Steve Sakoman
  2020-06-30 13:33   ` [bitbake-devel] " Paul Barker
  2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 4/6] siggen: Fix error when hash equivalence has an exception Steve Sakoman
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Steve Sakoman @ 2020-06-30  3:08 UTC (permalink / raw)
  To: bitbake-devel

From: Joshua Watt <JPEWhacker@gmail.com>

The hash equivalence client and server can occasionally send messages
that are too large to fit in the server's receive buffer (64 KB). To
prevent this, support is added to the protocol to "chunkify" the stream
and break it up into manageable pieces that each side can reassemble.

Ideally, the chunk size would be negotiated by the client and server, but
it is currently hard coded to 32 KB to avoid the round-trip delay.
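
For illustration, a self-contained sketch of the wire format the new
chunkify() helper produces (the helper is reproduced from the patch below;
the sample payloads are made up):

    import itertools
    import json

    DEFAULT_MAX_CHUNK = 32 * 1024

    def chunkify(msg, max_chunk):
        # Small messages go out as a single newline-terminated line.
        if len(msg) < max_chunk - 1:
            yield ''.join((msg, "\n"))
        else:
            # Large messages are announced with a 'chunk-stream' line, split
            # into pieces of at most (max_chunk - 1) characters, and ended
            # with an empty line.
            yield ''.join((json.dumps({'chunk-stream': None}), "\n"))
            args = [iter(msg)] * (max_chunk - 1)
            for m in map(''.join, itertools.zip_longest(*args, fillvalue='')):
                yield ''.join(itertools.chain(m, "\n"))
            yield "\n"

    small = json.dumps({'get-stats': None})
    assert list(chunkify(small, DEFAULT_MAX_CHUNK)) == [small + "\n"]

    big = json.dumps({'report': {'outhash_siginfo': '0' * (4 * DEFAULT_MAX_CHUNK)}})
    parts = list(chunkify(big, DEFAULT_MAX_CHUNK))
    assert json.loads(parts[0]) == {'chunk-stream': None}   # header line
    assert parts[-1] == "\n"                                # empty terminator
    assert ''.join(p.rstrip("\n") for p in parts[1:-1]) == big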

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit e27a28c1e40e886ee68ba4b99b537ffc9c3577d4)
Signed-off-by: Steve Sakoman <steve@sakoman.com>
---
 lib/hashserv/__init__.py |  22 ++++++++
 lib/hashserv/client.py   |  43 +++++++++++++---
 lib/hashserv/server.py   | 105 +++++++++++++++++++++++++++------------
 lib/hashserv/tests.py    |  23 +++++++++
 4 files changed, 152 insertions(+), 41 deletions(-)

diff --git a/lib/hashserv/__init__.py b/lib/hashserv/__init__.py
index c3318620..f95e8f43 100644
--- a/lib/hashserv/__init__.py
+++ b/lib/hashserv/__init__.py
@@ -6,12 +6,20 @@
 from contextlib import closing
 import re
 import sqlite3
+import itertools
+import json
 
 UNIX_PREFIX = "unix://"
 
 ADDR_TYPE_UNIX = 0
 ADDR_TYPE_TCP = 1
 
+# The Python async server defaults to a 64K receive buffer, so we hardcode our
+# maximum chunk size. It would be better if the client and server reported to
+# each other what the maximum chunk sizes were, but that will slow down the
+# connection setup with a round trip delay so I'd rather not do that unless it
+# is necessary
+DEFAULT_MAX_CHUNK = 32 * 1024
 
 def setup_database(database, sync=True):
     db = sqlite3.connect(database)
@@ -66,6 +74,20 @@ def parse_address(addr):
         return (ADDR_TYPE_TCP, (host, int(port)))
 
 
+def chunkify(msg, max_chunk):
+    if len(msg) < max_chunk - 1:
+        yield ''.join((msg, "\n"))
+    else:
+        yield ''.join((json.dumps({
+                'chunk-stream': None
+            }), "\n"))
+
+        args = [iter(msg)] * (max_chunk - 1)
+        for m in map(''.join, itertools.zip_longest(*args, fillvalue='')):
+            yield ''.join(itertools.chain(m, "\n"))
+        yield "\n"
+
+
 def create_server(addr, dbname, *, sync=True):
     from . import server
     db = setup_database(dbname, sync=sync)
diff --git a/lib/hashserv/client.py b/lib/hashserv/client.py
index 46085d64..a29af836 100644
--- a/lib/hashserv/client.py
+++ b/lib/hashserv/client.py
@@ -7,6 +7,7 @@ import json
 import logging
 import socket
 import os
+from . import chunkify, DEFAULT_MAX_CHUNK
 
 
 logger = logging.getLogger('hashserv.client')
@@ -25,6 +26,7 @@ class Client(object):
         self.reader = None
         self.writer = None
         self.mode = self.MODE_NORMAL
+        self.max_chunk = DEFAULT_MAX_CHUNK
 
     def connect_tcp(self, address, port):
         def connect_sock():
@@ -58,7 +60,7 @@ class Client(object):
             self.reader = self._socket.makefile('r', encoding='utf-8')
             self.writer = self._socket.makefile('w', encoding='utf-8')
 
-            self.writer.write('OEHASHEQUIV 1.0\n\n')
+            self.writer.write('OEHASHEQUIV 1.1\n\n')
             self.writer.flush()
 
             # Restore mode if the socket is being re-created
@@ -91,18 +93,35 @@ class Client(object):
                 count += 1
 
     def send_message(self, msg):
+        def get_line():
+            line = self.reader.readline()
+            if not line:
+                raise HashConnectionError('Connection closed')
+
+            if not line.endswith('\n'):
+                raise HashConnectionError('Bad message %r' % message)
+
+            return line
+
         def proc():
-            self.writer.write('%s\n' % json.dumps(msg))
+            for c in chunkify(json.dumps(msg), self.max_chunk):
+                self.writer.write(c)
             self.writer.flush()
 
-            l = self.reader.readline()
-            if not l:
-                raise HashConnectionError('Connection closed')
+            l = get_line()
 
-            if not l.endswith('\n'):
-                raise HashConnectionError('Bad message %r' % message)
+            m = json.loads(l)
+            if 'chunk-stream' in m:
+                lines = []
+                while True:
+                    l = get_line().rstrip('\n')
+                    if not l:
+                        break
+                    lines.append(l)
 
-            return json.loads(l)
+                m = json.loads(''.join(lines))
+
+            return m
 
         return self._send_wrapper(proc)
 
@@ -155,6 +174,14 @@ class Client(object):
         m['unihash'] = unihash
         return self.send_message({'report-equiv': m})
 
+    def get_taskhash(self, method, taskhash, all_properties=False):
+        self._set_mode(self.MODE_NORMAL)
+        return self.send_message({'get': {
+            'taskhash': taskhash,
+            'method': method,
+            'all': all_properties
+        }})
+
     def get_stats(self):
         self._set_mode(self.MODE_NORMAL)
         return self.send_message({'get-stats': None})
diff --git a/lib/hashserv/server.py b/lib/hashserv/server.py
index cc7e4823..81050715 100644
--- a/lib/hashserv/server.py
+++ b/lib/hashserv/server.py
@@ -13,6 +13,7 @@ import os
 import signal
 import socket
 import time
+from . import chunkify, DEFAULT_MAX_CHUNK
 
 logger = logging.getLogger('hashserv.server')
 
@@ -107,12 +108,29 @@ class Stats(object):
         return {k: getattr(self, k) for k in ('num', 'total_time', 'max_time', 'average', 'stdev')}
 
 
+class ClientError(Exception):
+    pass
+
 class ServerClient(object):
+    FAST_QUERY = 'SELECT taskhash, method, unihash FROM tasks_v2 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1'
+    ALL_QUERY =  'SELECT *                         FROM tasks_v2 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1'
+
     def __init__(self, reader, writer, db, request_stats):
         self.reader = reader
         self.writer = writer
         self.db = db
         self.request_stats = request_stats
+        self.max_chunk = DEFAULT_MAX_CHUNK
+
+        self.handlers = {
+            'get': self.handle_get,
+            'report': self.handle_report,
+            'report-equiv': self.handle_equivreport,
+            'get-stream': self.handle_get_stream,
+            'get-stats': self.handle_get_stats,
+            'reset-stats': self.handle_reset_stats,
+            'chunk-stream': self.handle_chunk,
+        }
 
     async def process_requests(self):
         try:
@@ -125,7 +143,11 @@ class ServerClient(object):
                 return
 
             (proto_name, proto_version) = protocol.decode('utf-8').rstrip().split()
-            if proto_name != 'OEHASHEQUIV' or proto_version != '1.0':
+            if proto_name != 'OEHASHEQUIV':
+                return
+
+            proto_version = tuple(int(v) for v in proto_version.split('.'))
+            if proto_version < (1, 0) or proto_version > (1, 1):
                 return
 
             # Read headers. Currently, no headers are implemented, so look for
@@ -140,40 +162,34 @@ class ServerClient(object):
                     break
 
             # Handle messages
-            handlers = {
-                'get': self.handle_get,
-                'report': self.handle_report,
-                'report-equiv': self.handle_equivreport,
-                'get-stream': self.handle_get_stream,
-                'get-stats': self.handle_get_stats,
-                'reset-stats': self.handle_reset_stats,
-            }
-
             while True:
                 d = await self.read_message()
                 if d is None:
                     break
-
-                for k in handlers.keys():
-                    if k in d:
-                        logger.debug('Handling %s' % k)
-                        if 'stream' in k:
-                            await handlers[k](d[k])
-                        else:
-                            with self.request_stats.start_sample() as self.request_sample, \
-                                    self.request_sample.measure():
-                                await handlers[k](d[k])
-                        break
-                else:
-                    logger.warning("Unrecognized command %r" % d)
-                    break
-
+                await self.dispatch_message(d)
                 await self.writer.drain()
+        except ClientError as e:
+            logger.error(str(e))
         finally:
             self.writer.close()
 
+    async def dispatch_message(self, msg):
+        for k in self.handlers.keys():
+            if k in msg:
+                logger.debug('Handling %s' % k)
+                if 'stream' in k:
+                    await self.handlers[k](msg[k])
+                else:
+                    with self.request_stats.start_sample() as self.request_sample, \
+                            self.request_sample.measure():
+                        await self.handlers[k](msg[k])
+                return
+
+        raise ClientError("Unrecognized command %r" % msg)
+
     def write_message(self, msg):
-        self.writer.write(('%s\n' % json.dumps(msg)).encode('utf-8'))
+        for c in chunkify(json.dumps(msg), self.max_chunk):
+            self.writer.write(c.encode('utf-8'))
 
     async def read_message(self):
         l = await self.reader.readline()
@@ -191,14 +207,38 @@ class ServerClient(object):
             logger.error('Bad message from client: %r' % message)
             raise e
 
+    async def handle_chunk(self, request):
+        lines = []
+        try:
+            while True:
+                l = await self.reader.readline()
+                l = l.rstrip(b"\n").decode("utf-8")
+                if not l:
+                    break
+                lines.append(l)
+
+            msg = json.loads(''.join(lines))
+        except (json.JSONDecodeError, UnicodeDecodeError) as e:
+            logger.error('Bad message from client: %r' % message)
+            raise e
+
+        if 'chunk-stream' in msg:
+            raise ClientError("Nested chunks are not allowed")
+
+        await self.dispatch_message(msg)
+
     async def handle_get(self, request):
         method = request['method']
         taskhash = request['taskhash']
 
-        row = self.query_equivalent(method, taskhash)
+        if request.get('all', False):
+            row = self.query_equivalent(method, taskhash, self.ALL_QUERY)
+        else:
+            row = self.query_equivalent(method, taskhash, self.FAST_QUERY)
+
         if row is not None:
             logger.debug('Found equivalent task %s -> %s', (row['taskhash'], row['unihash']))
-            d = {k: row[k] for k in ('taskhash', 'method', 'unihash')}
+            d = {k: row[k] for k in row.keys()}
 
             self.write_message(d)
         else:
@@ -228,7 +268,7 @@ class ServerClient(object):
 
                 (method, taskhash) = l.split()
                 #logger.debug('Looking up %s %s' % (method, taskhash))
-                row = self.query_equivalent(method, taskhash)
+                row = self.query_equivalent(method, taskhash, self.FAST_QUERY)
                 if row is not None:
                     msg = ('%s\n' % row['unihash']).encode('utf-8')
                     #logger.debug('Found equivalent task %s -> %s', (row['taskhash'], row['unihash']))
@@ -328,7 +368,7 @@ class ServerClient(object):
             # Fetch the unihash that will be reported for the taskhash. If the
             # unihash matches, it means this row was inserted (or the mapping
             # was already valid)
-            row = self.query_equivalent(data['method'], data['taskhash'])
+            row = self.query_equivalent(data['method'], data['taskhash'], self.FAST_QUERY)
 
             if row['unihash'] == data['unihash']:
                 logger.info('Adding taskhash equivalence for %s with unihash %s',
@@ -354,12 +394,11 @@ class ServerClient(object):
         self.request_stats.reset()
         self.write_message(d)
 
-    def query_equivalent(self, method, taskhash):
+    def query_equivalent(self, method, taskhash, query):
         # This is part of the inner loop and must be as fast as possible
         try:
             cursor = self.db.cursor()
-            cursor.execute('SELECT taskhash, method, unihash FROM tasks_v2 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1',
-                           {'method': method, 'taskhash': taskhash})
+            cursor.execute(query, {'method': method, 'taskhash': taskhash})
             return cursor.fetchone()
         except:
             cursor.close()
diff --git a/lib/hashserv/tests.py b/lib/hashserv/tests.py
index a5472a99..6e862950 100644
--- a/lib/hashserv/tests.py
+++ b/lib/hashserv/tests.py
@@ -99,6 +99,29 @@ class TestHashEquivalenceServer(object):
         result = self.client.get_unihash(self.METHOD, taskhash)
         self.assertEqual(result, unihash)
 
+    def test_huge_message(self):
+        # Simple test that hashes can be created
+        taskhash = 'c665584ee6817aa99edfc77a44dd853828279370'
+        outhash = '3c979c3db45c569f51ab7626a4651074be3a9d11a84b1db076f5b14f7d39db44'
+        unihash = '90e9bc1d1f094c51824adca7f8ea79a048d68824'
+
+        result = self.client.get_unihash(self.METHOD, taskhash)
+        self.assertIsNone(result, msg='Found unexpected task, %r' % result)
+
+        siginfo = "0" * (self.client.max_chunk * 4)
+
+        result = self.client.report_unihash(taskhash, self.METHOD, outhash, unihash, {
+            'outhash_siginfo': siginfo
+        })
+        self.assertEqual(result['unihash'], unihash, 'Server returned bad unihash')
+
+        result = self.client.get_taskhash(self.METHOD, taskhash, True)
+        self.assertEqual(result['taskhash'], taskhash)
+        self.assertEqual(result['unihash'], unihash)
+        self.assertEqual(result['method'], self.METHOD)
+        self.assertEqual(result['outhash'], outhash)
+        self.assertEqual(result['outhash_siginfo'], siginfo)
+
     def test_stress(self):
         def query_server(failures):
             client = Client(self.server.address)
-- 
2.17.1



* [bitbake][dunfell][1.46][PATCH 4/6] siggen: Fix error when hash equivalence has an exception
  2020-06-30  3:08 [bitbake][dunfell][1.46][PATCH 0/6] Patch review Steve Sakoman
                   ` (2 preceding siblings ...)
  2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 3/6] hashserv: Chunkify large messages Steve Sakoman
@ 2020-06-30  3:08 ` Steve Sakoman
  2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 5/6] runqueue: Avoid unpickle errors in rare cases Steve Sakoman
  2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 6/6] msg: Avoid issues where paths have relative components Steve Sakoman
  5 siblings, 0 replies; 13+ messages in thread
From: Steve Sakoman @ 2020-06-30  3:08 UTC (permalink / raw)
  To: bitbake-devel

From: Joshua Watt <JPEWhacker@gmail.com>

The code that handled exceptions from the hash equivalence client was
raising an exception itself because hashserv.client wasn't imported.
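
The underlying Python behaviour can be sketched with a stand-in package
(json/json.tool here, purely for illustration): importing a package does not
import its submodules, so naming one in an except clause blows up only when
the exception is actually raised.

    import importlib

    pkg = importlib.import_module("json")
    # The submodule attribute does not exist until it is imported:
    assert not hasattr(pkg, "tool")
    importlib.import_module("json.tool")
    # After an explicit import the attribute is reachable, which is why
    # siggen now does "import hashserv.client" up front.
    assert hasattr(pkg, "tool")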

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit a76290dfc6f34ff9f6efdb13a6db74b6b4759daf)
Signed-off-by: Steve Sakoman <steve@sakoman.com>
---
 lib/bb/siggen.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/bb/siggen.py b/lib/bb/siggen.py
index 4c8d81c5..94d1762d 100644
--- a/lib/bb/siggen.py
+++ b/lib/bb/siggen.py
@@ -14,6 +14,7 @@ import simplediff
 from bb.checksum import FileChecksumCache
 from bb import runqueue
 import hashserv
+import hashserv.client
 
 logger = logging.getLogger('BitBake.SigGen')
 hashequiv_logger = logging.getLogger('BitBake.SigGen.HashEquiv')
-- 
2.17.1



* [bitbake][dunfell][1.46][PATCH 5/6] runqueue: Avoid unpickle errors in rare cases
  2020-06-30  3:08 [bitbake][dunfell][1.46][PATCH 0/6] Patch review Steve Sakoman
                   ` (3 preceding siblings ...)
  2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 4/6] siggen: Fix error when hash equivalence has an exception Steve Sakoman
@ 2020-06-30  3:08 ` Steve Sakoman
  2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 6/6] msg: Avoid issues where paths have relative components Steve Sakoman
  5 siblings, 0 replies; 13+ messages in thread
From: Steve Sakoman @ 2020-06-30  3:08 UTC (permalink / raw)
  To: bitbake-devel

From: Richard Purdie <richard.purdie@linuxfoundation.org>

In rare cases the pickled data from a task contains "</event>", which
causes a backtrace. This can be reproduced with something like:

do_unpack_prepend () {
    bb.warn("</event>")
}

There are several solutions, but the easiest is to catch this exception
and look for the next marker instead, as this should be the only way such
an unpickle error could occur.

This fixes rare exceptions seen on the autobuilder.

Also add the other potential exceptions listed in the pickle documentation
so that better debug output is obtained should there be an error in this
code path in the future. The exitcode path doesn't need the same handling
since we control what is in that data field and it can never contain
"</exitcode>".
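
A minimal standalone sketch of the failure mode (the payload string is made
up; only the <event>/</event> framing matches what the worker pipe sends):

    import pickle

    payload = pickle.dumps("a warning containing </event> in its text")
    stream = b"<event>" + payload + b"</event>"

    # Splitting on the first "</event>" cuts the pickle short...
    index = stream.find(b"</event>")
    try:
        pickle.loads(stream[7:index])
    except (pickle.UnpicklingError, EOFError):
        # ...so the fix searches for the next end marker and unpickles again.
        index = stream.find(b"</event>", index + 1)
        assert pickle.loads(stream[7:index]).endswith("in its text")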

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 5ada512d6f9cbbdf1172ff7818117c38b12225ca)
Signed-off-by: Steve Sakoman <steve@sakoman.com>
---
 lib/bb/runqueue.py | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/lib/bb/runqueue.py b/lib/bb/runqueue.py
index 16f076f3..30cab537 100644
--- a/lib/bb/runqueue.py
+++ b/lib/bb/runqueue.py
@@ -2958,7 +2958,12 @@ class runQueuePipe():
             while index != -1 and self.queue.startswith(b"<event>"):
                 try:
                     event = pickle.loads(self.queue[7:index])
-                except ValueError as e:
+                except (ValueError, pickle.UnpicklingError, AttributeError, IndexError) as e:
+                    if isinstance(e, pickle.UnpicklingError) and "truncated" in str(e):
+                        # The pickled data could contain "</event>" so search for the next occurrence
+                        # and unpickle again; this should be the only way an unpickle error could occur
+                        index = self.queue.find(b"</event>", index + 1)
+                        continue
                     bb.msg.fatal("RunQueue", "failed load pickle '%s': '%s'" % (e, self.queue[7:index]))
                 bb.event.fire_from_worker(event, self.d)
                 if isinstance(event, taskUniHashUpdate):
@@ -2970,7 +2975,7 @@ class runQueuePipe():
             while index != -1 and self.queue.startswith(b"<exitcode>"):
                 try:
                     task, status = pickle.loads(self.queue[10:index])
-                except ValueError as e:
+                except (ValueError, pickle.UnpicklingError, AttributeError, IndexError) as e:
                     bb.msg.fatal("RunQueue", "failed load pickle '%s': '%s'" % (e, self.queue[10:index]))
                 self.rqexec.runqueue_process_waitpid(task, status)
                 found = True
-- 
2.17.1



* [bitbake][dunfell][1.46][PATCH 6/6] msg: Avoid issues where paths have relative components
  2020-06-30  3:08 [bitbake][dunfell][1.46][PATCH 0/6] Patch review Steve Sakoman
                   ` (4 preceding siblings ...)
  2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 5/6] runqueue: Avoid unpickle errors in rare cases Steve Sakoman
@ 2020-06-30  3:08 ` Steve Sakoman
  5 siblings, 0 replies; 13+ messages in thread
From: Steve Sakoman @ 2020-06-30  3:08 UTC (permalink / raw)
  To: bitbake-devel

From: Richard Purdie <richard.purdie@linuxfoundation.org>

The autobuilder can end up using build/../ syntax, which is an issue
if the build directory is cleaned. Avoid this by using normpath()
on the file path passed in.
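
For example (an illustrative path, not one taken from the autobuilder
configuration):

    import os.path

    # normpath() collapses the relative component lexically, so opening the
    # file no longer depends on the "build" directory still existing.
    assert os.path.normpath("build/../conf/logging.yml") == "conf/logging.yml"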

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
(cherry picked from commit 41988fec47eb196ab7195a75330a6d98de19101b)
Signed-off-by: Steve Sakoman <steve@sakoman.com>
---
 lib/bb/msg.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/bb/msg.py b/lib/bb/msg.py
index c0b344e3..2d88c4e7 100644
--- a/lib/bb/msg.py
+++ b/lib/bb/msg.py
@@ -280,7 +280,7 @@ def setLoggingConfig(defaultconfig, userconfigfile=None):
     logconfig = copy.deepcopy(defaultconfig)
 
     if userconfigfile:
-        with open(userconfigfile, 'r') as f:
+        with open(os.path.normpath(userconfigfile), 'r') as f:
             if userconfigfile.endswith('.yml') or userconfigfile.endswith('.yaml'):
                 import yaml
                 userconfig = yaml.load(f)
-- 
2.17.1



* Re: [bitbake-devel] [bitbake][dunfell][1.46][PATCH 1/6] taskdata: Improve handling of regex in ASSUME_PROVIDED
  2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 1/6] taskdata: Improve handling of regex in ASSUME_PROVIDED Steve Sakoman
@ 2020-06-30 13:27   ` Paul Barker
  2020-06-30 14:11     ` Steve Sakoman
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Barker @ 2020-06-30 13:27 UTC (permalink / raw)
  To: Steve Sakoman; +Cc: bitbake-devel

On Tue, 30 Jun 2020 at 04:08, Steve Sakoman <steve@sakoman.com> wrote:
>
> From: Richard Purdie <richard.purdie@linuxfoundation.org>
>
> ASSUME_PROVIDED can take regexes; however, the current way of handling
> them in the code is suboptimal. It means that you can add something like:
>
> DEPENDS += "texinfo-nativejunk-that-does-not-exist"
>
> and if texinfo-native is in ASSUME_PROVIDED, no error will occur.
>
> Update the code to only treat something as a regex if a start or end
> anchor character is present (which wouldn't be valid in a recipe name).
>
> [YOCTO #13893]
>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> (cherry picked from commit 3d72e23109990970fbb1086923277af752168b4a)
> Signed-off-by: Steve Sakoman <steve@sakoman.com>
> ---
>  lib/bb/taskdata.py | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/lib/bb/taskdata.py b/lib/bb/taskdata.py
> index d13a1249..ffbaf362 100644
> --- a/lib/bb/taskdata.py
> +++ b/lib/bb/taskdata.py
> @@ -21,8 +21,13 @@ def re_match_strings(target, strings):
>      Whether or not the string 'target' matches
>      any one string of the strings which can be regular expression string
>      """
> -    return any(name == target or re.match(name, target)
> -               for name in strings)
> +    for name in strings:
> +        if name.startswith("^") or name.endswith("$"):
> +            if re.match(name, target):
> +                return True
> +        elif name == target:
> +            return True
> +    return False
>
>  class TaskEntry:
>      def __init__(self):
> --
> 2.17.1

I'm not sure we should be changing how ASSUME_PROVIDED is parsed on a
stable branch. If we do backport this, we should at least issue a
warning where the behaviour changes (i.e. if re.match(name, target) is
true but name != target and name doesn't start with ^ or end with $).
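
Something along these lines could do it (an untested sketch against the
backported re_match_strings(); bb.warn() as used elsewhere in BitBake):

    import re
    import bb

    def re_match_strings(target, strings):
        for name in strings:
            if name.startswith("^") or name.endswith("$"):
                if re.match(name, target):
                    return True
            elif name == target:
                return True
            elif re.match(name, target):
                # Matched only under the old implicit-regex behaviour.
                bb.warn("ASSUME_PROVIDED entry '%s' matched '%s' as a regex "
                        "before this change; anchor it with ^ or $ to keep "
                        "that behaviour" % (name, target))
        return False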

-- 
Paul Barker
Konsulko Group


* Re: [bitbake-devel] [bitbake][dunfell][1.46][PATCH 3/6] hashserv: Chunkify large messages
  2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 3/6] hashserv: Chunkify large messages Steve Sakoman
@ 2020-06-30 13:33   ` Paul Barker
  2020-06-30 14:05     ` Steve Sakoman
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Barker @ 2020-06-30 13:33 UTC (permalink / raw)
  To: Steve Sakoman; +Cc: bitbake-devel, Joshua Watt, Richard Purdie

On Tue, 30 Jun 2020 at 04:09, Steve Sakoman <steve@sakoman.com> wrote:
>
> From: Joshua Watt <JPEWhacker@gmail.com>
>
> The hash equivalence client and server can occasionally send messages
> that are too large to fit in the server's receive buffer (64 KB). To
> prevent this, support is added to the protocol to "chunkify" the stream
> and break it up into manageable pieces that each side can reassemble.
>
> Ideally, the chunk size would be negotiated by the client and server, but
> it is currently hard coded to 32 KB to avoid the round-trip delay.
>
> Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> (cherry picked from commit e27a28c1e40e886ee68ba4b99b537ffc9c3577d4)
> Signed-off-by: Steve Sakoman <steve@sakoman.com>
> ---
>  lib/hashserv/__init__.py |  22 ++++++++
>  lib/hashserv/client.py   |  43 +++++++++++++---
>  lib/hashserv/server.py   | 105 +++++++++++++++++++++++++++------------
>  lib/hashserv/tests.py    |  23 +++++++++
>  4 files changed, 152 insertions(+), 41 deletions(-)
>
> diff --git a/lib/hashserv/__init__.py b/lib/hashserv/__init__.py
> index c3318620..f95e8f43 100644
> --- a/lib/hashserv/__init__.py
> +++ b/lib/hashserv/__init__.py
> @@ -6,12 +6,20 @@
>  from contextlib import closing
>  import re
>  import sqlite3
> +import itertools
> +import json
>
>  UNIX_PREFIX = "unix://"
>
>  ADDR_TYPE_UNIX = 0
>  ADDR_TYPE_TCP = 1
>
> +# The Python async server defaults to a 64K receive buffer, so we hardcode our
> +# maximum chunk size. It would be better if the client and server reported to
> +# each other what the maximum chunk sizes were, but that will slow down the
> +# connection setup with a round trip delay so I'd rather not do that unless it
> +# is necessary
> +DEFAULT_MAX_CHUNK = 32 * 1024
>
>  def setup_database(database, sync=True):
>      db = sqlite3.connect(database)
> @@ -66,6 +74,20 @@ def parse_address(addr):
>          return (ADDR_TYPE_TCP, (host, int(port)))
>
>
> +def chunkify(msg, max_chunk):
> +    if len(msg) < max_chunk - 1:
> +        yield ''.join((msg, "\n"))
> +    else:
> +        yield ''.join((json.dumps({
> +                'chunk-stream': None
> +            }), "\n"))
> +
> +        args = [iter(msg)] * (max_chunk - 1)
> +        for m in map(''.join, itertools.zip_longest(*args, fillvalue='')):
> +            yield ''.join(itertools.chain(m, "\n"))
> +        yield "\n"
> +
> +
>  def create_server(addr, dbname, *, sync=True):
>      from . import server
>      db = setup_database(dbname, sync=sync)
> diff --git a/lib/hashserv/client.py b/lib/hashserv/client.py
> index 46085d64..a29af836 100644
> --- a/lib/hashserv/client.py
> +++ b/lib/hashserv/client.py
> @@ -7,6 +7,7 @@ import json
>  import logging
>  import socket
>  import os
> +from . import chunkify, DEFAULT_MAX_CHUNK
>
>
>  logger = logging.getLogger('hashserv.client')
> @@ -25,6 +26,7 @@ class Client(object):
>          self.reader = None
>          self.writer = None
>          self.mode = self.MODE_NORMAL
> +        self.max_chunk = DEFAULT_MAX_CHUNK
>
>      def connect_tcp(self, address, port):
>          def connect_sock():
> @@ -58,7 +60,7 @@ class Client(object):
>              self.reader = self._socket.makefile('r', encoding='utf-8')
>              self.writer = self._socket.makefile('w', encoding='utf-8')
>
> -            self.writer.write('OEHASHEQUIV 1.0\n\n')
> +            self.writer.write('OEHASHEQUIV 1.1\n\n')
>              self.writer.flush()
>
>              # Restore mode if the socket is being re-created
> @@ -91,18 +93,35 @@ class Client(object):
>                  count += 1
>
>      def send_message(self, msg):
> +        def get_line():
> +            line = self.reader.readline()
> +            if not line:
> +                raise HashConnectionError('Connection closed')
> +
> +            if not line.endswith('\n'):
> +                raise HashConnectionError('Bad message %r' % message)
> +
> +            return line
> +
>          def proc():
> -            self.writer.write('%s\n' % json.dumps(msg))
> +            for c in chunkify(json.dumps(msg), self.max_chunk):
> +                self.writer.write(c)
>              self.writer.flush()
>
> -            l = self.reader.readline()
> -            if not l:
> -                raise HashConnectionError('Connection closed')
> +            l = get_line()
>
> -            if not l.endswith('\n'):
> -                raise HashConnectionError('Bad message %r' % message)
> +            m = json.loads(l)
> +            if 'chunk-stream' in m:
> +                lines = []
> +                while True:
> +                    l = get_line().rstrip('\n')
> +                    if not l:
> +                        break
> +                    lines.append(l)
>
> -            return json.loads(l)
> +                m = json.loads(''.join(lines))
> +
> +            return m
>
>          return self._send_wrapper(proc)
>
> @@ -155,6 +174,14 @@ class Client(object):
>          m['unihash'] = unihash
>          return self.send_message({'report-equiv': m})
>
> +    def get_taskhash(self, method, taskhash, all_properties=False):
> +        self._set_mode(self.MODE_NORMAL)
> +        return self.send_message({'get': {
> +            'taskhash': taskhash,
> +            'method': method,
> +            'all': all_properties
> +        }})
> +
>      def get_stats(self):
>          self._set_mode(self.MODE_NORMAL)
>          return self.send_message({'get-stats': None})
> diff --git a/lib/hashserv/server.py b/lib/hashserv/server.py
> index cc7e4823..81050715 100644
> --- a/lib/hashserv/server.py
> +++ b/lib/hashserv/server.py
> @@ -13,6 +13,7 @@ import os
>  import signal
>  import socket
>  import time
> +from . import chunkify, DEFAULT_MAX_CHUNK
>
>  logger = logging.getLogger('hashserv.server')
>
> @@ -107,12 +108,29 @@ class Stats(object):
>          return {k: getattr(self, k) for k in ('num', 'total_time', 'max_time', 'average', 'stdev')}
>
>
> +class ClientError(Exception):
> +    pass
> +
>  class ServerClient(object):
> +    FAST_QUERY = 'SELECT taskhash, method, unihash FROM tasks_v2 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1'
> +    ALL_QUERY =  'SELECT *                         FROM tasks_v2 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1'
> +
>      def __init__(self, reader, writer, db, request_stats):
>          self.reader = reader
>          self.writer = writer
>          self.db = db
>          self.request_stats = request_stats
> +        self.max_chunk = DEFAULT_MAX_CHUNK
> +
> +        self.handlers = {
> +            'get': self.handle_get,
> +            'report': self.handle_report,
> +            'report-equiv': self.handle_equivreport,
> +            'get-stream': self.handle_get_stream,
> +            'get-stats': self.handle_get_stats,
> +            'reset-stats': self.handle_reset_stats,
> +            'chunk-stream': self.handle_chunk,
> +        }
>
>      async def process_requests(self):
>          try:
> @@ -125,7 +143,11 @@ class ServerClient(object):
>                  return
>
>              (proto_name, proto_version) = protocol.decode('utf-8').rstrip().split()
> -            if proto_name != 'OEHASHEQUIV' or proto_version != '1.0':
> +            if proto_name != 'OEHASHEQUIV':
> +                return
> +
> +            proto_version = tuple(int(v) for v in proto_version.split('.'))
> +            if proto_version < (1, 0) or proto_version > (1, 1):
>                  return
>
>              # Read headers. Currently, no headers are implemented, so look for
> @@ -140,40 +162,34 @@ class ServerClient(object):
>                      break
>
>              # Handle messages
> -            handlers = {
> -                'get': self.handle_get,
> -                'report': self.handle_report,
> -                'report-equiv': self.handle_equivreport,
> -                'get-stream': self.handle_get_stream,
> -                'get-stats': self.handle_get_stats,
> -                'reset-stats': self.handle_reset_stats,
> -            }
> -
>              while True:
>                  d = await self.read_message()
>                  if d is None:
>                      break
> -
> -                for k in handlers.keys():
> -                    if k in d:
> -                        logger.debug('Handling %s' % k)
> -                        if 'stream' in k:
> -                            await handlers[k](d[k])
> -                        else:
> -                            with self.request_stats.start_sample() as self.request_sample, \
> -                                    self.request_sample.measure():
> -                                await handlers[k](d[k])
> -                        break
> -                else:
> -                    logger.warning("Unrecognized command %r" % d)
> -                    break
> -
> +                await self.dispatch_message(d)
>                  await self.writer.drain()
> +        except ClientError as e:
> +            logger.error(str(e))
>          finally:
>              self.writer.close()
>
> +    async def dispatch_message(self, msg):
> +        for k in self.handlers.keys():
> +            if k in msg:
> +                logger.debug('Handling %s' % k)
> +                if 'stream' in k:
> +                    await self.handlers[k](msg[k])
> +                else:
> +                    with self.request_stats.start_sample() as self.request_sample, \
> +                            self.request_sample.measure():
> +                        await self.handlers[k](msg[k])
> +                return
> +
> +        raise ClientError("Unrecognized command %r" % msg)
> +
>      def write_message(self, msg):
> -        self.writer.write(('%s\n' % json.dumps(msg)).encode('utf-8'))
> +        for c in chunkify(json.dumps(msg), self.max_chunk):
> +            self.writer.write(c.encode('utf-8'))
>
>      async def read_message(self):
>          l = await self.reader.readline()
> @@ -191,14 +207,38 @@ class ServerClient(object):
>              logger.error('Bad message from client: %r' % message)
>              raise e
>
> +    async def handle_chunk(self, request):
> +        lines = []
> +        try:
> +            while True:
> +                l = await self.reader.readline()
> +                l = l.rstrip(b"\n").decode("utf-8")
> +                if not l:
> +                    break
> +                lines.append(l)
> +
> +            msg = json.loads(''.join(lines))
> +        except (json.JSONDecodeError, UnicodeDecodeError) as e:
> +            logger.error('Bad message from client: %r' % message)
> +            raise e
> +
> +        if 'chunk-stream' in msg:
> +            raise ClientError("Nested chunks are not allowed")
> +
> +        await self.dispatch_message(msg)
> +
>      async def handle_get(self, request):
>          method = request['method']
>          taskhash = request['taskhash']
>
> -        row = self.query_equivalent(method, taskhash)
> +        if request.get('all', False):
> +            row = self.query_equivalent(method, taskhash, self.ALL_QUERY)
> +        else:
> +            row = self.query_equivalent(method, taskhash, self.FAST_QUERY)
> +
>          if row is not None:
>              logger.debug('Found equivalent task %s -> %s', (row['taskhash'], row['unihash']))
> -            d = {k: row[k] for k in ('taskhash', 'method', 'unihash')}
> +            d = {k: row[k] for k in row.keys()}
>
>              self.write_message(d)
>          else:
> @@ -228,7 +268,7 @@ class ServerClient(object):
>
>                  (method, taskhash) = l.split()
>                  #logger.debug('Looking up %s %s' % (method, taskhash))
> -                row = self.query_equivalent(method, taskhash)
> +                row = self.query_equivalent(method, taskhash, self.FAST_QUERY)
>                  if row is not None:
>                      msg = ('%s\n' % row['unihash']).encode('utf-8')
>                      #logger.debug('Found equivalent task %s -> %s', (row['taskhash'], row['unihash']))
> @@ -328,7 +368,7 @@ class ServerClient(object):
>              # Fetch the unihash that will be reported for the taskhash. If the
>              # unihash matches, it means this row was inserted (or the mapping
>              # was already valid)
> -            row = self.query_equivalent(data['method'], data['taskhash'])
> +            row = self.query_equivalent(data['method'], data['taskhash'], self.FAST_QUERY)
>
>              if row['unihash'] == data['unihash']:
>                  logger.info('Adding taskhash equivalence for %s with unihash %s',
> @@ -354,12 +394,11 @@ class ServerClient(object):
>          self.request_stats.reset()
>          self.write_message(d)
>
> -    def query_equivalent(self, method, taskhash):
> +    def query_equivalent(self, method, taskhash, query):
>          # This is part of the inner loop and must be as fast as possible
>          try:
>              cursor = self.db.cursor()
> -            cursor.execute('SELECT taskhash, method, unihash FROM tasks_v2 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1',
> -                           {'method': method, 'taskhash': taskhash})
> +            cursor.execute(query, {'method': method, 'taskhash': taskhash})
>              return cursor.fetchone()
>          except:
>              cursor.close()
> diff --git a/lib/hashserv/tests.py b/lib/hashserv/tests.py
> index a5472a99..6e862950 100644
> --- a/lib/hashserv/tests.py
> +++ b/lib/hashserv/tests.py
> @@ -99,6 +99,29 @@ class TestHashEquivalenceServer(object):
>          result = self.client.get_unihash(self.METHOD, taskhash)
>          self.assertEqual(result, unihash)
>
> +    def test_huge_message(self):
> +        # Simple test that hashes can be created
> +        taskhash = 'c665584ee6817aa99edfc77a44dd853828279370'
> +        outhash = '3c979c3db45c569f51ab7626a4651074be3a9d11a84b1db076f5b14f7d39db44'
> +        unihash = '90e9bc1d1f094c51824adca7f8ea79a048d68824'
> +
> +        result = self.client.get_unihash(self.METHOD, taskhash)
> +        self.assertIsNone(result, msg='Found unexpected task, %r' % result)
> +
> +        siginfo = "0" * (self.client.max_chunk * 4)
> +
> +        result = self.client.report_unihash(taskhash, self.METHOD, outhash, unihash, {
> +            'outhash_siginfo': siginfo
> +        })
> +        self.assertEqual(result['unihash'], unihash, 'Server returned bad unihash')
> +
> +        result = self.client.get_taskhash(self.METHOD, taskhash, True)
> +        self.assertEqual(result['taskhash'], taskhash)
> +        self.assertEqual(result['unihash'], unihash)
> +        self.assertEqual(result['method'], self.METHOD)
> +        self.assertEqual(result['outhash'], outhash)
> +        self.assertEqual(result['outhash_siginfo'], siginfo)
> +
>      def test_stress(self):
>          def query_server(failures):
>              client = Client(self.server.address)
> --
> 2.17.1

My understanding of
https://lists.openembedded.org/g/bitbake-devel/message/11453 is that
this isn't suitable for backporting to the LTS. I may be wrong though;
it's probably worth getting confirmation from Joshua or Richard (Cc'd).

--
Paul Barker
Konsulko Group


* Re: [bitbake-devel] [bitbake][dunfell][1.46][PATCH 3/6] hashserv: Chunkify large messages
  2020-06-30 13:33   ` [bitbake-devel] " Paul Barker
@ 2020-06-30 14:05     ` Steve Sakoman
  2020-06-30 16:38       ` Richard Purdie
  0 siblings, 1 reply; 13+ messages in thread
From: Steve Sakoman @ 2020-06-30 14:05 UTC (permalink / raw)
  To: Paul Barker; +Cc: bitbake-devel, Joshua Watt, Richard Purdie

On Tue, Jun 30, 2020 at 3:33 AM Paul Barker <pbarker@konsulko.com> wrote:
>
> On Tue, 30 Jun 2020 at 04:09, Steve Sakoman <steve@sakoman.com> wrote:
> >
> > From: Joshua Watt <JPEWhacker@gmail.com>
> >
> > The hash equivalence client and server can occasionally send messages
> > that are too large to fit in the server's receive buffer (64 KB). To
> > prevent this, support is added to the protocol to "chunkify" the stream
> > and break it up into manageable pieces that each side can reassemble.
> >
> > Ideally, the chunk size would be negotiated by the client and server, but
> > it is currently hard coded to 32 KB to avoid the round-trip delay.
> >
> > Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> > Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> > (cherry picked from commit e27a28c1e40e886ee68ba4b99b537ffc9c3577d4)
> > Signed-off-by: Steve Sakoman <steve@sakoman.com>
> > ---
> >  lib/hashserv/__init__.py |  22 ++++++++
> >  lib/hashserv/client.py   |  43 +++++++++++++---
> >  lib/hashserv/server.py   | 105 +++++++++++++++++++++++++++------------
> >  lib/hashserv/tests.py    |  23 +++++++++
> >  4 files changed, 152 insertions(+), 41 deletions(-)
> >
> > diff --git a/lib/hashserv/__init__.py b/lib/hashserv/__init__.py
> > index c3318620..f95e8f43 100644
> > --- a/lib/hashserv/__init__.py
> > +++ b/lib/hashserv/__init__.py
> > @@ -6,12 +6,20 @@
> >  from contextlib import closing
> >  import re
> >  import sqlite3
> > +import itertools
> > +import json
> >
> >  UNIX_PREFIX = "unix://"
> >
> >  ADDR_TYPE_UNIX = 0
> >  ADDR_TYPE_TCP = 1
> >
> > +# The Python async server defaults to a 64K receive buffer, so we hardcode our
> > +# maximum chunk size. It would be better if the client and server reported to
> > +# each other what the maximum chunk sizes were, but that will slow down the
> > +# connection setup with a round trip delay so I'd rather not do that unless it
> > +# is necessary
> > +DEFAULT_MAX_CHUNK = 32 * 1024
> >
> >  def setup_database(database, sync=True):
> >      db = sqlite3.connect(database)
> > @@ -66,6 +74,20 @@ def parse_address(addr):
> >          return (ADDR_TYPE_TCP, (host, int(port)))
> >
> >
> > +def chunkify(msg, max_chunk):
> > +    if len(msg) < max_chunk - 1:
> > +        yield ''.join((msg, "\n"))
> > +    else:
> > +        yield ''.join((json.dumps({
> > +                'chunk-stream': None
> > +            }), "\n"))
> > +
> > +        args = [iter(msg)] * (max_chunk - 1)
> > +        for m in map(''.join, itertools.zip_longest(*args, fillvalue='')):
> > +            yield ''.join(itertools.chain(m, "\n"))
> > +        yield "\n"
> > +
> > +
> >  def create_server(addr, dbname, *, sync=True):
> >      from . import server
> >      db = setup_database(dbname, sync=sync)
> > diff --git a/lib/hashserv/client.py b/lib/hashserv/client.py
> > index 46085d64..a29af836 100644
> > --- a/lib/hashserv/client.py
> > +++ b/lib/hashserv/client.py
> > @@ -7,6 +7,7 @@ import json
> >  import logging
> >  import socket
> >  import os
> > +from . import chunkify, DEFAULT_MAX_CHUNK
> >
> >
> >  logger = logging.getLogger('hashserv.client')
> > @@ -25,6 +26,7 @@ class Client(object):
> >          self.reader = None
> >          self.writer = None
> >          self.mode = self.MODE_NORMAL
> > +        self.max_chunk = DEFAULT_MAX_CHUNK
> >
> >      def connect_tcp(self, address, port):
> >          def connect_sock():
> > @@ -58,7 +60,7 @@ class Client(object):
> >              self.reader = self._socket.makefile('r', encoding='utf-8')
> >              self.writer = self._socket.makefile('w', encoding='utf-8')
> >
> > -            self.writer.write('OEHASHEQUIV 1.0\n\n')
> > +            self.writer.write('OEHASHEQUIV 1.1\n\n')
> >              self.writer.flush()
> >
> >              # Restore mode if the socket is being re-created
> > @@ -91,18 +93,35 @@ class Client(object):
> >                  count += 1
> >
> >      def send_message(self, msg):
> > +        def get_line():
> > +            line = self.reader.readline()
> > +            if not line:
> > +                raise HashConnectionError('Connection closed')
> > +
> > +            if not line.endswith('\n'):
> > +                raise HashConnectionError('Bad message %r' % message)
> > +
> > +            return line
> > +
> >          def proc():
> > -            self.writer.write('%s\n' % json.dumps(msg))
> > +            for c in chunkify(json.dumps(msg), self.max_chunk):
> > +                self.writer.write(c)
> >              self.writer.flush()
> >
> > -            l = self.reader.readline()
> > -            if not l:
> > -                raise HashConnectionError('Connection closed')
> > +            l = get_line()
> >
> > -            if not l.endswith('\n'):
> > -                raise HashConnectionError('Bad message %r' % message)
> > +            m = json.loads(l)
> > +            if 'chunk-stream' in m:
> > +                lines = []
> > +                while True:
> > +                    l = get_line().rstrip('\n')
> > +                    if not l:
> > +                        break
> > +                    lines.append(l)
> >
> > -            return json.loads(l)
> > +                m = json.loads(''.join(lines))
> > +
> > +            return m
> >
> >          return self._send_wrapper(proc)
> >
> > @@ -155,6 +174,14 @@ class Client(object):
> >          m['unihash'] = unihash
> >          return self.send_message({'report-equiv': m})
> >
> > +    def get_taskhash(self, method, taskhash, all_properties=False):
> > +        self._set_mode(self.MODE_NORMAL)
> > +        return self.send_message({'get': {
> > +            'taskhash': taskhash,
> > +            'method': method,
> > +            'all': all_properties
> > +        }})
> > +
> >      def get_stats(self):
> >          self._set_mode(self.MODE_NORMAL)
> >          return self.send_message({'get-stats': None})
> > diff --git a/lib/hashserv/server.py b/lib/hashserv/server.py
> > index cc7e4823..81050715 100644
> > --- a/lib/hashserv/server.py
> > +++ b/lib/hashserv/server.py
> > @@ -13,6 +13,7 @@ import os
> >  import signal
> >  import socket
> >  import time
> > +from . import chunkify, DEFAULT_MAX_CHUNK
> >
> >  logger = logging.getLogger('hashserv.server')
> >
> > @@ -107,12 +108,29 @@ class Stats(object):
> >          return {k: getattr(self, k) for k in ('num', 'total_time', 'max_time', 'average', 'stdev')}
> >
> >
> > +class ClientError(Exception):
> > +    pass
> > +
> >  class ServerClient(object):
> > +    FAST_QUERY = 'SELECT taskhash, method, unihash FROM tasks_v2 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1'
> > +    ALL_QUERY =  'SELECT *                         FROM tasks_v2 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1'
> > +
> >      def __init__(self, reader, writer, db, request_stats):
> >          self.reader = reader
> >          self.writer = writer
> >          self.db = db
> >          self.request_stats = request_stats
> > +        self.max_chunk = DEFAULT_MAX_CHUNK
> > +
> > +        self.handlers = {
> > +            'get': self.handle_get,
> > +            'report': self.handle_report,
> > +            'report-equiv': self.handle_equivreport,
> > +            'get-stream': self.handle_get_stream,
> > +            'get-stats': self.handle_get_stats,
> > +            'reset-stats': self.handle_reset_stats,
> > +            'chunk-stream': self.handle_chunk,
> > +        }
> >
> >      async def process_requests(self):
> >          try:
> > @@ -125,7 +143,11 @@ class ServerClient(object):
> >                  return
> >
> >              (proto_name, proto_version) = protocol.decode('utf-8').rstrip().split()
> > -            if proto_name != 'OEHASHEQUIV' or proto_version != '1.0':
> > +            if proto_name != 'OEHASHEQUIV':
> > +                return
> > +
> > +            proto_version = tuple(int(v) for v in proto_version.split('.'))
> > +            if proto_version < (1, 0) or proto_version > (1, 1):
> >                  return
> >
> >              # Read headers. Currently, no headers are implemented, so look for
> > @@ -140,40 +162,34 @@ class ServerClient(object):
> >                      break
> >
> >              # Handle messages
> > -            handlers = {
> > -                'get': self.handle_get,
> > -                'report': self.handle_report,
> > -                'report-equiv': self.handle_equivreport,
> > -                'get-stream': self.handle_get_stream,
> > -                'get-stats': self.handle_get_stats,
> > -                'reset-stats': self.handle_reset_stats,
> > -            }
> > -
> >              while True:
> >                  d = await self.read_message()
> >                  if d is None:
> >                      break
> > -
> > -                for k in handlers.keys():
> > -                    if k in d:
> > -                        logger.debug('Handling %s' % k)
> > -                        if 'stream' in k:
> > -                            await handlers[k](d[k])
> > -                        else:
> > -                            with self.request_stats.start_sample() as self.request_sample, \
> > -                                    self.request_sample.measure():
> > -                                await handlers[k](d[k])
> > -                        break
> > -                else:
> > -                    logger.warning("Unrecognized command %r" % d)
> > -                    break
> > -
> > +                await self.dispatch_message(d)
> >                  await self.writer.drain()
> > +        except ClientError as e:
> > +            logger.error(str(e))
> >          finally:
> >              self.writer.close()
> >
> > +    async def dispatch_message(self, msg):
> > +        for k in self.handlers.keys():
> > +            if k in msg:
> > +                logger.debug('Handling %s' % k)
> > +                if 'stream' in k:
> > +                    await self.handlers[k](msg[k])
> > +                else:
> > +                    with self.request_stats.start_sample() as self.request_sample, \
> > +                            self.request_sample.measure():
> > +                        await self.handlers[k](msg[k])
> > +                return
> > +
> > +        raise ClientError("Unrecognized command %r" % msg)
> > +
> >      def write_message(self, msg):
> > -        self.writer.write(('%s\n' % json.dumps(msg)).encode('utf-8'))
> > +        for c in chunkify(json.dumps(msg), self.max_chunk):
> > +            self.writer.write(c.encode('utf-8'))
> >
> >      async def read_message(self):
> >          l = await self.reader.readline()
> > @@ -191,14 +207,38 @@ class ServerClient(object):
> >              logger.error('Bad message from client: %r' % message)
> >              raise e
> >
> > +    async def handle_chunk(self, request):
> > +        lines = []
> > +        try:
> > +            while True:
> > +                l = await self.reader.readline()
> > +                l = l.rstrip(b"\n").decode("utf-8")
> > +                if not l:
> > +                    break
> > +                lines.append(l)
> > +
> > +            msg = json.loads(''.join(lines))
> > +        except (json.JSONDecodeError, UnicodeDecodeError) as e:
> > +            logger.error('Bad message from client: %r' % message)
> > +            raise e
> > +
> > +        if 'chunk-stream' in msg:
> > +            raise ClientError("Nested chunks are not allowed")
> > +
> > +        await self.dispatch_message(msg)
> > +
> >      async def handle_get(self, request):
> >          method = request['method']
> >          taskhash = request['taskhash']
> >
> > -        row = self.query_equivalent(method, taskhash)
> > +        if request.get('all', False):
> > +            row = self.query_equivalent(method, taskhash, self.ALL_QUERY)
> > +        else:
> > +            row = self.query_equivalent(method, taskhash, self.FAST_QUERY)
> > +
> >          if row is not None:
> >              logger.debug('Found equivalent task %s -> %s', (row['taskhash'], row['unihash']))
> > -            d = {k: row[k] for k in ('taskhash', 'method', 'unihash')}
> > +            d = {k: row[k] for k in row.keys()}
> >
> >              self.write_message(d)
> >          else:
> > @@ -228,7 +268,7 @@ class ServerClient(object):
> >
> >                  (method, taskhash) = l.split()
> >                  #logger.debug('Looking up %s %s' % (method, taskhash))
> > -                row = self.query_equivalent(method, taskhash)
> > +                row = self.query_equivalent(method, taskhash, self.FAST_QUERY)
> >                  if row is not None:
> >                      msg = ('%s\n' % row['unihash']).encode('utf-8')
> >                      #logger.debug('Found equivalent task %s -> %s', (row['taskhash'], row['unihash']))
> > @@ -328,7 +368,7 @@ class ServerClient(object):
> >              # Fetch the unihash that will be reported for the taskhash. If the
> >              # unihash matches, it means this row was inserted (or the mapping
> >              # was already valid)
> > -            row = self.query_equivalent(data['method'], data['taskhash'])
> > +            row = self.query_equivalent(data['method'], data['taskhash'], self.FAST_QUERY)
> >
> >              if row['unihash'] == data['unihash']:
> >                  logger.info('Adding taskhash equivalence for %s with unihash %s',
> > @@ -354,12 +394,11 @@ class ServerClient(object):
> >          self.request_stats.reset()
> >          self.write_message(d)
> >
> > -    def query_equivalent(self, method, taskhash):
> > +    def query_equivalent(self, method, taskhash, query):
> >          # This is part of the inner loop and must be as fast as possible
> >          try:
> >              cursor = self.db.cursor()
> > -            cursor.execute('SELECT taskhash, method, unihash FROM tasks_v2 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1',
> > -                           {'method': method, 'taskhash': taskhash})
> > +            cursor.execute(query, {'method': method, 'taskhash': taskhash})
> >              return cursor.fetchone()
> >          except:
> >              cursor.close()
> > diff --git a/lib/hashserv/tests.py b/lib/hashserv/tests.py
> > index a5472a99..6e862950 100644
> > --- a/lib/hashserv/tests.py
> > +++ b/lib/hashserv/tests.py
> > @@ -99,6 +99,29 @@ class TestHashEquivalenceServer(object):
> >          result = self.client.get_unihash(self.METHOD, taskhash)
> >          self.assertEqual(result, unihash)
> >
> > +    def test_huge_message(self):
> > +        # Simple test that hashes can be created
> > +        taskhash = 'c665584ee6817aa99edfc77a44dd853828279370'
> > +        outhash = '3c979c3db45c569f51ab7626a4651074be3a9d11a84b1db076f5b14f7d39db44'
> > +        unihash = '90e9bc1d1f094c51824adca7f8ea79a048d68824'
> > +
> > +        result = self.client.get_unihash(self.METHOD, taskhash)
> > +        self.assertIsNone(result, msg='Found unexpected task, %r' % result)
> > +
> > +        siginfo = "0" * (self.client.max_chunk * 4)
> > +
> > +        result = self.client.report_unihash(taskhash, self.METHOD, outhash, unihash, {
> > +            'outhash_siginfo': siginfo
> > +        })
> > +        self.assertEqual(result['unihash'], unihash, 'Server returned bad unihash')
> > +
> > +        result = self.client.get_taskhash(self.METHOD, taskhash, True)
> > +        self.assertEqual(result['taskhash'], taskhash)
> > +        self.assertEqual(result['unihash'], unihash)
> > +        self.assertEqual(result['method'], self.METHOD)
> > +        self.assertEqual(result['outhash'], outhash)
> > +        self.assertEqual(result['outhash_siginfo'], siginfo)
> > +
> >      def test_stress(self):
> >          def query_server(failures):
> >              client = Client(self.server.address)
> > --
> > 2.17.1
>
> My understanding of
> https://lists.openembedded.org/g/bitbake-devel/message/11453 is that
> this isn't suitable for backporting to the LTS. I may be wrong though;
> it's probably worth getting confirmation from Joshua or Richard (Cc'd).

Richard was the one who suggested I take this patch, but definitely
worth getting confirmation!

Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [bitbake-devel] [bitbake][dunfell][1.46][PATCH 1/6] taskdata: Improve handling of regex in ASSUME_PROVIDED
  2020-06-30 13:27   ` [bitbake-devel] " Paul Barker
@ 2020-06-30 14:11     ` Steve Sakoman
  0 siblings, 0 replies; 13+ messages in thread
From: Steve Sakoman @ 2020-06-30 14:11 UTC (permalink / raw)
  To: Paul Barker; +Cc: bitbake-devel

On Tue, Jun 30, 2020 at 3:27 AM Paul Barker <pbarker@konsulko.com> wrote:
>
> On Tue, 30 Jun 2020 at 04:08, Steve Sakoman <steve@sakoman.com> wrote:
> >
> > From: Richard Purdie <richard.purdie@linuxfoundation.org>
> >
> > ASSUME_PROVIDED can take regexs however the current way of handling
> > this in code is suboptimal. It means that you can add something like:
> >
> > DEPENDS += "texinfo-nativejunk-that-does-not-exist"
> >
> > and if texinfo-native is in ASSUME_PROVIDED, no error will occur.
> >
> > Update the code to only treat something as a regex if a start or end
> > anchor character is present (which wouldn't be valid in a recipe name).
> >
> > [YOCTO #13893]
> >
> > Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> > (cherry picked from commit 3d72e23109990970fbb1086923277af752168b4a)
> > Signed-off-by: Steve Sakoman <steve@sakoman.com>
> > ---
> >  lib/bb/taskdata.py | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/bb/taskdata.py b/lib/bb/taskdata.py
> > index d13a1249..ffbaf362 100644
> > --- a/lib/bb/taskdata.py
> > +++ b/lib/bb/taskdata.py
> > @@ -21,8 +21,13 @@ def re_match_strings(target, strings):
> >      Whether or not the string 'target' matches
> >      any one string of the strings which can be regular expression string
> >      """
> > -    return any(name == target or re.match(name, target)
> > -               for name in strings)
> > +    for name in strings:
> > +        if name.startswith("^") or name.endswith("$"):
> > +            if re.match(name, target):
> > +                return True
> > +        elif name == target:
> > +            return True
> > +    return False
> >
> >  class TaskEntry:
> >      def __init__(self):
> > --
> > 2.17.1
>
> I'm not sure we should be changing how ASSUME_PROVIDED is parsed on a
> stable branch. If we do backport this, we should at least issue a
> warning where the behaviour changes (i.e. if re.match(name, target) is
> true but name != target and name doesn't start with ^ or end with $).

I went back and forth on taking this patch since it is kind of a bug
fix. I decided to add it to the review list and see if anyone had
comments.

I'm inclined now to drop it since the bug condition is rare and it is
a behaviour change.

So I'll be removing it from the final pull request.

Thanks for reviewing the patch set, it is much appreciated.

Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread
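
For illustration, a warning along the lines Paul suggests above could look
roughly like the sketch below. This is a hypothetical variant of
re_match_strings() in lib/bb/taskdata.py, not part of the submitted patch;
the bb.warn() wording and the exact warning condition are assumptions made
here for clarity.

    import re
    import bb

    def re_match_strings(target, strings):
        """
        Return True if 'target' matches any entry in 'strings'.

        Entries are treated as regular expressions only when they carry an
        explicit start (^) or end ($) anchor; plain names must match exactly.
        A warning is emitted where the old (always-regex) behaviour would
        have matched but the new behaviour does not.
        """
        for name in strings:
            if name.startswith("^") or name.endswith("$"):
                if re.match(name, target):
                    return True
            elif name == target:
                return True
            elif re.match(name, target):
                # The old code would have matched here; warn rather than
                # silently changing the result on a stable branch.
                bb.warn("ASSUME_PROVIDED entry '%s' matched '%s' only as an "
                        "unanchored regex; add ^/$ anchors if a regex match "
                        "is intended" % (name, target))
        return False

With a variant like this the behaviour change still happens, but users whose
unanchored entries silently stop matching at least get a pointer to add the
anchors.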

* Re: [bitbake-devel] [bitbake][dunfell][1.46][PATCH 3/6] hashserv: Chunkify large messages
  2020-06-30 14:05     ` Steve Sakoman
@ 2020-06-30 16:38       ` Richard Purdie
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Purdie @ 2020-06-30 16:38 UTC (permalink / raw)
  To: Steve Sakoman, Paul Barker; +Cc: bitbake-devel, Joshua Watt

On Tue, 2020-06-30 at 04:05 -1000, Steve Sakoman wrote:
> On Tue, Jun 30, 2020 at 3:33 AM Paul Barker <pbarker@konsulko.com>
> wrote:
> > On Tue, 30 Jun 2020 at 04:09, Steve Sakoman <steve@sakoman.com>
> > wrote:
> > > From: Joshua Watt <JPEWhacker@gmail.com>
> > > 
> > > The hash equivalence client and server can occasionally send
> > > messages that are too large to fit in the receive buffer (64 KB).
> > > To prevent this, support is added to the protocol to "chunkify" the
> > > stream and break it up into manageable pieces that each side can
> > > put back together.
> > >
> > > Ideally, the chunk size would be negotiated by the client and
> > > server, but it is currently hard coded to 32 KB to avoid the
> > > round-trip delay.
> > > 
> > > Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> > > Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> > > (cherry picked from commit e27a28c1e40e886ee68ba4b99b537ffc9c3577d4)
> > > Signed-off-by: Steve Sakoman <steve@sakoman.com>
> > > ---
> > > 
> > My understanding of
> > https://lists.openembedded.org/g/bitbake-devel/message/11453 is that
> > this isn't suitable for backporting to the LTS. I may be wrong though;
> > it's probably worth getting confirmation from Joshua or Richard (Cc'd).
> 
> Richard was the one who suggested I take this patch, but definitely
> worth getting confirmation!

People would run into the same issue on dunfell as we have in master
with large data sizes. The change does require that any server is
upgraded, so we will need to note that in the release notes.

We did run into an issue where we hadn't upgraded the autobuilder
server; the problem was obvious, and all was fine once we upgraded it.

I'm leaning towards using the same code in dunfell and master for this
area. dunfell is already using the upgraded server from master on the
infrastructure.

Cheers,

Richard


^ permalink raw reply	[flat|nested] 13+ messages in thread
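
Pieced together from the server-side handle_chunk() in the patch quoted
earlier in the thread, the client side of the chunking scheme appears to
work roughly as sketched below: a message that fits is sent as a single
JSON line, while a larger one is announced with a 'chunk-stream' message,
streamed as newline-terminated slices, and closed with an empty line. This
is a hedged reconstruction for illustration only, not the actual helper in
lib/hashserv/__init__.py; the marker value (None), the slice handling and
the helper names are assumptions.

    import json

    DEFAULT_MAX_CHUNK = 32 * 1024  # hard coded to 32 KB per the commit message

    def chunkify(msg, max_chunk):
        """Yield wire-ready strings for 'msg', an already-serialized JSON string."""
        if len(msg) < max_chunk - 1:
            # Small enough to go out as a single JSON document plus newline.
            yield msg + "\n"
        else:
            # Announce a chunked transfer so the peer switches to its
            # 'chunk-stream' handler.
            yield json.dumps({'chunk-stream': None}) + "\n"
            # Stream the payload in slices, one per line; the receiver joins
            # the lines back together and parses the result as JSON.
            for i in range(0, len(msg), max_chunk - 1):
                yield msg[i:i + max_chunk - 1] + "\n"
            # An empty line marks the end of the chunk stream.
            yield "\n"

    def write_message(writer, msg, max_chunk=DEFAULT_MAX_CHUNK):
        # Mirrors the server's write_message() shown in the diff above.
        for c in chunkify(json.dumps(msg), max_chunk):
            writer.write(c.encode('utf-8'))

Note that a server which only speaks protocol 1.0 has no 'chunk-stream'
handler, which is why the server needs to be upgraded together with the
clients, as Richard points out above.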

* [bitbake][dunfell][1.46][PATCH 0/6] Patch review
@ 2020-06-02  0:39 Steve Sakoman
  0 siblings, 0 replies; 13+ messages in thread
From: Steve Sakoman @ 2020-06-02  0:39 UTC (permalink / raw)
  To: bitbake-devel

Please review this set of patches for the upcoming 3.1.1 release and have comments back
by end of day Wednesday.

The following changes since commit b94dec477a8d48ebceec91952ba290798c56c1f5:

  bitbake.conf: Drop unneeded variables from bitbake.conf (2020-04-24 14:31:30 +0100)

are available in the Git repository at:

  git://git.openembedded.org/bitbake-contrib stable/1.46-nut
  http://cgit.openembedded.org/bitbake-contrib/log/?h=stable/1.46-nut

Jacob Kroon (2):
  doc: Clarify how task dependencies relate to RDEPENDS
  doc: More explanation to tasks that recursively depend on themselves

Kai Kang (1):
  bitbake-user-manual-metadata.xml: fix a minor error

Robert P. J. Day (2):
  docs: delete reference to obsolete recipe-depends.dot
  user manual: properly tag content as <replaceable>

Tim Orling (1):
  toaster-requirements.txt: require Django 2.2

 .../bitbake-user-manual-execution.xml         |  4 ++--
 .../bitbake-user-manual-intro.xml             |  7 +------
 .../bitbake-user-manual-metadata.xml          | 19 ++++++++++++-------
 toaster-requirements.txt                      |  2 +-
 4 files changed, 16 insertions(+), 16 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-06-30 16:38 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-30  3:08 [bitbake][dunfell][1.46][PATCH 0/6] Patch review Steve Sakoman
2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 1/6] taskdata: Improve handling of regex in ASSUME_PROVIDED Steve Sakoman
2020-06-30 13:27   ` [bitbake-devel] " Paul Barker
2020-06-30 14:11     ` Steve Sakoman
2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 2/6] test/fetch: change to better svn source Steve Sakoman
2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 3/6] hashserv: Chunkify large messages Steve Sakoman
2020-06-30 13:33   ` [bitbake-devel] " Paul Barker
2020-06-30 14:05     ` Steve Sakoman
2020-06-30 16:38       ` Richard Purdie
2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 4/6] siggen: Fix error when hash equivalence has an exception Steve Sakoman
2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 5/6] runqueue: Avoid unpickle errors in rare cases Steve Sakoman
2020-06-30  3:08 ` [bitbake][dunfell][1.46][PATCH 6/6] msg: Avoid issues where paths have relative components Steve Sakoman
  -- strict thread matches above, loose matches on Subject: below --
2020-06-02  0:39 [bitbake][dunfell][1.46][PATCH 0/6] Patch review Steve Sakoman
