* [RFC PATCH 0/8] hashserv read-only mode & upstream fixes
@ 2021-02-01 11:53 Paul Barker
  2021-02-01 11:53 ` [RFC PATCH 1/8] bitbake-hashclient: Remove obsolete call to client.connect Paul Barker
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Paul Barker @ 2021-02-01 11:53 UTC (permalink / raw)
  To: bitbake-devel, Richard Purdie, Joshua Watt; +Cc: Paul Barker

* Implement a read-only mode for the hash equivalence server. This mode is
  useful when you wish to populate a hash equivalence database from CI or other
  internal builds (via a read-write server instance) and also allow external
  clients to query this database (via a read-only server instance). This
  prevents external clients from adding hash equivalences for sstate
  artifacts that would be missing from the primary sstate cache. This mode
  is enabled using the -r/--read-only argument to bitbake-hashserv.

* Expose the existing upstream server support via a -u/--upstream argument to
  bitbake-hashserv.

* Support querying an upstream server using the new `get-outhash` message when
  the server handles a `report` message from a client and a match is not found
  in the server's own database. This is important as the `report` message is
  used by bitbake when a task finishes executing to check if the task outhash
  matches the outhash for any previous execution. With this support such
  matches can now be found in a local (read-write) hash equivalence db as
  well as in an upstream (potentially read-only and/or remote) db.

* Other minor hashserv fixes.

These changes have been tested locally using the following setup:

1) Build core-image-base with BB_HASHSERVE = "auto". Additional logging was
   also enabled following the instructions in 
   https://docs.yoctoproject.org/bitbake/bitbake-user-manual/bitbake-user-manual-execution.html#logging.

2) Move the hashserv.db file into a new 'upstream-hashserv' directory. Start
   the upstream server in read-only mode using the following command in that
   directory:

    bitbake-hashserv -r -l DEBUG

3) Create an empty 'downstream-hashserv' directory. Start the downstream
   (local) server with an empty db using the following command in that
   directory:

    bitbake-hashserv -u "unix://../upstream-hashserv/hashserve.sock" -l DEBUG

4) Modify local.conf to set 
   BB_HASHSERVE = "unix://${TOPDIR}/downstream-hashserv/hashserve.sock".

5) Add an 'echo hello' command to do_configure for glibc to force a rebuild
   which should result in a matching hash equivalence.

6) Build core-image-base again, confirm that glibc is rebuilt but then a hash
   equivalence is found (copied from the upstream server into the downstream
   server) and dependent tasks are pulled from the sstate cache instead of
   being rebuilt.

This is an RFC series as it still needs documentation to be written and
selftest cases to be added. However it'd be great to get some feedback at
this stage before moving on to that work.

These changes can also be pulled from:

  https://gitlab.com/pbarker.dev/staging/bitbake.git
  tag: hashserv_2020-02-01

Let me know if you have any questions/feedback :)

Paul Barker (8):
  bitbake-hashclient: Remove obsolete call to client.connect
  hashserv: client: Fix handling of null responses
  hashserv: server: Fix logger.debug calls
  hashserv: Support read-only server
  hashserv: Support upstream command line argument
  hashserv: Add short forms of remaining command line arguments
  hashserv: Add get-outhash message
  hashserv: server: Support searching upstream for outhash

 bin/bitbake-hashclient   |  3 --
 bin/bitbake-hashserv     | 10 +++--
 lib/hashserv/__init__.py |  4 +-
 lib/hashserv/client.py   |  8 +++-
 lib/hashserv/server.py   | 85 ++++++++++++++++++++++++++++++++++------
 5 files changed, 88 insertions(+), 22 deletions(-)

-- 
2.26.2



* [RFC PATCH 1/8] bitbake-hashclient: Remove obsolete call to client.connect
  2021-02-01 11:53 [RFC PATCH 0/8] hashserv read-only mode & upstream fixes Paul Barker
@ 2021-02-01 11:53 ` Paul Barker
  2021-02-01 11:53 ` [RFC PATCH 2/8] hashserv: client: Fix handling of null responses Paul Barker
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Paul Barker @ 2021-02-01 11:53 UTC (permalink / raw)
  To: bitbake-devel, Richard Purdie, Joshua Watt; +Cc: Paul Barker

The connect function was previously removed from the hashserv Client
class but the bitbake-hashclient app was not updated. The client is
connected during hashserv.create_client() anyway so no separate connect
call is needed.

Signed-off-by: Paul Barker <pbarker@konsulko.com>
---
 bin/bitbake-hashclient | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/bin/bitbake-hashclient b/bin/bitbake-hashclient
index 29ab65f17..a89290217 100755
--- a/bin/bitbake-hashclient
+++ b/bin/bitbake-hashclient
@@ -151,9 +151,6 @@ def main():
     func = getattr(args, 'func', None)
     if func:
         client = hashserv.create_client(args.address)
-        # Try to establish a connection to the server now to detect failures
-        # early
-        client.connect()
 
         return func(args, client)
 
-- 
2.26.2



* [RFC PATCH 2/8] hashserv: client: Fix handling of null responses
  2021-02-01 11:53 [RFC PATCH 0/8] hashserv read-only mode & upstream fixes Paul Barker
  2021-02-01 11:53 ` [RFC PATCH 1/8] bitbake-hashclient: Remove obsolete call to client.connect Paul Barker
@ 2021-02-01 11:53 ` Paul Barker
  2021-02-01 11:53 ` [RFC PATCH 3/8] hashserv: server: Fix logger.debug calls Paul Barker
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Paul Barker @ 2021-02-01 11:53 UTC (permalink / raw)
  To: bitbake-devel, Richard Purdie, Joshua Watt; +Cc: Paul Barker

If the server returns an empty response ("null" in JSON), the decoded
value is None, which cannot be searched for the "chunk-stream" key.

Signed-off-by: Paul Barker <pbarker@konsulko.com>
---
 lib/hashserv/client.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
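
For illustration only, a minimal standalone sketch of the failure mode and
the guard. The helper name has_chunk_stream is hypothetical and is not part
of hashserv; only json.loads is a real call here.

```python
import json


def has_chunk_stream(line):
    """Return True if the decoded JSON message announces a chunked stream.

    Hypothetical helper, used only to demonstrate the guard.
    """
    m = json.loads(line)
    # json.loads("null") yields None, and `"key" in None` raises TypeError,
    # so the truthiness of the decoded value is checked first.
    return bool(m) and "chunk-stream" in m
```

With the guard in place a "null" response is simply treated as a message
without a chunk stream.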

diff --git a/lib/hashserv/client.py b/lib/hashserv/client.py
index 0ffd0c2ae..0b7f4e42e 100644
--- a/lib/hashserv/client.py
+++ b/lib/hashserv/client.py
@@ -99,7 +99,7 @@ class AsyncClient(object):
             l = await get_line()
 
             m = json.loads(l)
-            if "chunk-stream" in m:
+            if m and "chunk-stream" in m:
                 lines = []
                 while True:
                     l = (await get_line()).rstrip("\n")
-- 
2.26.2



* [RFC PATCH 3/8] hashserv: server: Fix logger.debug calls
  2021-02-01 11:53 [RFC PATCH 0/8] hashserv read-only mode & upstream fixes Paul Barker
  2021-02-01 11:53 ` [RFC PATCH 1/8] bitbake-hashclient: Remove obsolete call to client.connect Paul Barker
  2021-02-01 11:53 ` [RFC PATCH 2/8] hashserv: client: Fix handling of null responses Paul Barker
@ 2021-02-01 11:53 ` Paul Barker
  2021-02-01 11:53 ` [RFC PATCH 4/8] hashserv: Support read-only server Paul Barker
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Paul Barker @ 2021-02-01 11:53 UTC (permalink / raw)
  To: bitbake-devel, Richard Purdie, Joshua Watt; +Cc: Paul Barker

The first argument to debug calls in bitbake should be a debug level.

Signed-off-by: Paul Barker <pbarker@konsulko.com>
---
 lib/hashserv/server.py | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/hashserv/server.py b/lib/hashserv/server.py
index 3ff4c51cc..fa0d3410b 100644
--- a/lib/hashserv/server.py
+++ b/lib/hashserv/server.py
@@ -168,7 +168,7 @@ class ServerClient(object):
 
 
             self.addr = self.writer.get_extra_info('peername')
-            logger.debug('Client %r connected' % (self.addr,))
+            logger.debug(1, 'Client %r connected' % (self.addr,))
 
             # Read protocol and version
             protocol = await self.reader.readline()
@@ -212,7 +212,7 @@ class ServerClient(object):
     async def dispatch_message(self, msg):
         for k in self.handlers.keys():
             if k in msg:
-                logger.debug('Handling %s' % k)
+                logger.debug(1, 'Handling %s' % k)
                 if 'stream' in k:
                     await self.handlers[k](msg[k])
                 else:
@@ -273,7 +273,7 @@ class ServerClient(object):
             row = self.query_equivalent(method, taskhash, self.FAST_QUERY)
 
         if row is not None:
-            logger.debug('Found equivalent task %s -> %s', (row['taskhash'], row['unihash']))
+            logger.debug(1, 'Found equivalent task %s -> %s', (row['taskhash'], row['unihash']))
             d = {k: row[k] for k in row.keys()}
         elif self.upstream_client is not None:
             d = await copy_from_upstream(self.upstream_client, self.db, method, taskhash)
@@ -307,11 +307,11 @@ class ServerClient(object):
                     return
 
                 (method, taskhash) = l.split()
-                #logger.debug('Looking up %s %s' % (method, taskhash))
+                #logger.debug(1, 'Looking up %s %s' % (method, taskhash))
                 row = self.query_equivalent(method, taskhash, self.FAST_QUERY)
                 if row is not None:
                     msg = ('%s\n' % row['unihash']).encode('utf-8')
-                    #logger.debug('Found equivalent task %s -> %s', (row['taskhash'], row['unihash']))
+                    #logger.debug(1, 'Found equivalent task %s -> %s', (row['taskhash'], row['unihash']))
                 elif self.upstream_client is not None:
                     upstream = await self.upstream_client.get_unihash(method, taskhash)
                     if upstream:
-- 
2.26.2



* [RFC PATCH 4/8] hashserv: Support read-only server
  2021-02-01 11:53 [RFC PATCH 0/8] hashserv read-only mode & upstream fixes Paul Barker
                   ` (2 preceding siblings ...)
  2021-02-01 11:53 ` [RFC PATCH 3/8] hashserv: server: Fix logger.debug calls Paul Barker
@ 2021-02-01 11:53 ` Paul Barker
  2021-02-01 11:53 ` [RFC PATCH 5/8] hashserv: Support upstream command line argument Paul Barker
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Paul Barker @ 2021-02-01 11:53 UTC (permalink / raw)
  To: bitbake-devel, Richard Purdie, Joshua Watt; +Cc: Paul Barker

The -r/--read-only argument is added to the bitbake-hashserv app. If this
argument is given then clients may only perform read operations against
the server. The read-only mode is implemented by simply not installing
handlers for write operations. This keeps the permission model simple
and reduces the risk of accidentally allowing write operations.

As a sqlite database can be safely opened by multiple processes in
parallel, it's possible to start two hashserv instances against a single
database if you wish to export both a read-only port and a read-write
port.

Signed-off-by: Paul Barker <pbarker@konsulko.com>
---
 bin/bitbake-hashserv     |  3 ++-
 lib/hashserv/__init__.py |  4 ++--
 lib/hashserv/server.py   | 25 ++++++++++++++++++-------
 3 files changed, 22 insertions(+), 10 deletions(-)
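
As a rough standalone sketch of the idea (not the server code itself): the
message names below are taken from the handler table in this patch, while
make_handler, build_handlers and dispatch are hypothetical stand-ins, and
ValueError stands in for whatever error the real server raises.

```python
READ_MESSAGES = ("get", "get-stream", "get-stats", "chunk-stream")
WRITE_MESSAGES = ("report", "report-equiv", "reset-stats", "backfill-wait")


def make_handler(name):
    # Stand-in handler that just echoes what it was asked to do.
    def handler(payload):
        return (name, payload)
    return handler


def build_handlers(read_only):
    # Read handlers are always installed.
    handlers = {name: make_handler(name) for name in READ_MESSAGES}
    # Write handlers are only installed on a read-write server, so a
    # read-only server has no code path that could perform a write.
    if not read_only:
        handlers.update({name: make_handler(name) for name in WRITE_MESSAGES})
    return handlers


def dispatch(handlers, msg):
    for key in handlers:
        if key in msg:
            return handlers[key](msg[key])
    # Assumption: an unknown message is rejected with an error.
    raise ValueError("Unrecognized command %r" % (msg,))
```

A read-only instance then rejects a report message in the same way as any
other unknown message, without needing a separate permission check.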

diff --git a/bin/bitbake-hashserv b/bin/bitbake-hashserv
index 1bc1f91f3..2669bbd13 100755
--- a/bin/bitbake-hashserv
+++ b/bin/bitbake-hashserv
@@ -33,6 +33,7 @@ def main():
     parser.add_argument('--bind', default=DEFAULT_BIND, help='Bind address (default "%(default)s")')
     parser.add_argument('--database', default='./hashserv.db', help='Database file (default "%(default)s")')
     parser.add_argument('--log', default='WARNING', help='Set logging level')
+    parser.add_argument('-r', '--read-only', action='store_true', help='Disallow write operations from clients')
 
     args = parser.parse_args()
 
@@ -47,7 +48,7 @@ def main():
     console.setLevel(level)
     logger.addHandler(console)
 
-    server = hashserv.create_server(args.bind, args.database)
+    server = hashserv.create_server(args.bind, args.database, read_only=args.read_only)
     server.serve_forever()
     return 0
 
diff --git a/lib/hashserv/__init__.py b/lib/hashserv/__init__.py
index 55f48410d..5f2e101e5 100644
--- a/lib/hashserv/__init__.py
+++ b/lib/hashserv/__init__.py
@@ -94,10 +94,10 @@ def chunkify(msg, max_chunk):
         yield "\n"
 
 
-def create_server(addr, dbname, *, sync=True, upstream=None):
+def create_server(addr, dbname, *, sync=True, upstream=None, read_only=False):
     from . import server
     db = setup_database(dbname, sync=sync)
-    s = server.Server(db, upstream=upstream)
+    s = server.Server(db, upstream=upstream, read_only=read_only)
 
     (typ, a) = parse_address(addr)
     if typ == ADDR_TYPE_UNIX:
diff --git a/lib/hashserv/server.py b/lib/hashserv/server.py
index fa0d3410b..b1e8b2f89 100644
--- a/lib/hashserv/server.py
+++ b/lib/hashserv/server.py
@@ -112,6 +112,9 @@ class Stats(object):
 class ClientError(Exception):
     pass
 
+class ServerError(Exception):
+    pass
+
 def insert_task(cursor, data, ignore=False):
     keys = sorted(data.keys())
     query = '''INSERT%s INTO tasks_v2 (%s) VALUES (%s)''' % (
@@ -138,7 +141,7 @@ class ServerClient(object):
     FAST_QUERY = 'SELECT taskhash, method, unihash FROM tasks_v2 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1'
     ALL_QUERY =  'SELECT *                         FROM tasks_v2 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1'
 
-    def __init__(self, reader, writer, db, request_stats, backfill_queue, upstream):
+    def __init__(self, reader, writer, db, request_stats, backfill_queue, upstream, read_only):
         self.reader = reader
         self.writer = writer
         self.db = db
@@ -149,15 +152,19 @@ class ServerClient(object):
 
         self.handlers = {
             'get': self.handle_get,
-            'report': self.handle_report,
-            'report-equiv': self.handle_equivreport,
             'get-stream': self.handle_get_stream,
             'get-stats': self.handle_get_stats,
-            'reset-stats': self.handle_reset_stats,
             'chunk-stream': self.handle_chunk,
-            'backfill-wait': self.handle_backfill_wait,
         }
 
+        if not read_only:
+            self.handlers.update({
+                'report': self.handle_report,
+                'report-equiv': self.handle_equivreport,
+                'reset-stats': self.handle_reset_stats,
+                'backfill-wait': self.handle_backfill_wait,
+            })
+
     async def process_requests(self):
         if self.upstream is not None:
             self.upstream_client = await create_async_client(self.upstream)
@@ -455,7 +462,10 @@ class ServerClient(object):
 
 
 class Server(object):
-    def __init__(self, db, loop=None, upstream=None):
+    def __init__(self, db, loop=None, upstream=None, read_only=False):
+        if upstream and read_only:
+            raise ServerError("Read-only hashserv cannot pull from an upstream server")
+
         self.request_stats = Stats()
         self.db = db
 
@@ -467,6 +477,7 @@ class Server(object):
             self.close_loop = False
 
         self.upstream = upstream
+        self.read_only = read_only
 
         self._cleanup_socket = None
 
@@ -510,7 +521,7 @@ class Server(object):
     async def handle_client(self, reader, writer):
         # writer.transport.set_write_buffer_limits(0)
         try:
-            client = ServerClient(reader, writer, self.db, self.request_stats, self.backfill_queue, self.upstream)
+            client = ServerClient(reader, writer, self.db, self.request_stats, self.backfill_queue, self.upstream, self.read_only)
             await client.process_requests()
         except Exception as e:
             import traceback
-- 
2.26.2



* [RFC PATCH 5/8] hashserv: Support upstream command line argument
  2021-02-01 11:53 [RFC PATCH 0/8] hashserv read-only mode & upstream fixes Paul Barker
                   ` (3 preceding siblings ...)
  2021-02-01 11:53 ` [RFC PATCH 4/8] hashserv: Support read-only server Paul Barker
@ 2021-02-01 11:53 ` Paul Barker
  2021-02-01 11:53 ` [RFC PATCH 6/8] hashserv: Add short forms of remaining command line arguments Paul Barker
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Paul Barker @ 2021-02-01 11:53 UTC (permalink / raw)
  To: bitbake-devel, Richard Purdie, Joshua Watt; +Cc: Paul Barker

The hashserv server already implements support for pulling hash data
from another "upstream" server. Add the -u/--upstream argument to the
bitbake-hashserv app to expose this functionality to users.

Signed-off-by: Paul Barker <pbarker@konsulko.com>
---
 bin/bitbake-hashserv | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/bin/bitbake-hashserv b/bin/bitbake-hashserv
index 2669bbd13..ab71f4e6c 100755
--- a/bin/bitbake-hashserv
+++ b/bin/bitbake-hashserv
@@ -33,6 +33,7 @@ def main():
     parser.add_argument('--bind', default=DEFAULT_BIND, help='Bind address (default "%(default)s")')
     parser.add_argument('--database', default='./hashserv.db', help='Database file (default "%(default)s")')
     parser.add_argument('--log', default='WARNING', help='Set logging level')
+    parser.add_argument('-u', '--upstream', help='Upstream hashserv to pull hashes from')
     parser.add_argument('-r', '--read-only', action='store_true', help='Disallow write operations from clients')
 
     args = parser.parse_args()
@@ -48,7 +49,7 @@ def main():
     console.setLevel(level)
     logger.addHandler(console)
 
-    server = hashserv.create_server(args.bind, args.database, read_only=args.read_only)
+    server = hashserv.create_server(args.bind, args.database, upstream=args.upstream, read_only=args.read_only)
     server.serve_forever()
     return 0
 
-- 
2.26.2



* [RFC PATCH 6/8] hashserv: Add short forms of remaining command line arguments
  2021-02-01 11:53 [RFC PATCH 0/8] hashserv read-only mode & upstream fixes Paul Barker
                   ` (4 preceding siblings ...)
  2021-02-01 11:53 ` [RFC PATCH 5/8] hashserv: Support upstream command line argument Paul Barker
@ 2021-02-01 11:53 ` Paul Barker
  2021-02-01 11:53 ` [RFC PATCH 7/8] hashserv: Add get-outhash message Paul Barker
  2021-02-01 11:53 ` [RFC PATCH 8/8] hashserv: server: Support searching upstream for outhash Paul Barker
  7 siblings, 0 replies; 14+ messages in thread
From: Paul Barker @ 2021-02-01 11:53 UTC (permalink / raw)
  To: bitbake-devel, Richard Purdie, Joshua Watt; +Cc: Paul Barker

Short form arguments are added for convenience.

Signed-off-by: Paul Barker <pbarker@konsulko.com>
---
 bin/bitbake-hashserv | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/bin/bitbake-hashserv b/bin/bitbake-hashserv
index ab71f4e6c..153f65a37 100755
--- a/bin/bitbake-hashserv
+++ b/bin/bitbake-hashserv
@@ -30,9 +30,9 @@ def main():
                                                "--bind [::1]:8686"'''
                                      )
 
-    parser.add_argument('--bind', default=DEFAULT_BIND, help='Bind address (default "%(default)s")')
-    parser.add_argument('--database', default='./hashserv.db', help='Database file (default "%(default)s")')
-    parser.add_argument('--log', default='WARNING', help='Set logging level')
+    parser.add_argument('-b', '--bind', default=DEFAULT_BIND, help='Bind address (default "%(default)s")')
+    parser.add_argument('-d', '--database', default='./hashserv.db', help='Database file (default "%(default)s")')
+    parser.add_argument('-l', '--log', default='WARNING', help='Set logging level')
     parser.add_argument('-u', '--upstream', help='Upstream hashserv to pull hashes from')
     parser.add_argument('-r', '--read-only', action='store_true', help='Disallow write operations from clients')
 
-- 
2.26.2



* [RFC PATCH 7/8] hashserv: Add get-outhash message
  2021-02-01 11:53 [RFC PATCH 0/8] hashserv read-only mode & upstream fixes Paul Barker
                   ` (5 preceding siblings ...)
  2021-02-01 11:53 ` [RFC PATCH 6/8] hashserv: Add short forms of remaining command line arguments Paul Barker
@ 2021-02-01 11:53 ` Paul Barker
  2021-02-01 14:09   ` Joshua Watt
  2021-02-01 11:53 ` [RFC PATCH 8/8] hashserv: server: Support searching upstream for outhash Paul Barker
  7 siblings, 1 reply; 14+ messages in thread
From: Paul Barker @ 2021-02-01 11:53 UTC (permalink / raw)
  To: bitbake-devel, Richard Purdie, Joshua Watt; +Cc: Paul Barker

The get-outhash message can be sent via the get_outhash client method.
This works in a similar way to the get message but looks up a db entry
by outhash rather than by taskhash. It is intended to be used as a
read-only form of the report message.

Signed-off-by: Paul Barker <pbarker@konsulko.com>
---
 lib/hashserv/client.py |  6 ++++++
 lib/hashserv/server.py | 28 ++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)
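
To illustrate the ordering used by the query in this patch, here is a small
sketch against an in-memory sqlite database. The table is reduced to the
columns the query touches, which is an assumption made for the example, and
make_db and lookup are hypothetical helper names.

```python
import sqlite3

OUTHASH_QUERY = """
    SELECT * FROM tasks_v2 WHERE method=:method AND outhash=:outhash
    ORDER BY CASE WHEN taskhash=:taskhash THEN 1 ELSE 2 END, created ASC
    LIMIT 1
"""


def make_db():
    # Reduced schema for illustration; the real table has more columns.
    db = sqlite3.connect(":memory:")
    db.row_factory = sqlite3.Row
    db.execute(
        "CREATE TABLE tasks_v2 (method TEXT, taskhash TEXT, outhash TEXT,"
        " unihash TEXT, created INTEGER)"
    )
    db.executemany(
        "INSERT INTO tasks_v2 VALUES (?, ?, ?, ?, ?)",
        [
            ("m", "task-old", "out-1", "uni-old", 1),
            ("m", "task-new", "out-1", "uni-new", 2),
        ],
    )
    return db


def lookup(db, method, outhash, taskhash):
    params = {"method": method, "outhash": outhash, "taskhash": taskhash}
    row = db.execute(OUTHASH_QUERY, params).fetchone()
    # Mirror the handler: a dict for a match, None otherwise.
    return dict(row) if row is not None else None
```

An exact taskhash match is preferred; otherwise the oldest row with the
same outhash is returned, and an unknown outhash gives None.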

diff --git a/lib/hashserv/client.py b/lib/hashserv/client.py
index 0b7f4e42e..e05c1eb56 100644
--- a/lib/hashserv/client.py
+++ b/lib/hashserv/client.py
@@ -170,6 +170,12 @@ class AsyncClient(object):
             {"get": {"taskhash": taskhash, "method": method, "all": all_properties}}
         )
 
+    async def get_outhash(self, method, outhash, taskhash):
+        await self._set_mode(self.MODE_NORMAL)
+        return await self.send_message(
+            {"get-outhash": {"outhash": outhash, "taskhash": taskhash, "method": method}}
+        )
+
     async def get_stats(self):
         await self._set_mode(self.MODE_NORMAL)
         return await self.send_message({"get-stats": None})
diff --git a/lib/hashserv/server.py b/lib/hashserv/server.py
index b1e8b2f89..054791b8b 100644
--- a/lib/hashserv/server.py
+++ b/lib/hashserv/server.py
@@ -152,6 +152,7 @@ class ServerClient(object):
 
         self.handlers = {
             'get': self.handle_get,
+            'get-outhash': self.handle_get_outhash,
             'get-stream': self.handle_get_stream,
             'get-stats': self.handle_get_stats,
             'chunk-stream': self.handle_chunk,
@@ -289,6 +290,33 @@ class ServerClient(object):
 
         self.write_message(d)
 
+    async def handle_get_outhash(self, request):
+        with closing(self.db.cursor()) as cursor:
+            cursor.execute('''
+                -- Find tasks with a matching outhash (that is, tasks that
+                -- are equivalent)
+                SELECT * FROM tasks_v2 WHERE method=:method AND outhash=:outhash
+
+                -- If there is an exact match on the taskhash, return it.
+                -- Otherwise return the oldest matching outhash of any
+                -- taskhash
+                ORDER BY CASE WHEN taskhash=:taskhash THEN 1 ELSE 2 END,
+                    created ASC
+
+                -- Only return one row
+                LIMIT 1
+                ''', {k: request[k] for k in ('method', 'outhash', 'taskhash')})
+
+            row = cursor.fetchone()
+
+        if row is not None:
+            logger.debug(1, 'Found equivalent outhash %s -> %s', (row['outhash'], row['unihash']))
+            d = {k: row[k] for k in row.keys()}
+        else:
+            d = None
+
+        self.write_message(d)
+
     async def handle_get_stream(self, request):
         self.write_message('ok')
 
-- 
2.26.2



* [RFC PATCH 8/8] hashserv: server: Support searching upstream for outhash
  2021-02-01 11:53 [RFC PATCH 0/8] hashserv read-only mode & upstream fixes Paul Barker
                   ` (6 preceding siblings ...)
  2021-02-01 11:53 ` [RFC PATCH 7/8] hashserv: Add get-outhash message Paul Barker
@ 2021-02-01 11:53 ` Paul Barker
  2021-02-01 14:06   ` Joshua Watt
  7 siblings, 1 reply; 14+ messages in thread
From: Paul Barker @ 2021-02-01 11:53 UTC (permalink / raw)
  To: bitbake-devel, Richard Purdie, Joshua Watt; +Cc: Paul Barker

Use the new get-outhash message to perform a read-only query against an
upstream server (if present) when a reported taskhash/outhash
combination is not found in the current database. If a matching entry is
found upstream it is copied into the current database so it can be found
by future queries.

Signed-off-by: Paul Barker <pbarker@konsulko.com>
---
 lib/hashserv/server.py | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
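
The intended lookup order can be sketched as follows. This is a simplified
model and not the server code: a dict stands in for the local database,
FakeUpstream and find_outhash are hypothetical names, and skipping the
upstream step when no upstream is configured is an assumption of the
sketch.

```python
import asyncio


class FakeUpstream:
    """Hypothetical stand-in for an upstream client."""

    def __init__(self, rows):
        self.rows = rows

    async def get_outhash(self, method, outhash, taskhash):
        return self.rows.get((method, outhash))


async def find_outhash(local, upstream, method, outhash, taskhash):
    # 1) Look in the local database first.
    row = local.get((method, outhash))
    # 2) On a miss, ask the upstream server (if one is configured).
    if row is None and upstream is not None:
        row = await upstream.get_outhash(method, outhash, taskhash)
        # 3) Copy an upstream match locally so later queries find it.
        if row is not None:
            local[(method, outhash)] = row
    return row
```

After a successful upstream lookup the entry is present locally, so a
second query for the same outhash no longer needs the upstream server.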

diff --git a/lib/hashserv/server.py b/lib/hashserv/server.py
index 054791b8b..5fb562b52 100644
--- a/lib/hashserv/server.py
+++ b/lib/hashserv/server.py
@@ -131,6 +131,20 @@ async def copy_from_upstream(client, db, method, taskhash):
         keys = sorted(d.keys())
 
 
+        with closing(db.cursor()) as cursor:
+            insert_task(cursor, d)
+            db.commit()
+
+    return d
+
+async def copy_outhash_from_upstream(client, db, method, outhash, taskhash):
+    d = await client.get_outhash(method, outhash, taskhash)
+    if d is not None:
+        # Filter out unknown columns
+        d = {k: v for k, v in d.items() if k in TABLE_COLUMNS}
+        keys = sorted(d.keys())
+
+
         with closing(db.cursor()) as cursor:
             insert_task(cursor, d)
             db.commit()
@@ -387,6 +401,14 @@ class ServerClient(object):
 
             row = cursor.fetchone()
 
+            if row is None:
+                # Try upstream
+                row = await copy_outhash_from_upstream(self.upstream_client,
+                                                       self.db,
+                                                       data['method'],
+                                                       data['outhash'],
+                                                       data['taskhash'])
+
             # If no matching outhash was found, or one *was* found but it
             # wasn't an exact match on the taskhash, a new entry for this
             # taskhash should be added
-- 
2.26.2



* Re: [RFC PATCH 8/8] hashserv: server: Support searching upstream for outhash
  2021-02-01 11:53 ` [RFC PATCH 8/8] hashserv: server: Support searching upstream for outhash Paul Barker
@ 2021-02-01 14:06   ` Joshua Watt
  2021-02-02 10:48     ` Paul Barker
  0 siblings, 1 reply; 14+ messages in thread
From: Joshua Watt @ 2021-02-01 14:06 UTC (permalink / raw)
  To: Paul Barker, bitbake-devel, Richard Purdie


On 2/1/21 5:53 AM, Paul Barker wrote:
> Use the new get-outhash message to perform a read-only query against an
> upstream server (if present) when a reported taskhash/outhash
> combination is not found in the current database. If a matching entry is
> found upstream it is copied into the current database so it can be found
> by future queries.
>
> Signed-off-by: Paul Barker <pbarker@konsulko.com>
> ---
>   lib/hashserv/server.py | 22 ++++++++++++++++++++++
>   1 file changed, 22 insertions(+)
>
> diff --git a/lib/hashserv/server.py b/lib/hashserv/server.py
> index 054791b8b..5fb562b52 100644
> --- a/lib/hashserv/server.py
> +++ b/lib/hashserv/server.py
> @@ -131,6 +131,20 @@ async def copy_from_upstream(client, db, method, taskhash):
>           keys = sorted(d.keys())
>   
>   
> +        with closing(db.cursor()) as cursor:
> +            insert_task(cursor, d)
> +            db.commit()
> +
> +    return d
> +
> +async def copy_outhash_from_upstream(client, db, method, outhash, taskhash):
> +    d = await client.get_outhash(method, outhash, taskhash)
> +    if d is not None:
> +        # Filter out unknown columns
> +        d = {k: v for k, v in d.items() if k in TABLE_COLUMNS}
> +        keys = sorted(d.keys())
> +
> +
>           with closing(db.cursor()) as cursor:
>               insert_task(cursor, d)
>               db.commit()
> @@ -387,6 +401,14 @@ class ServerClient(object):
>   
>               row = cursor.fetchone()
>   
> +            if row is None:


I believe you need to add "and self.upstream_client is not None" here
to validate that there is an upstream server.


> +                # Try upstream
> +                row = await copy_outhash_from_upstream(self.upstream_client,
> +                                                       self.db,
> +                                                       data['method'],
> +                                                       data['outhash'],
> +                                                       data['taskhash'])
> +
>               # If no matching outhash was found, or one *was* found but it
>               # wasn't an exact match on the taskhash, a new entry for this
>               # taskhash should be added


* Re: [RFC PATCH 7/8] hashserv: Add get-outhash message
  2021-02-01 11:53 ` [RFC PATCH 7/8] hashserv: Add get-outhash message Paul Barker
@ 2021-02-01 14:09   ` Joshua Watt
  2021-02-02 10:49     ` Paul Barker
  0 siblings, 1 reply; 14+ messages in thread
From: Joshua Watt @ 2021-02-01 14:09 UTC (permalink / raw)
  To: Paul Barker, bitbake-devel, Richard Purdie


On 2/1/21 5:53 AM, Paul Barker wrote:
> The get-outhash message can be sent via the get_outhash client method.
> This works in a similar way to the get message but looks up a db entry
> by outhash rather than by taskhash. It is intended to be used as a
> read-only form of the report message.
>
> Signed-off-by: Paul Barker <pbarker@konsulko.com>
> ---
>   lib/hashserv/client.py |  6 ++++++
>   lib/hashserv/server.py | 28 ++++++++++++++++++++++++++++
>   2 files changed, 34 insertions(+)
>
> diff --git a/lib/hashserv/client.py b/lib/hashserv/client.py
> index 0b7f4e42e..e05c1eb56 100644
> --- a/lib/hashserv/client.py
> +++ b/lib/hashserv/client.py
> @@ -170,6 +170,12 @@ class AsyncClient(object):
>               {"get": {"taskhash": taskhash, "method": method, "all": all_properties}}
>           )
>   
> +    async def get_outhash(self, method, outhash, taskhash):
> +        await self._set_mode(self.MODE_NORMAL)
> +        return await self.send_message(
> +            {"get-outhash": {"outhash": outhash, "taskhash": taskhash, "method": method}}
> +        )
> +
>       async def get_stats(self):
>           await self._set_mode(self.MODE_NORMAL)
>           return await self.send_message({"get-stats": None})
> diff --git a/lib/hashserv/server.py b/lib/hashserv/server.py
> index b1e8b2f89..054791b8b 100644
> --- a/lib/hashserv/server.py
> +++ b/lib/hashserv/server.py
> @@ -152,6 +152,7 @@ class ServerClient(object):
>   
>           self.handlers = {
>               'get': self.handle_get,
> +            'get-outhash': self.handle_get_outhash,
>               'get-stream': self.handle_get_stream,
>               'get-stats': self.handle_get_stats,
>               'chunk-stream': self.handle_chunk,
> @@ -289,6 +290,33 @@ class ServerClient(object):
>   
>           self.write_message(d)
>   
> +    async def handle_get_outhash(self, request):
> +        with closing(self.db.cursor()) as cursor:
> +            cursor.execute('''
> +                -- Find tasks with a matching outhash (that is, tasks that
> +                -- are equivalent)
> +                SELECT * FROM tasks_v2 WHERE method=:method AND outhash=:outhash
> +
> +                -- If there is an exact match on the taskhash, return it.
> +                -- Otherwise return the oldest matching outhash of any
> +                -- taskhash
> +                ORDER BY CASE WHEN taskhash=:taskhash THEN 1 ELSE 2 END,
> +                    created ASC
> +
> +                -- Only return one row
> +                LIMIT 1

Since this is the same query as handle_report, let's pre-define the query 
like we did for FAST_QUERY and ALL_QUERY so it's not dual maintenance.


> +                ''', {k: request[k] for k in ('method', 'outhash', 'taskhash')})
> +
> +            row = cursor.fetchone()
> +
> +        if row is not None:
> +            logger.debug(1, 'Found equivalent outhash %s -> %s', (row['outhash'], row['unihash']))
> +            d = {k: row[k] for k in row.keys()}
> +        else:
> +            d = None
> +
> +        self.write_message(d)
> +
>       async def handle_get_stream(self, request):
>           self.write_message('ok')
>   


* Re: [RFC PATCH 8/8] hashserv: server: Support searching upstream for outhash
  2021-02-01 14:06   ` Joshua Watt
@ 2021-02-02 10:48     ` Paul Barker
  0 siblings, 0 replies; 14+ messages in thread
From: Paul Barker @ 2021-02-02 10:48 UTC (permalink / raw)
  To: Joshua Watt; +Cc: bitbake-devel, Richard Purdie

On Mon, 1 Feb 2021 08:06:39 -0600
Joshua Watt <jpewhacker@gmail.com> wrote:

> On 2/1/21 5:53 AM, Paul Barker wrote:
> > Use the new get-outhash message to perform a read-only query
> > against an upstream server (if present) when a reported
> > taskhash/outhash combination is not found in the current database.
> > If a matching entry is found upstream it is copied into the current
> > database so it can be found by future queries.
> >
> > Signed-off-by: Paul Barker <pbarker@konsulko.com>
> > ---
> >   lib/hashserv/server.py | 22 ++++++++++++++++++++++
> >   1 file changed, 22 insertions(+)
> >
> > diff --git a/lib/hashserv/server.py b/lib/hashserv/server.py
> > index 054791b8b..5fb562b52 100644
> > --- a/lib/hashserv/server.py
> > +++ b/lib/hashserv/server.py
> > @@ -131,6 +131,20 @@ async def copy_from_upstream(client, db, method, taskhash):
> >           keys = sorted(d.keys())
> >   
> >   
> > +        with closing(db.cursor()) as cursor:
> > +            insert_task(cursor, d)
> > +            db.commit()
> > +
> > +    return d
> > +
> > +async def copy_outhash_from_upstream(client, db, method, outhash, taskhash):
> > +    d = await client.get_outhash(method, outhash, taskhash)
> > +    if d is not None:
> > +        # Filter out unknown columns
> > +        d = {k: v for k, v in d.items() if k in TABLE_COLUMNS}
> > +        keys = sorted(d.keys())
> > +
> > +
> >           with closing(db.cursor()) as cursor:
> >               insert_task(cursor, d)
> >               db.commit()
> > @@ -387,6 +401,14 @@ class ServerClient(object):
> >   
> >               row = cursor.fetchone()
> >   
> > +            if row is None:  
> 
> 
> I believe you need to add "and self.upstream_client is not None"
> here to validate that there is an upstream server.

You're right, that is needed. I'll include it in the next version of
the series.
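
A minimal sketch of the guarded fallback discussed above, not the code
from the series. FakeUpstream and lookup_with_upstream are hypothetical
names used only for illustration; the only call borrowed from the patch
is the get_outhash(method, outhash, taskhash) client method.

```python
import asyncio


class FakeUpstream:
    """Stand-in for an upstream client; records how often it is queried."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = 0

    async def get_outhash(self, method, outhash, taskhash):
        self.calls += 1
        return self.rows.get((method, outhash))


async def lookup_with_upstream(local_row, upstream_client, data):
    """Return the local row, asking upstream only if one is configured."""
    row = local_row
    # Without the second condition, a server started without an upstream
    # would try to call a method on None.
    if row is None and upstream_client is not None:
        row = await upstream_client.get_outhash(
            data['method'], data['outhash'], data['taskhash'])
    return row
```

With upstream_client set to None the function simply returns the local
result, which is the behaviour the extra condition is meant to protect.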

> 
> 
> > +                # Try upstream
> > +                row = await copy_outhash_from_upstream(self.upstream_client,
> > +                                                       self.db,
> > +                                                       data['method'],
> > +                                                       data['outhash'],
> > +                                                       data['taskhash'])
> > +
> >               # If no matching outhash was found, or one *was* found but it
> >               # wasn't an exact match on the taskhash, a new entry for this
> >               # taskhash should be added



-- 
Paul Barker
Principal Software Engineer
Konsulko Group


* Re: [RFC PATCH 7/8] hashserv: Add get-outhash message
  2021-02-01 14:09   ` Joshua Watt
@ 2021-02-02 10:49     ` Paul Barker
  0 siblings, 0 replies; 14+ messages in thread
From: Paul Barker @ 2021-02-02 10:49 UTC (permalink / raw)
  To: Joshua Watt; +Cc: bitbake-devel, Richard Purdie

On Mon, 1 Feb 2021 08:09:30 -0600
Joshua Watt <jpewhacker@gmail.com> wrote:

> On 2/1/21 5:53 AM, Paul Barker wrote:
> > The get-outhash message can be sent via the get_outhash client
> > method. This works in a similar way to the get message but looks up
> > a db entry by outhash rather than by taskhash. It is intended to be
> > used as a read-only form of the report message.
> >
> > Signed-off-by: Paul Barker <pbarker@konsulko.com>
> > ---
> >   lib/hashserv/client.py |  6 ++++++
> >   lib/hashserv/server.py | 28 ++++++++++++++++++++++++++++
> >   2 files changed, 34 insertions(+)
> >
> > diff --git a/lib/hashserv/client.py b/lib/hashserv/client.py
> > index 0b7f4e42e..e05c1eb56 100644
> > --- a/lib/hashserv/client.py
> > +++ b/lib/hashserv/client.py
> > @@ -170,6 +170,12 @@ class AsyncClient(object):
> >               {"get": {"taskhash": taskhash, "method": method, "all": all_properties}}
> >           )
> >   
> > +    async def get_outhash(self, method, outhash, taskhash):
> > +        await self._set_mode(self.MODE_NORMAL)
> > +        return await self.send_message(
> > +            {"get-outhash": {"outhash": outhash, "taskhash": taskhash, "method": method}}
> > +        )
> > +
> >       async def get_stats(self):
> >           await self._set_mode(self.MODE_NORMAL)
> >           return await self.send_message({"get-stats": None})
> > diff --git a/lib/hashserv/server.py b/lib/hashserv/server.py
> > index b1e8b2f89..054791b8b 100644
> > --- a/lib/hashserv/server.py
> > +++ b/lib/hashserv/server.py
> > @@ -152,6 +152,7 @@ class ServerClient(object):
> >   
> >           self.handlers = {
> >               'get': self.handle_get,
> > +            'get-outhash': self.handle_get_outhash,
> >               'get-stream': self.handle_get_stream,
> >               'get-stats': self.handle_get_stats,
> >               'chunk-stream': self.handle_chunk,
> > @@ -289,6 +290,33 @@ class ServerClient(object):
> >   
> >           self.write_message(d)
> >   
> > +    async def handle_get_outhash(self, request):
> > +        with closing(self.db.cursor()) as cursor:
> > +            cursor.execute('''
> > +                -- Find tasks with a matching outhash (that is, tasks that
> > +                -- are equivalent)
> > +                SELECT * FROM tasks_v2 WHERE method=:method AND outhash=:outhash
> > +
> > +                -- If there is an exact match on the taskhash, return it.
> > +                -- Otherwise return the oldest matching outhash of any
> > +                -- taskhash
> > +                ORDER BY CASE WHEN taskhash=:taskhash THEN 1 ELSE 2 END,
> > +                    created ASC
> > +
> > +                -- Only return one row
> > +                LIMIT 1
> 
> Since this is the same query as handle_report, let's pre-define the
> query like we did for FAST_QUERY and ALL_QUERY so it's not dual
> maintenance.

Agreed. I'll do that in v2.
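
One way the shared query could look; a rough sketch only, not the v2
code. OUTHASH_QUERY, find_outhash and make_demo_db are hypothetical
names, and the demo table is a cut-down stand-in for tasks_v2 holding
just the columns the query touches.

```python
import sqlite3
from contextlib import closing

# Hypothetical constant name; the review only asks for the query to be
# pre-defined in the same way as FAST_QUERY and ALL_QUERY.
OUTHASH_QUERY = '''
    -- Find tasks with a matching outhash (that is, tasks that are equivalent)
    SELECT * FROM tasks_v2 WHERE method=:method AND outhash=:outhash

    -- Prefer an exact taskhash match, otherwise the oldest matching row
    ORDER BY CASE WHEN taskhash=:taskhash THEN 1 ELSE 2 END,
        created ASC

    -- Only return one row
    LIMIT 1
'''


def find_outhash(db, method, outhash, taskhash):
    """Run the shared query and return the matching row as a dict, or None."""
    with closing(db.cursor()) as cursor:
        cursor.execute(OUTHASH_QUERY,
                       {'method': method, 'outhash': outhash, 'taskhash': taskhash})
        row = cursor.fetchone()
    return None if row is None else {k: row[k] for k in row.keys()}


def make_demo_db():
    """Build an in-memory database with a simplified tasks_v2 table."""
    db = sqlite3.connect(':memory:')
    db.row_factory = sqlite3.Row
    db.execute('CREATE TABLE tasks_v2 (method TEXT, outhash TEXT, '
               'taskhash TEXT, unihash TEXT, created INTEGER)')
    db.executemany('INSERT INTO tasks_v2 VALUES (?, ?, ?, ?, ?)', [
        ('m', 'out1', 'taskA', 'uniA', 1),
        ('m', 'out1', 'taskB', 'uniB', 2),
    ])
    db.commit()
    return db
```

Both handle_report and handle_get_outhash could then execute the same
constant, so the ordering rule (exact taskhash first, then oldest) only
lives in one place.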

> 
> 
> > +                ''', {k: request[k] for k in ('method', 'outhash', 'taskhash')})
> > +
> > +            row = cursor.fetchone()
> > +
> > +        if row is not None:
> > +            logger.debug(1, 'Found equivalent outhash %s -> %s', (row['outhash'], row['unihash']))
> > +            d = {k: row[k] for k in row.keys()}
> > +        else:
> > +            d = None
> > +
> > +        self.write_message(d)
> > +
> >       async def handle_get_stream(self, request):
> >           self.write_message('ok')
> >     



-- 
Paul Barker
Principal Software Engineer
Konsulko Group


* [RFC PATCH 0/8] hashserv read-only mode & upstream fixes
@ 2021-02-01 11:45 Paul Barker
  0 siblings, 0 replies; 14+ messages in thread
From: Paul Barker @ 2021-02-01 11:45 UTC (permalink / raw)
  To: bitbake-devel, Richard Purdie, Joshua Watt; +Cc: Paul Barker

* Implement a read-only mode for the hash equivalence server. This mode is
  useful when you wish to populate a hash equivalence database from CI or other
  internal builds (via a read-write server instance) and also allow external
  clients to query this database (via a read-only server instance). External
  clients can therefore be prevented from adding hash equivalences to the
  server which correspond to sstate artifacts which would be missing in a
  primary sstate cache. This mode is enabled using the -r/--read-only
  argument to bitbake-hashserv.

* Expose the existing upstream server support via a -u/--upstream argument to
  bitbake-hashserv.

* Support querying an upstream server using the new `get-outhash` message when
  the server handles a `report` message from a client and a match is not found
  in the server's own database. This is important as the `report` message is
  used by bitbake when a task finishes executing to check if the task outhash
  matches the outhash for any previous execution. With this support such
  matches can now be found in a local (read-write) hash equivalence db as
  well as in an upstream (potentially read-only and/or remote) db.

* Other minor hashserv fixes.
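
To illustrate the read-only idea from the first bullet, here is a small
sketch under stated assumptions. It is not the implementation in this
series: make_handlers and dispatch are hypothetical helpers, the
handlers are placeholders, and the assumption is simply that a read-only
instance does not register the `report` message at all.

```python
def make_handlers(read_only):
    """Build a message-name -> handler table for one server instance."""
    handlers = {
        'get': lambda request: 'lookup by taskhash',
        'get-outhash': lambda request: 'lookup by outhash',
    }
    if not read_only:
        # Only a read-write instance registers the message that adds entries.
        handlers['report'] = lambda request: 'stored'
    return handlers


def dispatch(handlers, message):
    """Route a single-key message dict to its handler, rejecting unknown names."""
    for name, request in message.items():
        if name in handlers:
            return handlers[name](request)
    raise ValueError('unrecognized command: %r' % sorted(message))
```

Under this assumption an external client can still send `get` and
`get-outhash` to the read-only instance, while a `report` is rejected
before it can touch the database.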

These changes have been tested locally using the following setup:

1) Build core-image-base with BB_HASHSERVE = "auto". Additional logging was
   also enabled following the instructions in 
   https://docs.yoctoproject.org/bitbake/bitbake-user-manual/bitbake-user-manual-execution.html#logging.

2) Move the hashserv.db file into a new 'upstream-hashserv' directory. Start
   the upstream server in read-only mode using the following command in that
   directory:

    bitbake-hashserv -r -l DEBUG

3) Create an empty 'downstream-hashserv' directory. Start the downstream
   (local) server with an empty db using the following command in that
   directory:

    bitbake-hashserv -u "unix://../upstream-hashserv/hashserve.sock" -l DEBUG

4) Modify local.conf to set 
   BB_HASHSERVE = "unix://${TOPDIR}/downstream-hashserv/hashserve.sock".

5) Add an 'echo hello' command to do_configure for glibc to force a rebuild
   which should result in a matching hash equivalence.

6) Build core-image-base again, confirm that glibc is rebuilt but then a hash
   equivalence is found (copied from the upstream server into the downstream
   server) and dependent tasks are pulled from the sstate cache instead of
   being rebuilt.

This is an RFC series as it still needs documentation to be written and
selftest cases to be added. However it'd be great to get some feedback at
this stage before moving on to that work.

These changes can also be pulled from:

  https://gitlab.com/pbarker.dev/staging/bitbake.git
  tag: hashserv_2020-02-01

Let me know if you have any questions/feedback :)

Paul Barker (8):
  bitbake-hashclient: Remove obsolete call to client.connect
  hashserv: client: Fix handling of null responses
  hashserv: server: Fix logger.debug calls
  hashserv: Support read-only server
  hashserv: Support upstream command line argument
  hashserv: Add short forms of remaining command line arguments
  hashserv: Add get-outhash message
  hashserv: server: Support searching upstream for outhash

 bin/bitbake-hashclient   |  3 --
 bin/bitbake-hashserv     | 10 +++--
 lib/hashserv/__init__.py |  4 +-
 lib/hashserv/client.py   |  8 +++-
 lib/hashserv/server.py   | 85 ++++++++++++++++++++++++++++++++++------
 5 files changed, 88 insertions(+), 22 deletions(-)

-- 
2.26.2


