* [RFC 0/9] Hash Equivalency Server
From: Joshua Watt @ 2018-07-16 20:37 UTC
  To: bitbake-devel, openembedded-core

These patches are a first pass at implementing a hash equivalence server
in bitbake & OE.

Apologies for cross-posting this to both the bitbake-devel and
openembedded-core lists; this work necessarily intertwines both
projects, and it is really necessary to look at both parts to get an
idea of what is going on. For convenience, the bitbake patches are
listed first, followed by the oe-core patches.

The basic premise is that any given task no longer hashes a dependent
task's taskhash to determine its own taskhash, but instead hashes the
dependent task's "dependency ID" (which doesn't strictly need to be a
hash, but is one for consistency; we can have the discussion as to
whether it should be called a "dependency hash" if anyone wants). This
allows multiple taskhashes to map to the same dependency ID, meaning
that trivial changes to a recipe that would change the taskhash don't
necessarily change the dependency ID, and thus don't cause downstream
tasks to be rebuilt (with caveats; see below).
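
As an illustration, here is a minimal sketch of the change in how a
task's hash is derived. The helper names are invented and the mixing is
simplified; the real logic lives in siggen's get_taskhash:

    import hashlib

    def taskhash_before(basehash, dep_taskhashes):
        # Old behavior: mix the dependent tasks' taskhashes directly
        data = basehash + "".join(sorted(dep_taskhashes))
        return hashlib.md5(data.encode('utf-8')).hexdigest()

    def taskhash_after(basehash, dep_ids):
        # New behavior: mix the dependency IDs instead; several taskhashes
        # may share one dependency ID, so downstream taskhashes stay stable
        # across "equivalent" upstream changes
        data = basehash + "".join(sorted(dep_ids))
        return hashlib.md5(data.encode('utf-8')).hexdigest()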

In the absence of any interaction by the user, the dependency ID for a
task is just that task's taskhash, which effectively maintains the
current behavior. However, if the user enables the "OEEquivHash"
signature generator, they can direct it to look at a hash equivalency
server (of which a reference implementation is provided). The sstate
code will provide the server with an output hash that it calculates,
and the server will record all tasks with the same output hash as
"equivalent" and report the same dependency ID for them when requested.
When initializing tasks, bitbake can ask the server about the
dependency ID for new tasks it has never seen before and potentially
skip rebuilding, or restore the task from an equivalent sstate file. To
facilitate restoring tasks from sstate, sstate objects are now named
based on the task's dependency ID instead of the taskhash (which,
again, has no effect if the server is not in use).
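
For example, asking the server for the dependency ID of a taskhash is a
single GET against the reference implementation (patch 8). A minimal
sketch, assuming the server runs locally on port 5000 and using a
placeholder taskhash:

    import json
    import urllib.parse
    import urllib.request

    server = 'http://localhost:5000'    # assumed server address
    query = urllib.parse.urlencode({
        'method': 'OEOuthashBasic',
        'taskhash': '<some taskhash>',  # placeholder
    })
    with urllib.request.urlopen('%s/v1/equivalent?%s' % (server, query)) as rsp:
        data = json.loads(rsp.read())

    # An empty response means no equivalent task is recorded; otherwise
    # 'depid' is the dependency ID to use in place of the taskhash
    print(data['depid'] if data else 'no equivalent recorded')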

This patchset doesn't make any attempt to dynamically update task
dependency IDs after bitbake initializes the tasks, and as such there
are some cases where it doesn't accelerate the build as much as it
possibly could. I think it will be possible to add support for that,
but this preliminary support needs to come first.

Some patches have additional NOTEs that indicate places where I wasn't
sure what to do.

You can also see these patches (and my first attempts at dynamic task
re-hashing) on the "jpew/hash-equivalence" branch in poky-contrib.

As always, thanks for your feedback and time.

Joshua Watt (9):
  bitbake-worker: Pass taskhash as runtask parameter
  siggen: Split out stampfile hash fetch
  siggen: Split out task depend ID
  runqueue: Track task dependency ID
  runqueue: Pass dependency ID to task
  runqueue: Pass dependency ID to hash validate
  classes/sstate: Handle depid in hash check
  hashserver: Add initial reference server
  sstate: Implement hash equivalence sstate

 bitbake/bin/bitbake-worker            |   9 +-
 bitbake/contrib/hashserver/.gitignore |   2 +
 bitbake/contrib/hashserver/Pipfile    |  15 ++
 bitbake/contrib/hashserver/app.py     | 212 ++++++++++++++++++++++++++
 bitbake/lib/bb/runqueue.py            |  56 ++++---
 bitbake/lib/bb/siggen.py              |  20 ++-
 meta/classes/sstate.bbclass           | 102 +++++++++++--
 meta/conf/bitbake.conf                |   4 +-
 meta/lib/oe/sstatesig.py              | 166 ++++++++++++++++++++
 9 files changed, 544 insertions(+), 42 deletions(-)
 create mode 100644 bitbake/contrib/hashserver/.gitignore
 create mode 100644 bitbake/contrib/hashserver/Pipfile
 create mode 100755 bitbake/contrib/hashserver/app.py

--
2.17.1




* [RFC 1/9] bitbake-worker: Pass taskhash as runtask parameter
From: Joshua Watt @ 2018-07-16 20:37 UTC
  To: bitbake-devel, openembedded-core

Pass the task hash as a parameter to the 'runtask' message instead of
passing the entire dictionary of hashes when the worker is set up. This
is possibly less efficient, but prevents the worker taskhashes from
being out of sync with the runqueue in the event that the taskhashes in
the runqueue change.

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-worker |  8 ++++----
 bitbake/lib/bb/runqueue.py | 15 ++++++---------
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/bitbake/bin/bitbake-worker b/bitbake/bin/bitbake-worker
index e925054b7f9..cd687e6e433 100755
--- a/bitbake/bin/bitbake-worker
+++ b/bitbake/bin/bitbake-worker
@@ -136,7 +136,7 @@ def sigterm_handler(signum, frame):
     os.killpg(0, signal.SIGTERM)
     sys.exit()
 
-def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
+def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskhash, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
     # We need to setup the environment BEFORE the fork, since
     # a fork() or exec*() activates PSEUDO...
 
@@ -234,7 +234,7 @@ def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, append
                 ret = 0
 
                 the_data = bb_cache.loadDataFull(fn, appends)
-                the_data.setVar('BB_TASKHASH', workerdata["runq_hash"][task])
+                the_data.setVar('BB_TASKHASH', taskhash)
 
                 bb.utils.set_process_name("%s:%s" % (the_data.getVar("PN"), taskname.replace("do_", "")))
 
@@ -425,10 +425,10 @@ class BitbakeWorker(object):
         sys.exit(0)
 
     def handle_runtask(self, data):
-        fn, task, taskname, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
+        fn, task, taskname, taskhash, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
         workerlog_write("Handling runtask %s %s %s\n" % (task, fn, taskname))
 
-        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
+        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, taskhash, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
 
         self.build_pids[pid] = task
         self.build_pipes[pid] = runQueueWorkerPipe(pipein, pipeout)
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index ba9bebebcfe..f655614d1ce 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -1207,17 +1207,12 @@ class RunQueue:
         bb.utils.nonblockingfd(worker.stdout)
         workerpipe = runQueuePipe(worker.stdout, None, self.cfgData, self, rqexec)
 
-        runqhash = {}
-        for tid in self.rqdata.runtaskentries:
-            runqhash[tid] = self.rqdata.runtaskentries[tid].hash
-
         workerdata = {
             "taskdeps" : self.rqdata.dataCaches[mc].task_deps,
             "fakerootenv" : self.rqdata.dataCaches[mc].fakerootenv,
             "fakerootdirs" : self.rqdata.dataCaches[mc].fakerootdirs,
             "fakerootnoenv" : self.rqdata.dataCaches[mc].fakerootnoenv,
             "sigdata" : bb.parse.siggen.get_taskdata(),
-            "runq_hash" : runqhash,
             "logdefaultdebug" : bb.msg.loggerDefaultDebugLevel,
             "logdefaultverbose" : bb.msg.loggerDefaultVerbose,
             "logdefaultverboselogs" : bb.msg.loggerVerboseLogs,
@@ -2008,6 +2003,7 @@ class RunQueueExecuteTasks(RunQueueExecute):
             taskdepdata = self.build_taskdepdata(task)
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
+            taskhash = self.rqdata.get_task_hash(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not (self.cooker.configuration.dry_run or self.rqdata.setscene_enforce):
                 if not mc in self.rq.fakeworker:
                     try:
@@ -2017,10 +2013,10 @@ class RunQueueExecuteTasks(RunQueueExecute):
                         self.rq.state = runQueueFailed
                         self.stats.taskFailed()
                         return True
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
@@ -2428,13 +2424,14 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
             taskdepdata = self.build_taskdepdata(task)
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
+            taskhash = self.rqdata.get_task_hash(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not self.cooker.configuration.dry_run:
                 if not mc in self.rq.fakeworker:
                     self.rq.start_fakeworker(self, mc)
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
-- 
2.17.1




* [RFC 2/9] siggen: Split out stampfile hash fetch
From: Joshua Watt @ 2018-07-16 20:37 UTC
  To: bitbake-devel, openembedded-core

The mechanism used to get the hash for a stamp file is split out so
that it can be overridden by derived classes.

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/siggen.py | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
index ab228e4148e..d2dfcbc3fdb 100644
--- a/bitbake/lib/bb/siggen.py
+++ b/bitbake/lib/bb/siggen.py
@@ -331,6 +331,13 @@ class SignatureGeneratorBasic(SignatureGenerator):
 class SignatureGeneratorBasicHash(SignatureGeneratorBasic):
     name = "basichash"
 
+    def get_stampfile_hash(self, task):
+        if task in self.taskhash:
+            return self.taskhash[task]
+
+        # If task is not in basehash, then error
+        return self.basehash[task]
+
     def stampfile(self, stampbase, fn, taskname, extrainfo, clean=False):
         if taskname != "do_setscene" and taskname.endswith("_setscene"):
             k = fn + "." + taskname[:-9]
@@ -338,11 +345,9 @@ class SignatureGeneratorBasicHash(SignatureGeneratorBasic):
             k = fn + "." + taskname
         if clean:
             h = "*"
-        elif k in self.taskhash:
-            h = self.taskhash[k]
         else:
-            # If k is not in basehash, then error
-            h = self.basehash[k]
+            h = self.get_stampfile_hash(k)
+
         return ("%s.%s.%s.%s" % (stampbase, taskname, h, extrainfo)).rstrip('.')
 
     def stampcleanmask(self, stampbase, fn, taskname, extrainfo):
-- 
2.17.1




* [RFC 3/9] siggen: Split out task depend ID
From: Joshua Watt @ 2018-07-16 20:37 UTC
  To: bitbake-devel, openembedded-core

Abstracts the function used to get the dependency ID for a task so it
can return something other than the taskhash.

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/siggen.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
index d2dfcbc3fdb..0b1393e21d5 100644
--- a/bitbake/lib/bb/siggen.py
+++ b/bitbake/lib/bb/siggen.py
@@ -41,6 +41,9 @@ class SignatureGenerator(object):
     def finalise(self, fn, d, varient):
         return
 
+    def get_depid(self, task):
+        return self.taskhash[task]
+
     def get_taskhash(self, fn, task, deps, dataCache):
         return "0"
 
@@ -206,7 +209,7 @@ class SignatureGeneratorBasic(SignatureGenerator):
                 continue
             if dep not in self.taskhash:
                 bb.fatal("%s is not in taskhash, caller isn't calling in dependency order?", dep)
-            data = data + self.taskhash[dep]
+            data = data + self.get_depid(dep)
             self.runtaskdeps[k].append(dep)
 
         if task in dataCache.file_checksums[fn]:
@@ -281,7 +284,7 @@ class SignatureGeneratorBasic(SignatureGenerator):
             data['file_checksum_values'] = [(os.path.basename(f), cs) for f,cs in self.file_checksum_values[k]]
             data['runtaskhashes'] = {}
             for dep in data['runtaskdeps']:
-                data['runtaskhashes'][dep] = self.taskhash[dep]
+                data['runtaskhashes'][dep] = self.get_depid(dep)
             data['taskhash'] = self.taskhash[k]
 
         taint = self.read_taint(fn, task, referencestamp)
-- 
2.17.1




* [RFC 4/9] runqueue: Track task dependency ID
From: Joshua Watt @ 2018-07-16 20:37 UTC
  To: bitbake-devel, openembedded-core

Requests the task dependency ID from siggen and tracks it.

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/runqueue.py | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index f655614d1ce..97cc8a948af 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -346,6 +346,7 @@ class RunTaskEntry(object):
         self.depends = set()
         self.revdeps = set()
         self.hash = None
+        self.depid = None
         self.task = None
         self.weight = 1
 
@@ -385,6 +386,9 @@ class RunQueueData:
     def get_task_hash(self, tid):
         return self.runtaskentries[tid].hash
 
+    def get_task_depid(self, tid):
+        return self.runtaskentries[tid].depid
+
     def get_user_idstring(self, tid, task_name_suffix = ""):
         return tid + task_name_suffix
 
@@ -1133,18 +1137,21 @@ class RunQueueData:
                 if len(self.runtaskentries[tid].depends - dealtwith) == 0:
                     dealtwith.add(tid)
                     todeal.remove(tid)
-                    procdep = []
-                    for dep in self.runtaskentries[tid].depends:
-                        procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
-                    (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
-                    self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
-                    task = self.runtaskentries[tid].task
+                    self.prepare_task_hash(tid)
 
         bb.parse.siggen.writeout_file_checksum_cache()
 
         #self.dump_data()
         return len(self.runtaskentries)
 
+    def prepare_task_hash(self, tid):
+        procdep = []
+        for dep in self.runtaskentries[tid].depends:
+            procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
+        (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
+        self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
+        self.runtaskentries[tid].depid = bb.parse.siggen.get_depid(fn + "." + taskname)
+
     def dump_data(self):
         """
         Dump some debug information on the internal data structures
@@ -2058,7 +2065,8 @@ class RunQueueExecuteTasks(RunQueueExecute):
                 deps = self.rqdata.runtaskentries[revdep].depends
                 provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
                 taskhash = self.rqdata.runtaskentries[revdep].hash
-                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
+                depid = self.rqdata.runtaskentries[revdep].depid
+                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, depid]
                 for revdep2 in deps:
                     if revdep2 not in taskdepdata:
                         additional.append(revdep2)
@@ -2495,7 +2503,8 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
                 deps = getsetscenedeps(revdep)
                 provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
                 taskhash = self.rqdata.runtaskentries[revdep].hash
-                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
+                depid = self.rqdata.runtaskentries[revdep].depid
+                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, depid]
                 for revdep2 in deps:
                     if revdep2 not in taskdepdata:
                         additional.append(revdep2)
-- 
2.17.1




* [RFC 5/9] runqueue: Pass dependency ID to task
From: Joshua Watt @ 2018-07-16 20:37 UTC
  To: bitbake-devel, openembedded-core

The dependency ID is now passed to the task in the BB_DEPID variable.
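
For illustration only, a hypothetical recipe or class fragment (not
part of this series) could inspect the new variable alongside the
existing BB_TASKHASH:

    python do_show_depid () {
        bb.note("taskhash=%s depid=%s" %
                (d.getVar('BB_TASKHASH'), d.getVar('BB_DEPID')))
    }
    addtask show_depid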

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-worker |  7 ++++---
 bitbake/lib/bb/runqueue.py | 10 ++++++----
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/bitbake/bin/bitbake-worker b/bitbake/bin/bitbake-worker
index cd687e6e433..37cbcdb369a 100755
--- a/bitbake/bin/bitbake-worker
+++ b/bitbake/bin/bitbake-worker
@@ -136,7 +136,7 @@ def sigterm_handler(signum, frame):
     os.killpg(0, signal.SIGTERM)
     sys.exit()
 
-def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskhash, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
+def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskhash, depid, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
     # We need to setup the environment BEFORE the fork, since
     # a fork() or exec*() activates PSEUDO...
 
@@ -235,6 +235,7 @@ def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskha
 
                 the_data = bb_cache.loadDataFull(fn, appends)
                 the_data.setVar('BB_TASKHASH', taskhash)
+                the_data.setVar('BB_DEPID', depid)
 
                 bb.utils.set_process_name("%s:%s" % (the_data.getVar("PN"), taskname.replace("do_", "")))
 
@@ -425,10 +426,10 @@ class BitbakeWorker(object):
         sys.exit(0)
 
     def handle_runtask(self, data):
-        fn, task, taskname, taskhash, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
+        fn, task, taskname, taskhash, depid, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
         workerlog_write("Handling runtask %s %s %s\n" % (task, fn, taskname))
 
-        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, taskhash, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
+        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, taskhash, depid, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
 
         self.build_pids[pid] = task
         self.build_pipes[pid] = runQueueWorkerPipe(pipein, pipeout)
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 97cc8a948af..5922431bbe1 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -2011,6 +2011,7 @@ class RunQueueExecuteTasks(RunQueueExecute):
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
             taskhash = self.rqdata.get_task_hash(task)
+            depid = self.rqdata.get_task_depid(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not (self.cooker.configuration.dry_run or self.rqdata.setscene_enforce):
                 if not mc in self.rq.fakeworker:
                     try:
@@ -2020,10 +2021,10 @@ class RunQueueExecuteTasks(RunQueueExecute):
                         self.rq.state = runQueueFailed
                         self.stats.taskFailed()
                         return True
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, depid, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, depid, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
@@ -2433,13 +2434,14 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
             taskhash = self.rqdata.get_task_hash(task)
+            depid = self.rqdata.get_task_depid(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not self.cooker.configuration.dry_run:
                 if not mc in self.rq.fakeworker:
                     self.rq.start_fakeworker(self, mc)
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, depid, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, depid, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
-- 
2.17.1




* [RFC 6/9] runqueue: Pass dependency ID to hash validate
From: Joshua Watt @ 2018-07-16 20:37 UTC
  To: bitbake-devel, openembedded-core

If the dependency ID is being used to track task dependencies, the hash
validation function needs to know about it in order to properly validate
the hash.

TODO: This currently isn't backward compatible with older hashvalidate
functions. Is that compatibility necessary, and if so, are there any
suggestions for a good approach?
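
One possible approach (a sketch, not part of this patch) would be to
mirror the siginfo fallback this patch already uses: try the new
signature first and retry without sq_depid if the function rejects it:

    locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash,
             "sq_hashfn" : sq_hashfn, "sq_depid" : sq_depid,
             "d" : self.cooker.data }
    try:
        call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d)"
        valid = bb.utils.better_eval(call, locs)
    except TypeError:
        # Fall back for older hashvalidate functions that don't accept
        # sq_depid (note this can also mask unrelated TypeErrors)
        call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d)"
        valid = bb.utils.better_eval(call, locs)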

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/runqueue.py | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 5922431bbe1..82dce426bd1 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -1531,6 +1531,7 @@ class RunQueue:
         valid = []
         sq_hash = []
         sq_hashfn = []
+        sq_depid = []
         sq_fn = []
         sq_taskname = []
         sq_task = []
@@ -1549,15 +1550,16 @@ class RunQueue:
             sq_fn.append(fn)
             sq_hashfn.append(self.rqdata.dataCaches[mc].hashfn[taskfn])
             sq_hash.append(self.rqdata.runtaskentries[tid].hash)
+            sq_depid.append(self.rqdata.runtaskentries[tid].depid)
             sq_taskname.append(taskname)
             sq_task.append(tid)
-        locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "d" : self.cooker.data }
+        locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "sq_depid" : sq_depid, "d" : self.cooker.data }
         try:
-            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=True)"
+            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=True)"
             valid = bb.utils.better_eval(call, locs)
         # Handle version with no siginfo parameter
         except TypeError:
-            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d)"
+            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d)"
             valid = bb.utils.better_eval(call, locs)
         for v in valid:
             valid_new.add(sq_task[v])
@@ -2269,6 +2271,7 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
         if self.rq.hashvalidate:
             sq_hash = []
             sq_hashfn = []
+            sq_depid = []
             sq_fn = []
             sq_taskname = []
             sq_task = []
@@ -2300,10 +2303,11 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
                 sq_fn.append(fn)
                 sq_hashfn.append(self.rqdata.dataCaches[mc].hashfn[taskfn])
                 sq_hash.append(self.rqdata.runtaskentries[tid].hash)
+                sq_depid.append(self.rqdata.runtaskentries[tid].depid)
                 sq_taskname.append(taskname)
                 sq_task.append(tid)
-            call = self.rq.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d)"
-            locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "d" : self.cooker.data }
+            call = self.rq.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d)"
+            locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "sq_depid": sq_depid, "d" : self.cooker.data }
             valid = bb.utils.better_eval(call, locs)
 
             valid_new = stamppresent
-- 
2.17.1




* [RFC 7/9] classes/sstate: Handle depid in hash check
From: Joshua Watt @ 2018-07-16 20:37 UTC
  To: bitbake-devel, openembedded-core

Handles the argument that passes task dependency IDs to the hash check
function, as it is now required by bitbake.

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 5a0722567a5..b11f56b799b 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -759,7 +759,7 @@ sstate_unpack_package () {
 
 BB_HASHCHECK_FUNCTION = "sstate_checkhashes"
 
-def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False):
+def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=False):
 
     ret = []
     missed = []
-- 
2.17.1




* [RFC 8/9] hashserver: Add initial reference server
From: Joshua Watt @ 2018-07-16 20:37 UTC
  To: bitbake-devel, openembedded-core

Adds an initial reference implementation of the hash server.

NOTE: This is my first dive into HTTP & REST technologies, so feedback
is appreciated. Also, I don't think it will be necessary for this
reference implementation to live in bitbake; it could be moved to its
own independent project if necessary.

Also, this server has some concurrency issues that I haven't tracked
down, and it will occasionally fail to record a new POST'd task with an
error indicating the database is locked. Based on some reading, I
believe this is because the server is using a sqlite backend, and it
would go away with a more production-worthy backend. Anyway, it is good
enough for some preliminary testing.

Starting the server is simple and only requires pipenv to be installed:

 $ pipenv shell
 $ ./app.py
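
Once it is running, the POST side of /v1/equivalent can be exercised by
hand. A minimal sketch with placeholder hashes, assuming the server is
listening on localhost:5000:

    import json
    import urllib.request

    task_data = {
        'taskhash': 'aaaa',             # placeholder values
        'method': 'OEOuthashBasic',
        'outhash': 'bbbb',
        'depid': 'aaaa',
    }
    req = urllib.request.Request(
        'http://localhost:5000/v1/equivalent',
        json.dumps(task_data).encode('utf-8'),
        {'content-type': 'application/json'})
    with urllib.request.urlopen(req) as rsp:
        # The reply is the depid to use, which may come from an older
        # task that produced the same outhash
        print(json.loads(rsp.read()))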

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/contrib/hashserver/.gitignore |   2 +
 bitbake/contrib/hashserver/Pipfile    |  15 ++
 bitbake/contrib/hashserver/app.py     | 212 ++++++++++++++++++++++++++
 3 files changed, 229 insertions(+)
 create mode 100644 bitbake/contrib/hashserver/.gitignore
 create mode 100644 bitbake/contrib/hashserver/Pipfile
 create mode 100755 bitbake/contrib/hashserver/app.py

diff --git a/bitbake/contrib/hashserver/.gitignore b/bitbake/contrib/hashserver/.gitignore
new file mode 100644
index 00000000000..030640a2b21
--- /dev/null
+++ b/bitbake/contrib/hashserver/.gitignore
@@ -0,0 +1,2 @@
+hashes.db
+Pipfile.lock
diff --git a/bitbake/contrib/hashserver/Pipfile b/bitbake/contrib/hashserver/Pipfile
new file mode 100644
index 00000000000..29cfb41a907
--- /dev/null
+++ b/bitbake/contrib/hashserver/Pipfile
@@ -0,0 +1,15 @@
+[[source]]
+url = "https://pypi.org/simple"
+verify_ssl = true
+name = "pypi"
+
+[packages]
+flask = "*"
+flask-sqlalchemy = "*"
+marshmallow-sqlalchemy = "*"
+flask-marshmallow = "*"
+
+[dev-packages]
+
+[requires]
+python_version = "3.6"
diff --git a/bitbake/contrib/hashserver/app.py b/bitbake/contrib/hashserver/app.py
new file mode 100755
index 00000000000..4fd2070fe92
--- /dev/null
+++ b/bitbake/contrib/hashserver/app.py
@@ -0,0 +1,212 @@
+#! /usr/bin/env python3
+#
+# BitBake Hash Server Reference Implementation
+#
+# Copyright (C) 2018 Garmin International
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+from flask import Flask, request, jsonify
+from flask_sqlalchemy import SQLAlchemy
+from flask_marshmallow import Marshmallow
+from sqlalchemy import desc, case, func
+import sqlalchemy
+import sqlite3
+import hashlib
+import datetime
+
+app = Flask(__name__)
+app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///hashes.db'
+app.config['SQLALCHEMY_TIMEOUT'] = 15
+
+# Order matters: Initialize SQLAlchemy before Marshmallow
+db = SQLAlchemy(app)
+ma = Marshmallow(app)
+
+@sqlalchemy.event.listens_for(sqlalchemy.engine.Engine, "connect")
+def set_sqlite_pragma(dbapi_connection, connection_record):
+    cursor = dbapi_connection.cursor()
+    cursor.execute("PRAGMA journal_mode=WAL")
+    cursor.close()
+
+class TaskModel(db.Model):
+    __tablename__ = 'tasks'
+
+    id = db.Column(db.Integer, primary_key=True)
+    taskhash = db.Column(db.String(), nullable=False)
+    method = db.Column(db.String(), nullable=False)
+    outhash = db.Column(db.String(), nullable=False)
+    depid = db.Column(db.String(), nullable=False)
+    owner = db.Column(db.String())
+    created = db.Column(db.DateTime)
+    PN = db.Column(db.String())
+    PV = db.Column(db.String())
+    PR = db.Column(db.String())
+    task = db.Column(db.String())
+    outhash_siginfo = db.Column(db.Text())
+
+    __table_args__ = (
+            db.UniqueConstraint('taskhash', 'method', 'outhash', name='unique_task'),
+            # Make an index on taskhash and method for fast lookup
+            db.Index('lookup_index', 'taskhash', 'method'),
+            )
+
+    def __init__(self, taskhash, method, outhash, depid, owner=None):
+        self.taskhash = taskhash
+        self.method = method
+        self.outhash = outhash
+        self.depid = depid
+        self.owner = owner
+        self.created = datetime.datetime.utcnow()
+        self.task = None
+        self.outhash_siginfo = None
+
+schemas = {}
+
+class TaskFullSchema(ma.ModelSchema):
+    class Meta:
+        model = TaskModel
+
+task_full_schema = TaskFullSchema()
+tasks_full_schema = TaskFullSchema(many=True)
+schemas['full'] = tasks_full_schema
+
+class TaskSchema(ma.ModelSchema):
+    class Meta:
+        fields = ('id', 'taskhash', 'method', 'outhash', 'depid', 'owner', 'created', 'PN', 'PV', 'PR', 'task')
+
+task_schema = TaskSchema()
+tasks_schema = TaskSchema(many=True)
+schemas['default'] = tasks_schema
+
+class DepIDSchema(ma.ModelSchema):
+    class Meta:
+        fields = ('taskhash', 'method', 'depid')
+
+depid_schema = DepIDSchema()
+depids_schema = DepIDSchema(many=True)
+schemas['depid'] = depids_schema
+
+def get_tasks_schema():
+    return schemas.get(request.args.get('output', 'default'), tasks_schema)
+
+def get_count_column(column, min_count):
+    count_column = func.count(column).label('count')
+    query = (db.session.query(TaskModel)
+            .with_entities(column, count_column)
+            .group_by(column))
+
+    if min_count > 1:
+        query = query.having(count_column >= min_count)
+
+    col_name = column.name
+
+    result = [{'count': data.count, col_name: getattr(data, col_name)} for data in query.all()]
+
+    return jsonify(result)
+
+def filter_query_from_request(query, request):
+    for key in request.args:
+        if hasattr(TaskModel, key):
+            vals = request.args.getlist(key)
+            query = (query
+                    .filter(getattr(TaskModel, key).in_(vals))
+                    .order_by(TaskModel.created.asc()))
+    return query
+
+@app.route("/v1/count/outhashes", methods=["GET"])
+def outhashes():
+    return get_count_column(TaskModel.outhash, int(request.args.get('min', 1)))
+
+@app.route("/v1/count/taskhashes", methods=["GET"])
+def taskhashes():
+    return get_count_column(TaskModel.taskhash, int(request.args.get('min', 1)))
+
+@app.route("/v1/equivalent", methods=["GET", "POST"])
+def equivalent():
+    if request.method == 'GET':
+        task = (db.session.query(TaskModel)
+            .filter(TaskModel.method == request.args['method'])
+            .filter(TaskModel.taskhash == request.args['taskhash'])
+            .order_by(
+                # If there are multiple matching task hashes, return the oldest
+                # one
+                TaskModel.created.asc())
+            .limit(1)
+            .one_or_none())
+
+        return depid_schema.jsonify(task)
+
+    # TODO: Handle authentication
+
+    data = request.get_json()
+    # TODO handle when data is None. Currently breaks
+
+    # Find an appropriate task.
+    new_task = (db.session.query(TaskModel)
+            .filter(
+                # The output hash and method must match exactly
+                TaskModel.method == data['method'],
+                TaskModel.outhash == data['outhash'])
+            .order_by(
+                # If there is a matching taskhash, it will be sorted first, and
+                # thus the only row returned.
+                case([(TaskModel.taskhash == data['taskhash'], 1)], else_=2))
+            .order_by(
+                # Sort by date, oldest first. This only really matters if there
+                # isn't an exact match on the taskhash
+                TaskModel.created.asc())
+            # Only return one row
+            .limit(1)
+            .one_or_none())
+
+    # If no task was found that exactly matches this taskhash, create a new one
+    if not new_task or new_task.taskhash != data['taskhash']:
+        # Capture the dependency ID. If a matching task was found, then change
+        # this task's dependency ID to match.
+        depid = data['depid']
+        if new_task:
+            depid = new_task.depid
+
+        new_task = TaskModel(data['taskhash'], data['method'], data['outhash'], depid)
+        db.session.add(new_task)
+
+    if new_task.taskhash == data['taskhash']:
+        # Add or update optional attributes
+        for o in ('outhash_siginfo', 'owner', 'task', 'PN', 'PV', 'PR'):
+            v = getattr(new_task, o, None)
+            if v is None:
+                setattr(new_task, o, data.get(o, None))
+
+    db.session.commit()
+
+    return depid_schema.jsonify(new_task)
+
+# TODO: Handle errors. Currently, everything is a 500 error
+@app.route("/v1/tasks", methods=["GET"])
+def tasks():
+    query = db.session.query(TaskModel)
+    query = filter_query_from_request(query, request)
+    return get_tasks_schema().jsonify(query.all())
+
+@app.route("/v1/tasks/overridden", methods=["GET"])
+def overridden():
+    query = db.session.query(TaskModel).filter(TaskModel.depid != TaskModel.taskhash)
+    query = filter_query_from_request(query, request)
+    return get_tasks_schema().jsonify(query.all())
+
+if __name__ == '__main__':
+    db.create_all()
+    app.run(debug=True)
+
-- 
2.17.1




* [RFC 9/9] sstate: Implement hash equivalence sstate
From: Joshua Watt @ 2018-07-16 20:37 UTC
  To: bitbake-devel, openembedded-core

Converts sstate so that it can use a hash equivalence server to
determine if a task really needs to be rebuilt, or if it can be restored
from a different (equivalent) sstate object.

The dependency IDs are cached persistently using persist_data (a usage
sketch follows the list below). This has a number of advantages:
 1) Dependency IDs can be cached between invocations of bitbake to
    prevent needing to contact the server every time (which is slow)
 2) The value of each task's dependency ID can easily be synchronized
    between different threads, which will be useful if bitbake is
    updated to do on-the-fly task re-hashing.
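
A minimal sketch of that cache usage, assuming the bitbake datastore is
available as 'd' (the domain name matches this patch; the key and hash
values are placeholders):

    import bb.persist_data

    # Dict-like object backed by an sqlite database, shared between
    # threads and persistent across bitbake invocations
    depids = bb.persist_data.persist('SSTATESIG_DEPID_CACHE_v1_OEOuthashBasic', d)

    key = '<fn>.do_<task>:<taskhash>'    # placeholder key
    depids[key] = '<depid>'              # visible to other threads...
    assert depids.get(key) == '<depid>'  # ...and to the next invocation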

NOTE: No attempt whatsoever was made to implement any sort of
authentication to the server. Is this something that anyone cares about?
Do we imagine having a "public" server that maintains equivalencies, and
if so does it need some mechanism to authenticate the users that post
tasks to it? If so, does anyone have any ideas as to an acceptable
authentication mechanism?

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 100 ++++++++++++++++++++--
 meta/conf/bitbake.conf      |   4 +-
 meta/lib/oe/sstatesig.py    | 166 ++++++++++++++++++++++++++++++++++++
 3 files changed, 261 insertions(+), 9 deletions(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index b11f56b799b..a5be4d93317 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -11,7 +11,7 @@ def generate_sstatefn(spec, hash, d):
 SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
 SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
 SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
-SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_TASKHASH'), d)}"
+SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_DEPID'), d)}"
 SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
 SSTATE_EXTRAPATH   = ""
 SSTATE_EXTRAPATHWILDCARD = ""
@@ -75,6 +75,23 @@ SSTATE_SIG_PASSPHRASE ?= ""
 # Whether to verify the GnUPG signatures when extracting sstate archives
 SSTATE_VERIFY_SIG ?= "0"
 
+SSTATE_HASHEQUIV_METHOD ?= "OEOuthashBasic"
+SSTATE_HASHEQUIV_METHOD[doc] = "The function used to calculate the output hash \
+    for a task, which in turn is used to determine equivalency. \
+    "
+
+SSTATE_HASHEQUIV_SERVER ?= ""
+SSTATE_HASHEQUIV_SERVER[doc] = "The hash equivalence server. For example, \
+    'http://192.168.0.1:5000'. Do not include a trailing slash \
+    "
+
+SSTATE_HASHEQUIV_REPORT_TASKDATA ?= "0"
+SSTATE_HASHEQUIV_REPORT_TASKDATA[doc] = "Report additional useful data to the \
+    hash equivalency server, such as PN, PV, taskname, etc. This information \
+    is very useful for developers looking at task data, but may leak sensitive \
+    data if the equivalence server is public. \
+    "
+
 python () {
     if bb.data.inherits_class('native', d):
         d.setVar('SSTATE_PKGARCH', d.getVar('BUILD_ARCH', False))
@@ -626,7 +643,7 @@ def sstate_package(ss, d):
         return
 
     for f in (d.getVar('SSTATECREATEFUNCS') or '').split() + \
-             ['sstate_create_package', 'sstate_sign_package'] + \
+             ['sstate_report_depid', 'sstate_create_package', 'sstate_sign_package'] + \
              (d.getVar('SSTATEPOSTCREATEFUNCS') or '').split():
         # All hooks should run in SSTATE_BUILDDIR.
         bb.build.exec_func(f, d, (sstatebuild,))
@@ -743,6 +760,73 @@ python sstate_sign_package () {
                            d.getVar('SSTATE_SIG_PASSPHRASE'), armor=False)
 }
 
+def OEOuthashBasic(path, sigfile, task, d):
+    import hashlib
+    import stat
+
+    def update_hash(s):
+        s = s.encode('utf-8')
+        h.update(s)
+        if sigfile:
+            sigfile.write(s)
+
+    h = hashlib.sha1()
+    prev_dir = os.getcwd()
+
+    try:
+        os.chdir(path)
+
+        update_hash("OEOuthashBasic\n")
+
+        # It is only currently useful to get equivalent hashes for things that
+        # can be restored from sstate. Since the sstate object is named using
+        # SSTATE_PKGSPEC and the task name, those should be included in the
+        # output hash calculation.
+        update_hash("SSTATE_PKGSPEC=%s\n" % d.getVar('SSTATE_PKGSPEC'))
+        update_hash("task=%s\n" % task)
+
+        for root, dirs, files in os.walk('.', topdown=True):
+            # Sort directories and files to ensure consistent ordering
+            dirs.sort()
+            files.sort()
+
+            for f in files:
+                path = os.path.join(root, f)
+                s = os.lstat(path)
+
+                # Hash file path
+                update_hash(path + '\n')
+
+                # Hash file mode
+                update_hash("\tmode=0x%x\n" % stat.S_IMODE(s.st_mode))
+                update_hash("\ttype=0x%x\n" % stat.S_IFMT(s.st_mode))
+
+                if stat.S_ISBLK(s.st_mode) or stat.S_ISCHR(s.st_mode):
+                    # Hash device major and minor
+                    update_hash("\tdev=%d,%d\n" % (os.major(s.st_rdev), os.minor(s.st_rdev)))
+                elif stat.S_ISLNK(s.st_mode):
+                    # Hash symbolic link
+                    update_hash("\tsymlink=%s\n" % os.readlink(path))
+                else:
+                    fh = hashlib.sha1()
+                    # Hash file contents
+                    with open(path, 'rb') as d:
+                        for chunk in iter(lambda: d.read(4096), b""):
+                            fh.update(chunk)
+                    update_hash("\tdigest=%s\n" % fh.hexdigest())
+    finally:
+        os.chdir(prev_dir)
+
+    return h.hexdigest()
+
+python sstate_report_depid() {
+    report_depid = getattr(bb.parse.siggen, 'report_depid', None)
+
+    if report_depid:
+        ss = sstate_state_fromvars(d)
+        report_depid(os.getcwd(), ss['task'], d)
+}
+
 #
 # Shell function to decompress and prepare a package for installation
 # Will be run from within SSTATE_INSTDIR.
@@ -789,7 +873,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=
 
         spec, extrapath, tname = getpathcomponents(task, d)
 
-        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_depid[task], d) + "_" + tname + extension)
 
         if os.path.exists(sstatefile):
             bb.debug(2, "SState: Found valid sstate file %s" % sstatefile)
@@ -851,7 +935,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=
             if task in ret:
                 continue
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_depid[task], d) + "_" + tname + extension)
             tasklist.append((task, sstatefile))
 
         if tasklist:
@@ -877,12 +961,12 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=
         evdata = {'missed': [], 'found': []};
         for task in missed:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_depid[task], d) + "_" + tname + ".tgz")
+            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_depid[task], sstatefile ) )
         for task in ret:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['found'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_depid[task], d) + "_" + tname + ".tgz")
+            evdata['found'].append( (sq_fn[task], sq_task[task], sq_depid[task], sstatefile ) )
         bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)
 
     if hasattr(bb.parse.siggen, "checkhashes"):
diff --git a/meta/conf/bitbake.conf b/meta/conf/bitbake.conf
index 8f738545995..47f258569f2 100644
--- a/meta/conf/bitbake.conf
+++ b/meta/conf/bitbake.conf
@@ -865,7 +865,9 @@ BB_HASHBASE_WHITELIST ?= "TMPDIR FILE PATH PWD BB_TASKHASH BBPATH BBSERVER DL_DI
     STAMPS_DIR PRSERV_DUMPDIR PRSERV_DUMPFILE PRSERV_LOCKDOWN PARALLEL_MAKE \
     CCACHE_DIR EXTERNAL_TOOLCHAIN CCACHE CCACHE_NOHASHDIR LICENSE_PATH SDKPKGSUFFIX \
     WARN_QA ERROR_QA WORKDIR STAMPCLEAN PKGDATA_DIR BUILD_ARCH SSTATE_PKGARCH \
-    BB_WORKERCONTEXT BB_LIMITEDDEPS extend_recipe_sysroot DEPLOY_DIR"
+    BB_WORKERCONTEXT BB_LIMITEDDEPS BB_DEPID extend_recipe_sysroot DEPLOY_DIR \
+    SSTATE_HASHEQUIV_METHOD SSTATE_HASHEQUIV_SERVER SSTATE_HASHEQUIV_REPORT_TASKDATA \
+    SSTATE_HASHEQUIV_OWNER"
 BB_HASHCONFIG_WHITELIST ?= "${BB_HASHBASE_WHITELIST} DATE TIME SSH_AGENT_PID \
     SSH_AUTH_SOCK PSEUDO_BUILD BB_ENV_EXTRAWHITE DISABLE_SANITY_CHECKS \
     PARALLEL_MAKE BB_NUMBER_THREADS BB_ORIGENV BB_INVALIDCONF BBINCLUDED \
diff --git a/meta/lib/oe/sstatesig.py b/meta/lib/oe/sstatesig.py
index 5dcc2f5cd6d..727baa95970 100644
--- a/meta/lib/oe/sstatesig.py
+++ b/meta/lib/oe/sstatesig.py
@@ -256,10 +256,176 @@ class SignatureGeneratorOEBasicHash(bb.siggen.SignatureGeneratorBasicHash):
         if error_msgs:
             bb.fatal("\n".join(error_msgs))
 
+class SignatureGeneratorOEEquivHash(SignatureGeneratorOEBasicHash):
+    name = "OEEquivHash"
+
+    def init_rundepcheck(self, data):
+        super().init_rundepcheck(data)
+        self.server = data.getVar('SSTATE_HASHEQUIV_SERVER')
+        self.method = data.getVar('SSTATE_HASHEQUIV_METHOD')
+        self.depids = bb.persist_data.persist('SSTATESIG_DEPID_CACHE_v1_' + self.method, data)
+
+    def get_taskdata(self):
+        return (self.server, self.method) + super().get_taskdata()
+
+    def set_taskdata(self, data):
+        self.server, self.method = data[:2]
+        super().set_taskdata(data[2:])
+
+    def __get_depid_key(self, task):
+        return '%s:%s' % (task, self.taskhash[task])
+
+    def get_stampfile_hash(self, task):
+        if task in self.taskhash:
+            # If a depid is reported, use it as the stampfile hash. This
+            # ensures that a task won't be re-run if the taskhash changes
+            # but would result in the same output hash
+            depid = self.depids.get(self.__get_depid_key(task))
+            if depid is not None:
+                return depid
+
+        return super().get_stampfile_hash(task)
+
+    def get_depid(self, task):
+        import urllib
+        import json
+
+        taskhash = self.taskhash[task]
+
+        # TODO: The key only *needs* to be the taskhash, the task is just
+        # convenient
+        key = self.__get_depid_key(task)
+
+        # TODO: This cache can grow unbounded. It probably only needs to keep
+        # the most recent entry for each task
+        depid = self.depids.get(key)
+        if depid is not None:
+            return depid
+
+        # In the absence of being able to discover a dependency ID from the
+        # server, make it be equivalent to the taskhash. The dependency ID only
+        # really needs to be a unique string (not even necessarily a hash), but
+        # making it match the taskhash has a few advantages:
+        #
+        # 1) All of the sstate code that assumes hashes can be used unchanged
+        # 2) It provides maximal compatibility with builders that don't use
+        #    an equivalency server
+        # 3) It is easy for multiple independent builders to derive the same
+        #    depid from the same input. This means that if the independent
+        #    builders find the same taskhash, but it isn't reported to the server,
+        #    there is a better chance that they will agree on the dependency ID.
+        depid = taskhash
+
+        try:
+            url = '%s/v1/equivalent?%s' % (self.server,
+                    urllib.parse.urlencode({'method': self.method, 'taskhash': self.taskhash[task]}))
+
+            request = urllib.request.Request(url)
+            response = urllib.request.urlopen(request)
+            data = response.read()
+
+            json_data = json.loads(data)
+
+            if json_data:
+                depid = json_data['depid']
+                # Dependency ID equal to the taskhash is not very interesting,
+                # so it is reported at debug level 2. If they differ, that
+                # is much more interesting, so it is reported at debug level 1
+                bb.debug((1, 2)[depid == taskhash], 'Found depid %s in place of %s for %s from %s' % (depid, taskhash, task, self.server))
+            else:
+                bb.debug(2, 'No reported dependency ID for %s:%s from %s' % (task, taskhash, self.server))
+        except urllib.error.URLError as e:
+            bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+        except (KeyError, json.JSONDecodeError) as e:
+            bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+
+        self.depids[key] = depid
+        return depid
+
+    def report_depid(self, path, task, d):
+        import urllib
+        import json
+        import tempfile
+        import base64
+
+        taskhash = d.getVar('BB_TASKHASH')
+        depid = d.getVar('BB_DEPID')
+        report_taskdata = d.getVar('SSTATE_HASHEQUIV_REPORT_TASKDATA') == '1'
+        tempdir = d.getVar('T')
+        fn = d.getVar('BB_FILENAME')
+        key = fn + '.do_' + task + ':' + taskhash
+
+        # Sanity checks
+        cache_depid = self.depids.get(key)
+        if cache_depid is None:
+            bb.fatal('%s not in depid cache. Please report this error' % key)
+
+        if cache_depid != depid:
+            bb.fatal("Cache depid %s doesn't match BB_DEPID %s" % (cache_depid, depid))
+
+        sigfile = None
+        sigfile_name = "depsig.do_%s.%d" % (task, os.getpid())
+        sigfile_link = "depsig.do_%s" % task
+
+        try:
+            call = self.method + '(path, sigfile, task, d)'
+            sigfile = open(os.path.join(tempdir, sigfile_name), 'w+b')
+            locs = {'path': path, 'sigfile': sigfile, 'task': task, 'd': d}
+
+            outhash = bb.utils.better_eval(call, locs)
+
+            try:
+                url = '%s/v1/equivalent' % self.server
+                task_data = {
+                    'taskhash': taskhash,
+                    'method': self.method,
+                    'outhash': outhash,
+                    'depid': depid,
+                    'owner': d.getVar('SSTATE_HASHEQUIV_OWNER')
+                    }
+
+                if report_taskdata:
+                    sigfile.seek(0)
+
+                    task_data['PN'] = d.getVar('PN')
+                    task_data['PV'] = d.getVar('PV')
+                    task_data['PR'] = d.getVar('PR')
+                    task_data['task'] = task
+                    task_data['outhash_siginfo'] = sigfile.read().decode('utf-8')
+
+                headers = {'content-type': 'application/json'}
+
+                request = urllib.request.Request(url, json.dumps(task_data).encode('utf-8'), headers)
+                response = urllib.request.urlopen(request)
+                data = response.read()
+
+                json_data = json.loads(data)
+                new_depid = json_data['depid']
+
+                if new_depid != depid:
+                    bb.debug(1, 'Task %s depid changed %s -> %s by server %s' % (taskhash, depid, new_depid, self.server))
+                else:
+                    bb.debug(1, 'Reported task %s as depid %s to %s' % (taskhash, depid, self.server))
+            except urllib.error.URLError as e:
+                bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+            except (KeyError, json.JSONDecodeError) as e:
+                bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+        finally:
+            if sigfile:
+                sigfile.close()
+
+                sigfile_link_path = os.path.join(tempdir, sigfile_link)
+                bb.utils.remove(sigfile_link_path)
+
+                try:
+                    os.symlink(sigfile_name, sigfile_link_path)
+                except OSError:
+                    pass
 
 # Insert these classes into siggen's namespace so it can see and select them
 bb.siggen.SignatureGeneratorOEBasic = SignatureGeneratorOEBasic
 bb.siggen.SignatureGeneratorOEBasicHash = SignatureGeneratorOEBasicHash
+bb.siggen.SignatureGeneratorOEEquivHash = SignatureGeneratorOEEquivHash
 
 
 def find_siginfo(pn, taskname, taskhashlist, d):
-- 
2.17.1




* Re: [RFC 8/9] hashserver: Add initial reference server
  2018-07-16 20:37 ` [RFC 8/9] hashserver: Add initial reference server Joshua Watt
@ 2018-07-17 12:11     ` Richard Purdie
  0 siblings, 0 replies; 158+ messages in thread
From: Richard Purdie @ 2018-07-17 12:11 UTC (permalink / raw)
  To: Joshua Watt, bitbake-devel, openembedded-core

On Mon, 2018-07-16 at 15:37 -0500, Joshua Watt wrote:
> Adds an initial reference implementation of the hash server.
> 
> NOTE: This is my first dive into HTTP & REST technologies. Feedback
> is
> appreciated. Also, I don't think it will be necessary for this
> reference
> implementation to live in bitbake, and it can be moved to it's own
> independent project if necessary?
> 
> Also, this server has some concurrency issues that I haven't tracked
> down and will occasionally fail to record a new POST'd task with an
> error indicating the database is locked. Based on some reading, I
> believe this is because the server is using a sqlite backend, and it
> would go away with a more production worthy backend. Anyway, it is
> good
> enough for some preliminary testing.
> 
> Starting the server is simple and only requires pipenv to be
> installed:
> 
>  $ pipenv shell
>  $ ./app.py

I need to spend some time digesting this series but this patch did make
me a little sad.

I'm hoping we can make the hash equivalence server something people use
easily and perhaps part of bitbake. The dependencies you've used in
this code mean it has a significantly higher "barrier to use" than most
of our other code though :(

On the one hand I can understand people wanting to use dependencies and
new technology. On the other, keeping things simple also has
advantages.

Even the minimum python version is potentially problematic, we don't
even have working recipes for python 3.6!

I appreciate it's a reference and means we can test the rest of the code
so it's good, but we may need a different implementation of this
ultimately.

Cheers,

Richard




* Re: [RFC 8/9] hashserver: Add initial reference server
  2018-07-17 12:11     ` [bitbake-devel] " Richard Purdie
@ 2018-07-17 13:44       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-07-17 13:44 UTC (permalink / raw)
  To: Richard Purdie, bitbake-devel, openembedded-core

On Tue, 2018-07-17 at 13:11 +0100, Richard Purdie wrote:
> On Mon, 2018-07-16 at 15:37 -0500, Joshua Watt wrote:
> > Adds an initial reference implementation of the hash server.
> > 
> > NOTE: This is my first dive into HTTP & REST technologies. Feedback
> > is
> > appreciated. Also, I don't think it will be necessary for this
> > reference
> > implementation to live in bitbake, and it can be moved to it's own
> > independent project if necessary?
> > 
> > Also, this server has some concurrency issues that I haven't
> > tracked
> > down and will occasionally fail to record a new POST'd task with an
> > error indicating the database is locked. Based on some reading, I
> > believe this is because the server is using a sqlite backend, and
> > it
> > would go away with a more production worthy backend. Anyway, it is
> > good
> > enough for some preliminary testing.
> > 
> > Starting the server is simple and only requires pipenv to be
> > installed:
> > 
> >  $ pipenv shell
> >  $ ./app.py
> 
> I need to spend some time digesting this series but this patch did
> make
> me a little sad.
> 
> I'm hoping we can make the hash equivalence server something people
> use
> easily and perhaps part of bitbake. The dependencies you've used in
> this code mean it has a significantly higher "barrier to use" than
> most
> of our other code though :(

Ah, I didn't realize there was desire to have it actually be part of
bitbake... That would change the strategy considerably.

I think that is a worthwhile discussion to have. I obviously wasn't
thinking the server would be part of bitbake, I was more thinking it
would probably be an independent project (I even have a name picked out
already ;), so I'm curious what thoughts there are on the advantages
and disadvantages of doing that.

> On the one hand I can understand people wanting to use dependencies
> and
> new technology. On the other, keeping things simple also has
> advantages.

Sure. Pipenv makes it pretty easy to pull in a lot of shiny modules
that you don't necessarily need, and I may have gone a little
overboard. I do have some justification for *most* of the things I
brought in besides just "it looked cool". We can dig into them deeper.

However, pipenv (or python virtual environments) is designed to make
these types of decisions easier, so I think that the discussion about
what dependencies we do or don't want should probably start there (i.e.
do we allow pipenv or not?). Basically, it's much easier to pull in the
dependencies you need (and even fix versions for production use) with
pipenv. This of course doesn't mean you can be careless about what you
pull in, but it makes you a lot more independent from the host setup.

> 
> Even the minimum python version is potentially problematic, we don't
> even have working recipes for python 3.6!

Oh right! Pipenv defaults to the version of Python you have installed.
I'm pretty sure you can relax the restriction to just "python 3", but I
forgot to do so before I pushed the patch.

> 
> I appreciate it's a reference and means we can test the rest of the
> code
> so it's good, but we may need a different implementation of this
> ultimately.

I think perhaps my use of the term "reference implementation" was too
strong. "Example implementation for testing", or "toy implementation"
might be more in line with my intention.


Thanks,
Joshua Watt

> 
> 
> Cheers,
> 
> Richard
> 



* Re: [RFC 8/9] hashserver: Add initial reference server
  2018-07-17 12:11     ` [bitbake-devel] " Richard Purdie
@ 2018-07-18 13:53       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-07-18 13:53 UTC (permalink / raw)
  To: Richard Purdie, bitbake-devel, openembedded-core

On Tue, 2018-07-17 at 13:11 +0100, Richard Purdie wrote:
> On Mon, 2018-07-16 at 15:37 -0500, Joshua Watt wrote:
> > Adds an initial reference implementation of the hash server.
> > 
> > NOTE: This is my first dive into HTTP & REST technologies. Feedback
> > is
> > appreciated. Also, I don't think it will be necessary for this
> > reference
> > implementation to live in bitbake, and it can be moved to it's own
> > independent project if necessary?
> > 
> > Also, this server has some concurrency issues that I haven't
> > tracked
> > down and will occasionally fail to record a new POST'd task with an
> > error indicating the database is locked. Based on some reading, I
> > believe this is because the server is using a sqlite backend, and
> > it
> > would go away with a more production worthy backend. Anyway, it is
> > good
> > enough for some preliminary testing.
> > 
> > Starting the server is simple and only requires pipenv to be
> > installed:
> > 
> >  $ pipenv shell
> >  $ ./app.py
> 
> I need to spend some time digesting this series but this patch did
> make
> me a little sad.
> 
> I'm hoping we can make the hash equivalence server something people
> use
> easily and perhaps part of bitbake. The dependencies you've used in
> this code mean it has a significantly higher "barrier to use" than
> most
> of our other code though :(
> 
> On the one hand I can understand people wanting to use dependencies
> and
> new technology. On the other, keeping things simple also has
> advantages.
> 
> Even the minimum python version is potentially problematic, we don't
> even have working recipes for python 3.6!
> 
> I appreciate it's a reference and means we can test the rest of the
> code
> so it's good, but we may need a different implementation of this
> ultimately.

I had a look around, and I think it should be quite feasible to
implement a reference server using only the standard Python libraries
(e.g. http.server). The actual API that the server is *required*
to implement is very small (a single endpoint of "/v1/equivalent" with
GET and POST methods). It will probably have a number of limitations
(scalability, authentication, security, etc.) but for something to get
started or for automated testing purposes, I think it will be
sufficient.
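
To make that concrete, here is a rough sketch of what I have in mind,
using only the standard library. The port, the in-memory tables, and
the handler names are placeholders on my part, not a settled design;
returning null for an unknown taskhash matches what the sstate client
code already expects:

 import json
 from http.server import HTTPServer, BaseHTTPRequestHandler
 from urllib.parse import urlparse, parse_qs

 tasks = {}      # (method, taskhash) -> task record
 outhashes = {}  # outhash -> first depid reported for that output

 class EquivHandler(BaseHTTPRequestHandler):
     def send_json(self, data):
         body = json.dumps(data).encode('utf-8')
         self.send_response(200)
         self.send_header('Content-Type', 'application/json')
         self.end_headers()
         self.wfile.write(body)

     def do_GET(self):
         url = urlparse(self.path)
         if url.path != '/v1/equivalent':
             self.send_error(404)
             return
         q = parse_qs(url.query)
         # Sends null if this taskhash has never been reported
         self.send_json(tasks.get((q['method'][0], q['taskhash'][0])))

     def do_POST(self):
         if urlparse(self.path).path != '/v1/equivalent':
             self.send_error(404)
             return
         size = int(self.headers['Content-Length'])
         data = json.loads(self.rfile.read(size).decode('utf-8'))
         # Every task with the same output hash gets the same depid
         data['depid'] = outhashes.setdefault(data['outhash'], data['depid'])
         tasks[(data['method'], data['taskhash'])] = data
         self.send_json(data)

 HTTPServer(('localhost', 8686), EquivHandler).serve_forever()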

I'll probably wait a few days at least before I loop back around to
look at that, so I'd like to leave this discussion open: do we
want a simpler reference implementation?

> 
> Cheers,
> 
> Richard
> 



* [RFC v2 00/16] Hash Equivalency Server
  2018-07-16 20:37 [RFC 0/9] Hash Equivalency Server Joshua Watt
                   ` (8 preceding siblings ...)
  2018-07-16 20:37 ` [RFC 9/9] sstate: Implement hash equivalence sstate Joshua Watt
@ 2018-08-09 22:08 ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 01/16] bitbake: fork: Add os.fork() wrappers Joshua Watt
                     ` (17 more replies)
  9 siblings, 18 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

These patches are a first pass at implementing a hash equivalence server
in bitbake & OE.

Apologies for cross-posting this to both the bitbake-devel and
openembedded-devel; this work necessarily intertwines both places, and
it is really necessary to look at both parts to get an idea of what is
going on. For convenience, the bitbake patches are listed first,
followed by the oe-core patches.

The basic premise is that any given task no longer hashes a dependent
task's taskhash to determine it's own taskhash, but instead hashes the
dependent task's "dependency ID" (which doesn't strictly need to be a
hash, but is for consistency. We can have the discussion as to whether
this should be called a "dependency hash" if anyone wants). This allows
multiple taskhashes to map to the same dependency ID, meaning that
trivial changes to a recipe that would change the taskhash don't
necessarily need to change the dependency ID, and thus don't need to
cause downstream tasks to be rebuilt (with caveats, see below).

In the absence of any interaction by the user, the dependency ID for a
task is just that task's taskhash, which effectively maintains the
current behavior. However, if the user enables the "OEEquivHash"
signature generator, they can direct it to look at a hash equivalency
server (of which a reference implementation is provided). The sstate
code will provide the server with an output hash that it calculates, and
the server will record all tasks with the same output hash as
"equivalent" and report the same dependency ID for them when requested.
When initializing tasks, bitbake can ask the server about the dependency
ID for new tasks it has never seen before and potentially skip
rebuilding, or restore the task from an equivalent sstate file. To
facilitate restoring tasks from sstate, sstate objects are now named
based on the tasks dependency ID instead of the taskhash (which, again
has no effect if the server is in use).

This patchset doesn't make any attempt to dynamically update task
dependency IDs after bitbake initializes the tasks, and as such there
are some cases where this isn't accelerating the build as much as it
possibly could. I think it will be possible to add support for this, but
this preliminary support needs to come first.

Some patches have additional NOTEs that indicate places where I wasn't
sure what to do.

You can also see these patches (and my first attempts at dynamic task
re-hashing) on the "jpew/hash-equivalence" branch in poky-contrib.

As always, thanks for your feedback and time

VERSION 2:

At the core, this series does the same thing as V1 with some very minor
tweaks. The main things that have changed are:
 1) Per request, the Hash Equivalence Server reference implementation is
    now based entirely on built in Python modules and requires no
    external libraries. It also has a wrapper script to launch it
    (bitbake-hashserv) and unittests.
 2) There is a major rework of persist_data in bitbake. I think these
    patches could be submitted independently, but I doubt anyone is
    clamoring for them. The general gist of them is that there were a
    lot of strange edge cases that I found when using persist_data as an
    IPC mechanism between the main bitbake process and the
    bitbake-worker processes. I went ahead and added extensive unit
    tests for this as well.

Joshua Watt (16):
  bitbake: fork: Add os.fork() wrappers
  bitbake: persist_data: Fix leaking cursors causing deadlock
  bitbake: persist_data: Add key constraints
  bitbake: persist_data: Enable Write Ahead Log
  bitbake: persist_data: Disable enable_shared_cache
  bitbake: persist_data: Close databases across fork
  bitbake: tests/persist_data: Add tests
  bitbake: bitbake-worker: Pass taskhash as runtask parameter
  bitbake: siggen: Split out stampfile hash fetch
  bitbake: siggen: Split out task depend ID
  bitbake: runqueue: Track task dependency ID
  bitbake: runqueue: Pass dependency ID to task
  bitbake: runqueue: Pass dependency ID to hash validate
  classes/sstate: Handle depid in hash check
  bitbake: hashserv: Add hash equivalence reference server
  sstate: Implement hash equivalence sstate

 bitbake/bin/bitbake-hashserv         |  67 ++++++++
 bitbake/bin/bitbake-selftest         |   3 +
 bitbake/bin/bitbake-worker           |  11 +-
 bitbake/lib/bb/fork.py               |  71 ++++++++
 bitbake/lib/bb/persist_data.py       | 239 ++++++++++++++++++++-------
 bitbake/lib/bb/runqueue.py           |  56 ++++---
 bitbake/lib/bb/siggen.py             |  20 ++-
 bitbake/lib/bb/tests/persist_data.py | 188 +++++++++++++++++++++
 bitbake/lib/hashserv/__init__.py     | 152 +++++++++++++++++
 bitbake/lib/hashserv/tests.py        | 141 ++++++++++++++++
 meta/classes/sstate.bbclass          | 102 +++++++++++-
 meta/conf/bitbake.conf               |   4 +-
 meta/lib/oe/sstatesig.py             | 166 +++++++++++++++++++
 13 files changed, 1117 insertions(+), 103 deletions(-)
 create mode 100755 bitbake/bin/bitbake-hashserv
 create mode 100644 bitbake/lib/bb/fork.py
 create mode 100644 bitbake/lib/bb/tests/persist_data.py
 create mode 100644 bitbake/lib/hashserv/__init__.py
 create mode 100644 bitbake/lib/hashserv/tests.py

-- 
2.17.1




* [RFC v2 01/16] bitbake: fork: Add os.fork() wrappers
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 02/16] bitbake: persist_data: Fix leaking cursors causing deadlock Joshua Watt
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

Adds a compatibility wrapper around os.fork() that backports the ability
to register fork event handlers (os.register_at_fork()) from Python 3.7
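
Usage mirrors the 3.7 API; roughly like this (the handlers here are
invented for illustration):

 import os
 import bb.fork

 def before():
     print("parent %d about to fork" % os.getpid())

 def in_child():
     print("running in child %d" % os.getpid())

 bb.fork.register_at_fork(before=before, after_in_child=in_child)

 pid = bb.fork.fork()
 if pid == 0:
     os._exit(0)
 os.waitpid(pid, 0)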

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-worker |  2 +-
 bitbake/lib/bb/fork.py     | 71 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+), 1 deletion(-)
 create mode 100644 bitbake/lib/bb/fork.py

diff --git a/bitbake/bin/bitbake-worker b/bitbake/bin/bitbake-worker
index e925054b7f9..baa1a84e6dd 100755
--- a/bitbake/bin/bitbake-worker
+++ b/bitbake/bin/bitbake-worker
@@ -181,7 +181,7 @@ def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, append
         pipein, pipeout = os.pipe()
         pipein = os.fdopen(pipein, 'rb', 4096)
         pipeout = os.fdopen(pipeout, 'wb', 0)
-        pid = os.fork()
+        pid = bb.fork.fork()
     except OSError as e:
         logger.critical("fork failed: %d (%s)" % (e.errno, e.strerror))
         sys.exit(1)
diff --git a/bitbake/lib/bb/fork.py b/bitbake/lib/bb/fork.py
new file mode 100644
index 00000000000..5ac5aba1832
--- /dev/null
+++ b/bitbake/lib/bb/fork.py
@@ -0,0 +1,71 @@
+# ex:ts=4:sw=4:sts=4:et
+# -*- tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*-
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+#
+"""
+Python wrappers for os.fork() that allow the insertion of callbacks for fork events.
+This is designed to exactly mimic os.register_at_fork() available in Python 3.7 with the
+intent that it can be removed when that version becomes standard
+"""
+
+import sys
+import os
+
+before_calls = []
+after_in_parent_calls = []
+after_in_child_calls = []
+
+def _do_calls(l, reverse=False):
+    # Make a copy in case the list is modified in the callback
+    copy = l[:]
+    if reverse:
+        copy = reversed(copy)
+
+    for f in copy:
+        # All exceptions raised by the callbacks are ignored
+        try:
+            f()
+        except:
+            pass
+
+def fork():
+    if sys.hexversion >= 0x030700F0:
+        return os.fork()
+
+    _do_calls(before_calls, reverse=True)
+
+    ret = os.fork()
+    if ret == 0:
+        _do_calls(after_in_child_calls)
+    else:
+        _do_calls(after_in_parent_calls)
+    return ret
+
+def register_at_fork(*, before=None, after_in_parent=None, after_in_child=None):
+    if sys.hexversion >= 0x030700F0:
+        os.register_at_fork(before=before, after_in_parent=after_in_parent, after_in_child=after_in_child)
+        return
+
+    if before is not None:
+        before_calls.append(before)
+
+    if after_in_parent is not None:
+        after_in_parent_calls.append(after_in_parent)
+
+    if after_in_child is not None:
+        after_in_child_calls.append(after_in_child)
+
-- 
2.17.1




* [RFC v2 02/16] bitbake: persist_data: Fix leaking cursors causing deadlock
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
  2018-08-09 22:08   ` [RFC v2 01/16] bitbake: fork: Add os.fork() wrappers Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 03/16] bitbake: persist_data: Add key constraints Joshua Watt
                     ` (15 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

The original implementation of persistent data executed all SQL
statements via sqlite3.Connection.execute(). Behind the scenes, this
function created a sqlite3 Cursor object, executed the statement, then
returned the cursor. However, the implementation did not account for
this and failed to close the cursor object when it was done. The cursor
would eventually be closed when the garbage collector got around to
destroying it. However, sqlite has a limit on the number of cursors that
can exist at any given time, and once this limit is reached it will
block a query to wait for a cursor to be destroyed. Under heavy database
queries, this can result in Python deadlocking with itself, since the
SQL query will block waiting for a free cursor, but Python can no longer
run garbage collection (as it is blocked) to free one.

This restructures the SQLTable class to use two decorators to aid in
performing actions correctly. The first decorator (@retry) wraps a
member function in the retry logic that automatically restarts the
function in the event that the database is locked.

The second decorator (@transaction) wraps the function so that it occurs
in a database transaction, which will automatically COMMIT the changes
on success and ROLLBACK on failure. This function additionally creates
an explicit cursor, passes it to the wrapped function, and cleans it up
when the function is finished.

Note that it is still possible to leak cursors when iterating. This is
much less frequent, but can still be mitigated by wrapping the iteration
in a `with` statement:

 with db.iteritems() as it:
     for (k, v) in it:
         ...

As a side effect, since most statements are wrapped in a transaction,
setting the isolation_level when the connection is created is no longer
necessary.
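
For reviewers, the core idiom both decorators lean on is just standard
sqlite3 plus contextlib, roughly (standalone sketch, not the exact
patch below):

 import contextlib
 import sqlite3

 connection = sqlite3.connect(':memory:')
 connection.execute("CREATE TABLE t(key TEXT, value TEXT)")

 # The connection context manager COMMITs on success and ROLLBACKs on
 # an exception; contextlib.closing() closes the cursor either way
 with connection:
     with contextlib.closing(connection.cursor()) as cursor:
         cursor.execute("INSERT INTO t(key, value) VALUES (?, ?)", ["k", "v"])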

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/persist_data.py | 188 +++++++++++++++++++++++----------
 1 file changed, 135 insertions(+), 53 deletions(-)

diff --git a/bitbake/lib/bb/persist_data.py b/bitbake/lib/bb/persist_data.py
index bef7018614d..1a6319f9498 100644
--- a/bitbake/lib/bb/persist_data.py
+++ b/bitbake/lib/bb/persist_data.py
@@ -29,6 +29,7 @@ import warnings
 from bb.compat import total_ordering
 from collections import Mapping
 import sqlite3
+import contextlib
 
 sqlversion = sqlite3.sqlite_version_info
 if sqlversion[0] < 3 or (sqlversion[0] == 3 and sqlversion[1] < 3):
@@ -45,75 +46,154 @@ if hasattr(sqlite3, 'enable_shared_cache'):
 
 @total_ordering
 class SQLTable(collections.MutableMapping):
+    class _Decorators(object):
+        @staticmethod
+        def retry(f):
+            """
+            Decorator that restarts a function if a "database is locked"
+            sqlite exception occurs.
+            """
+            def wrap_func(self, *args, **kwargs):
+                count = 0
+                while True:
+                    try:
+                        return f(self, *args, **kwargs)
+                    except sqlite3.OperationalError as exc:
+                        if 'is locked' in str(exc) and count < 500:
+                            count = count + 1
+                            self.connection.close()
+                            self.connection = connect(self.cachefile)
+                            continue
+                        raise
+            return wrap_func
+
+        @staticmethod
+        def transaction(f):
+            """
+            Decorator that starts a database transaction and creates a database
+            cursor for performing queries. If no exception is thrown, the
+            database results are committed. If an exception occurs, the database
+            is rolled back. In all cases, the cursor is closed after the
+            function ends.
+
+            Note that the cursor is passed as an extra argument to the function
+            after `self` and before any of the normal arguments
+            """
+            def wrap_func(self, *args, **kwargs):
+                # Context manager will COMMIT the database on success,
+                # or ROLLBACK on an exception
+                with self.connection:
+                    # Automatically close the cursor when done
+                    with contextlib.closing(self.connection.cursor()) as cursor:
+                        return f(self, cursor, *args, **kwargs)
+            return wrap_func
+
     """Object representing a table/domain in the database"""
     def __init__(self, cachefile, table):
         self.cachefile = cachefile
         self.table = table
-        self.cursor = connect(self.cachefile)
-
-        self._execute("CREATE TABLE IF NOT EXISTS %s(key TEXT, value TEXT);"
-                      % table)
-
-    def _execute(self, *query):
-        """Execute a query, waiting to acquire a lock if necessary"""
-        count = 0
-        while True:
-            try:
-                return self.cursor.execute(*query)
-            except sqlite3.OperationalError as exc:
-                if 'database is locked' in str(exc) and count < 500:
-                    count = count + 1
+        self.connection = connect(self.cachefile)
+
+        self._execute_single("CREATE TABLE IF NOT EXISTS %s(key TEXT, value TEXT);" % table)
+
+    @_Decorators.retry
+    @_Decorators.transaction
+    def _execute_single(self, cursor, *query):
+        """
+        Executes a single query and discards the results. This correctly closes
+        the database cursor when finished
+        """
+        cursor.execute(*query)
+
+    @_Decorators.retry
+    def _row_iter(self, f, *query):
+        """
+        Helper function that returns a row iterator. Each time __next__ is
+        called on the iterator, the provided function is evaluated to determine
+        the return value
+        """
+        class CursorIter(object):
+            def __init__(self, cursor):
+                self.cursor = cursor
+
+            def __iter__(self):
+                return self
+
+            def __next__(self):
+                row = self.cursor.fetchone()
+                if row is None:
                     self.cursor.close()
-                    self.cursor = connect(self.cachefile)
-                    continue
-                raise
+                    raise StopIteration
+                return f(row)
+
+            def __enter__(self):
+                return self
+
+            def __exit__(self, typ, value, traceback):
+                self.cursor.close()
+                return False
+
+        cursor = self.connection.cursor()
+        try:
+            cursor.execute(*query)
+            return CursorIter(cursor)
+        except:
+            cursor.close()
+            raise
 
     def __enter__(self):
-        self.cursor.__enter__()
+        self.connection.__enter__()
         return self
 
     def __exit__(self, *excinfo):
-        self.cursor.__exit__(*excinfo)
-
-    def __getitem__(self, key):
-        data = self._execute("SELECT * from %s where key=?;" %
-                             self.table, [key])
-        for row in data:
+        self.connection.__exit__(*excinfo)
+
+    @_Decorators.retry
+    @_Decorators.transaction
+    def __getitem__(self, cursor, key):
+        cursor.execute("SELECT * from %s where key=?;" % self.table, [key])
+        row = cursor.fetchone()
+        if row is not None:
             return row[1]
         raise KeyError(key)
 
-    def __delitem__(self, key):
+    @_Decorators.retry
+    @_Decorators.transaction
+    def __delitem__(self, cursor, key):
         if key not in self:
             raise KeyError(key)
-        self._execute("DELETE from %s where key=?;" % self.table, [key])
+        cursor.execute("DELETE from %s where key=?;" % self.table, [key])
 
-    def __setitem__(self, key, value):
+    @_Decorators.retry
+    @_Decorators.transaction
+    def __setitem__(self, cursor, key, value):
         if not isinstance(key, str):
             raise TypeError('Only string keys are supported')
         elif not isinstance(value, str):
             raise TypeError('Only string values are supported')
 
-        data = self._execute("SELECT * from %s where key=?;" %
-                                   self.table, [key])
-        exists = len(list(data))
-        if exists:
-            self._execute("UPDATE %s SET value=? WHERE key=?;" % self.table,
-                          [value, key])
+        cursor.execute("SELECT * from %s where key=?;" % self.table, [key])
+        row = cursor.fetchone()
+        if row is not None:
+            cursor.execute("UPDATE %s SET value=? WHERE key=?;" % self.table, [value, key])
         else:
-            self._execute("INSERT into %s(key, value) values (?, ?);" %
-                          self.table, [key, value])
-
-    def __contains__(self, key):
-        return key in set(self)
-
-    def __len__(self):
-        data = self._execute("SELECT COUNT(key) FROM %s;" % self.table)
-        for row in data:
+            cursor.execute("INSERT into %s(key, value) values (?, ?);" % self.table, [key, value])
+
+    @_Decorators.retry
+    @_Decorators.transaction
+    def __contains__(self, cursor, key):
+        cursor.execute('SELECT * from %s where key=?;' % self.table, [key])
+        return cursor.fetchone() is not None
+
+    @_Decorators.retry
+    @_Decorators.transaction
+    def __len__(self, cursor):
+        cursor.execute("SELECT COUNT(key) FROM %s;" % self.table)
+        row = cursor.fetchone()
+        if row is not None:
             return row[0]
 
     def __iter__(self):
-        data = self._execute("SELECT key FROM %s;" % self.table)
-        return (row[0] for row in data)
+        return self._row_iter(lambda row: row[0], "SELECT key from %s;" % self.table)
 
     def __lt__(self, other):
         if not isinstance(other, Mapping):
@@ -122,25 +202,27 @@ class SQLTable(collections.MutableMapping):
         return len(self) < len(other)
 
     def get_by_pattern(self, pattern):
-        data = self._execute("SELECT * FROM %s WHERE key LIKE ?;" %
-                             self.table, [pattern])
-        return [row[1] for row in data]
+        return self._row_iter(lambda row: row[1], "SELECT * FROM %s WHERE key LIKE ?;" %
+                              self.table, [pattern])
 
     def values(self):
         return list(self.itervalues())
 
     def itervalues(self):
-        data = self._execute("SELECT value FROM %s;" % self.table)
-        return (row[0] for row in data)
+        return self._row_iter(lambda row: row[0], "SELECT value FROM %s;" %
+                              self.table)
 
     def items(self):
         return list(self.iteritems())
 
     def iteritems(self):
-        return self._execute("SELECT * FROM %s;" % self.table)
+        return self._row_iter(lambda row: (row[0], row[1]), "SELECT * FROM %s;" %
+                              self.table)
 
-    def clear(self):
-        self._execute("DELETE FROM %s;" % self.table)
+    @_Decorators.retry
+    @_Decorators.transaction
+    def clear(self, cursor):
+        cursor.execute("DELETE FROM %s;" % self.table)
 
     def has_key(self, key):
         return key in self
@@ -195,7 +277,7 @@ class PersistData(object):
         del self.data[domain][key]
 
 def connect(database):
-    connection = sqlite3.connect(database, timeout=5, isolation_level=None)
+    connection = sqlite3.connect(database, timeout=5)
     connection.execute("pragma synchronous = off;")
     connection.text_factory = str
     return connection
-- 
2.17.1




* [RFC v2 03/16] bitbake: persist_data: Add key constraints
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
  2018-08-09 22:08   ` [RFC v2 01/16] bitbake: fork: Add os.fork() wrappers Joshua Watt
  2018-08-09 22:08   ` [RFC v2 02/16] bitbake: persist_data: Fix leaking cursors causing deadlock Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 04/16] bitbake: persist_data: Enable Write Ahead Log Joshua Watt
                     ` (14 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

Constructs the "key" column in the persistent database as a non-NULL
primary key. This significantly speeds up lookup operations in large
databases.
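
A TEXT primary key gets an automatic unique index from sqlite, so key
lookups become an index search instead of a full table scan. This is
easy to confirm from a python3 prompt (exact plan output varies by
sqlite version):

 import sqlite3
 con = sqlite3.connect(':memory:')
 con.execute("CREATE TABLE t(key TEXT PRIMARY KEY NOT NULL, value TEXT)")
 plan = con.execute("EXPLAIN QUERY PLAN SELECT value FROM t WHERE key=?;", ["k"])
 print(plan.fetchall())
 # e.g. [(3, 0, 0, 'SEARCH TABLE t USING INDEX sqlite_autoindex_t_1 (key=?)')]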

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/persist_data.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bitbake/lib/bb/persist_data.py b/bitbake/lib/bb/persist_data.py
index 1a6319f9498..2bc3e766a93 100644
--- a/bitbake/lib/bb/persist_data.py
+++ b/bitbake/lib/bb/persist_data.py
@@ -94,7 +94,7 @@ class SQLTable(collections.MutableMapping):
         self.table = table
         self.connection = connect(self.cachefile)
 
-        self._execute_single("CREATE TABLE IF NOT EXISTS %s(key TEXT, value TEXT);" % table)
+        self._execute_single("CREATE TABLE IF NOT EXISTS %s(key TEXT PRIMARY KEY NOT NULL, value TEXT);" % table)
 
     @_Decorators.retry
     @_Decorators.transaction
-- 
2.17.1




* [RFC v2 04/16] bitbake: persist_data: Enable Write Ahead Log
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
                     ` (2 preceding siblings ...)
  2018-08-09 22:08   ` [RFC v2 03/16] bitbake: persist_data: Add key constraints Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 05/16] bitbake: persist_data: Disable enable_shared_cache Joshua Watt
                     ` (13 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

Enabling the write ahead log improves database reliability, speeds up
writes (since they mostly happen sequentially), and speeds up readers
(since they are no longer blocked by most write operations). The
persistent database is very read heavy, so the auto-checkpoint size is
reduced from the default (usually 1000) to 100 so that reads remain
fast.
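
For anyone who wants to verify the mode on an existing cache file,
something like this works (path assumed):

 import sqlite3
 con = sqlite3.connect('cache/bb_persist_data.sqlite3', timeout=5)
 # WAL mode is persistent in the database file, so even a fresh
 # connection reports it
 print(con.execute("pragma journal_mode;").fetchone())  # ('wal',)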

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/persist_data.py | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/bitbake/lib/bb/persist_data.py b/bitbake/lib/bb/persist_data.py
index 2bc3e766a93..9a4e7dd5941 100644
--- a/bitbake/lib/bb/persist_data.py
+++ b/bitbake/lib/bb/persist_data.py
@@ -278,7 +278,12 @@ class PersistData(object):
 
 def connect(database):
     connection = sqlite3.connect(database, timeout=5)
-    connection.execute("pragma synchronous = off;")
+    connection.execute("pragma synchronous = normal;")
+    # Enable WAL and keep the autocheckpoint length small (the default is
+    # usually 1000). Persistent caches are usually read-mostly, so keeping
+    # this short will keep readers running quickly
+    connection.execute("pragma journal_mode = WAL;")
+    connection.execute("pragma wal_autocheckpoint = 100;")
     connection.text_factory = str
     return connection
 
-- 
2.17.1




* [RFC v2 05/16] bitbake: persist_data: Disable enable_shared_cache
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
                     ` (3 preceding siblings ...)
  2018-08-09 22:08   ` [RFC v2 04/16] bitbake: persist_data: Enable Write Ahead Log Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 06/16] bitbake: persist_data: Close databases across fork Joshua Watt
                     ` (12 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

Turns off the shared cache. It isn't a significant factor in performance
(now that WAL is enabled), and is a really bad idea to have enabled in
processes that fork() (as bitbake is prone to do).

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/persist_data.py | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/bitbake/lib/bb/persist_data.py b/bitbake/lib/bb/persist_data.py
index 9a4e7dd5941..f0d3ce665d9 100644
--- a/bitbake/lib/bb/persist_data.py
+++ b/bitbake/lib/bb/persist_data.py
@@ -37,12 +37,6 @@ if sqlversion[0] < 3 or (sqlversion[0] == 3 and sqlversion[1] < 3):
 
 
 logger = logging.getLogger("BitBake.PersistData")
-if hasattr(sqlite3, 'enable_shared_cache'):
-    try:
-        sqlite3.enable_shared_cache(True)
-    except sqlite3.OperationalError:
-        pass
-
 
 @total_ordering
 class SQLTable(collections.MutableMapping):
-- 
2.17.1




* [RFC v2 06/16] bitbake: persist_data: Close databases across fork
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
                     ` (4 preceding siblings ...)
  2018-08-09 22:08   ` [RFC v2 05/16] bitbake: persist_data: Disable enable_shared_cache Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 07/16] bitbake: tests/persist_data: Add tests Joshua Watt
                     ` (11 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

sqlite gets really angry if a database connection is kept open across a
fork() call, and will give all sorts of messages ranging from I/O errors to database
corruption errors. To deal with this, close all database connections
before forking, and reopen them (lazily) on the other side.
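
The hazard, reduced to a toy (path made up):

 import os
 import sqlite3

 con = sqlite3.connect('/tmp/cache.sqlite3')
 pid = os.fork()
 # Parent and child now share the connection's file descriptors and
 # internal state; using 'con' from both sides is what produces the
 # I/O and "database disk image is malformed" errors. The patch below
 # instead closes connections in a bb.fork before-handler, and the
 # retry decorator lazily reconnects on first use after the fork.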

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/persist_data.py | 44 +++++++++++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/bitbake/lib/bb/persist_data.py b/bitbake/lib/bb/persist_data.py
index f0d3ce665d9..6bd3924ffb3 100644
--- a/bitbake/lib/bb/persist_data.py
+++ b/bitbake/lib/bb/persist_data.py
@@ -30,6 +30,8 @@ from bb.compat import total_ordering
 from collections import Mapping
 import sqlite3
 import contextlib
+import bb.fork
+import weakref
 
 sqlversion = sqlite3.sqlite_version_info
 if sqlversion[0] < 3 or (sqlversion[0] == 3 and sqlversion[1] < 3):
@@ -38,6 +40,28 @@ if sqlversion[0] < 3 or (sqlversion[0] == 3 and sqlversion[1] < 3):
 
 logger = logging.getLogger("BitBake.PersistData")
 
+# Carrying an open database connection across a fork() confuses sqlite and
+# results in fun errors like 'database disk image is malformed'.
+# To remedy this, close all connections before forking; they will then be
+# (lazily) reopened on the other side. This will cause a lot of problems if
+# there are threads running and trying to access the database at the same time,
+# but if you are mixing threads and fork() you have no one to blame but
+# yourself. If that is discovered to be a problem in the future, some sort of
+# per-table reader-writer lock could be used to block the fork() until all
+# pending transactions complete
+sql_table_weakrefs = []
+def _fork_before_handler():
+    for ref in sql_table_weakrefs:
+        t = ref()
+        if t is not None and t.connection is not None:
+            t.connection.close()
+            t.connection = None
+
+bb.fork.register_at_fork(before=_fork_before_handler)
+
+def _remove_table_weakref(ref):
+    sql_table_weakrefs.remove(ref)
+
 @total_ordering
 class SQLTable(collections.MutableMapping):
     class _Decorators(object):
@@ -48,6 +72,10 @@ class SQLTable(collections.MutableMapping):
             exception occurs.
             """
             def wrap_func(self, *args, **kwargs):
+                # Reconnect if necessary
+                if self.connection is None:
+                    self.reconnect()
+
                 count = 0
                 while True:
                     try:
@@ -55,8 +83,7 @@ class SQLTable(collections.MutableMapping):
                     except sqlite3.OperationalError as exc:
                         if 'is locked' in str(exc) and count < 500:
                             count = count + 1
-                            self.connection.close()
-                            self.connection = connect(self.cachefile)
+                            self.reconnect()
                             continue
                         raise
             return wrap_func
@@ -90,6 +117,11 @@ class SQLTable(collections.MutableMapping):
 
         self._execute_single("CREATE TABLE IF NOT EXISTS %s(key TEXT PRIMARY KEY NOT NULL, value TEXT);" % table)
 
+    def reconnect(self):
+        if self.connection is not None:
+            self.connection.close()
+        self.connection = connect(self.cachefile)
+
     @_Decorators.retry
     @_Decorators.transaction
     def _execute_single(self, cursor, *query):
@@ -292,4 +324,10 @@ def persist(domain, d):
 
     bb.utils.mkdirhier(cachedir)
     cachefile = os.path.join(cachedir, "bb_persist_data.sqlite3")
-    return SQLTable(cachefile, domain)
+    t = SQLTable(cachefile, domain)
+
+    # Add a weak reference to the table list. The weak reference will not keep
+    # the object alive by itself, so it prevents circular reference counts
+    sql_table_weakrefs.append(weakref.ref(t, _remove_table_weakref))
+
+    return t
-- 
2.17.1




* [RFC v2 07/16] bitbake: tests/persist_data: Add tests
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
                     ` (5 preceding siblings ...)
  2018-08-09 22:08   ` [RFC v2 06/16] bitbake: persist_data: Close databases across fork Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 08/16] bitbake: bitbake-worker: Pass taskhash as runtask parameter Joshua Watt
                     ` (10 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

Adds a test suite for testing the persistent data cache
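
With the bitbake-selftest hunk below, the suite runs as part of the
normal selftest:

 $ bitbake-selftest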

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-selftest         |   1 +
 bitbake/lib/bb/tests/persist_data.py | 188 +++++++++++++++++++++++++++
 2 files changed, 189 insertions(+)
 create mode 100644 bitbake/lib/bb/tests/persist_data.py

diff --git a/bitbake/bin/bitbake-selftest b/bitbake/bin/bitbake-selftest
index 7564de304c8..06a1c9a78dd 100755
--- a/bitbake/bin/bitbake-selftest
+++ b/bitbake/bin/bitbake-selftest
@@ -32,6 +32,7 @@ tests = ["bb.tests.codeparser",
          "bb.tests.event",
          "bb.tests.fetch",
          "bb.tests.parse",
+         "bb.tests.persist_data",
          "bb.tests.utils",
          "layerindexlib.tests.layerindexobj",
          "layerindexlib.tests.restapi",
diff --git a/bitbake/lib/bb/tests/persist_data.py b/bitbake/lib/bb/tests/persist_data.py
new file mode 100644
index 00000000000..055f1d9ce47
--- /dev/null
+++ b/bitbake/lib/bb/tests/persist_data.py
@@ -0,0 +1,188 @@
+# ex:ts=4:sw=4:sts=4:et
+# -*- tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*-
+#
+# BitBake Test for lib/bb/persist_data/
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+#
+
+import os
+import unittest
+import bb.data
+import bb.persist_data
+import bb.fork
+import tempfile
+import threading
+
+class PersistDataTest(unittest.TestCase):
+    def _create_data(self):
+        return bb.persist_data.persist('TEST_PERSIST_DATA', self.d)
+
+    def setUp(self):
+        self.d = bb.data.init()
+        self.tempdir = tempfile.TemporaryDirectory()
+        self.d['PERSISTENT_DIR'] = self.tempdir.name
+        self.data = self._create_data()
+        self.items = {
+                'A1': '1',
+                'B1': '2',
+                'C2': '3'
+                }
+        self.stress_count = 10000
+        self.thread_count = 5
+
+        for k,v in self.items.items():
+            self.data[k] = v
+
+    def tearDown(self):
+        self.tempdir.cleanup()
+
+    def _iter_helper(self, seen, iterator):
+        with iter(iterator):
+            for v in iterator:
+                self.assertTrue(v in seen)
+                seen.remove(v)
+        self.assertEqual(len(seen), 0, '%s not seen' % seen)
+
+    def test_get(self):
+        for k, v in self.items.items():
+            self.assertEqual(self.data[k], v)
+
+        self.assertIsNone(self.data.get('D'))
+        with self.assertRaises(KeyError):
+            self.data['D']
+
+    def test_set(self):
+        for k, v in self.items.items():
+            self.data[k] += '-foo'
+
+        for k, v in self.items.items():
+            self.assertEqual(self.data[k], v + '-foo')
+
+    def test_delete(self):
+        self.data['D'] = '4'
+        self.assertEqual(self.data['D'], '4')
+        del self.data['D']
+        self.assertIsNone(self.data.get('D'))
+        with self.assertRaises(KeyError):
+            self.data['D']
+
+    def test_contains(self):
+        for k in self.items:
+            self.assertTrue(k in self.data)
+            self.assertTrue(self.data.has_key(k))
+        self.assertFalse('NotFound' in self.data)
+        self.assertFalse(self.data.has_key('NotFound'))
+
+    def test_len(self):
+        self.assertEqual(len(self.data), len(self.items))
+
+    def test_iter(self):
+        self._iter_helper(set(self.items.keys()), self.data)
+
+    def test_itervalues(self):
+        self._iter_helper(set(self.items.values()), self.data.itervalues())
+
+    def test_iteritems(self):
+        self._iter_helper(set(self.items.items()), self.data.iteritems())
+
+    def test_get_by_pattern(self):
+        self._iter_helper({'1', '2'}, self.data.get_by_pattern('_1'))
+
+    def _stress_read(self, data):
+        for i in range(self.stress_count):
+            for k in self.items:
+                data[k]
+
+    def _stress_write(self, data):
+        for i in range(self.stress_count):
+            for k, v in self.items.items():
+                data[k] = v + str(i)
+
+    def _validate_stress(self):
+        for k, v in self.items.items():
+            self.assertEqual(self.data[k], v + str(self.stress_count - 1))
+
+    def test_stress(self):
+        self._stress_read(self.data)
+        self._stress_write(self.data)
+        self._validate_stress()
+
+    def test_stress_threads(self):
+        def read_thread():
+            data = self._create_data()
+            self._stress_read(data)
+
+        def write_thread():
+            data = self._create_data()
+            self._stress_write(data)
+
+        threads = []
+        for i in range(self.thread_count):
+            threads.append(threading.Thread(target=read_thread))
+            threads.append(threading.Thread(target=write_thread))
+
+        for t in threads:
+            t.start()
+        self._stress_read(self.data)
+        for t in threads:
+            t.join()
+        self._validate_stress()
+
+    def test_stress_fork(self):
+        children = []
+        for i in range(self.thread_count):
+            # Create a writer
+            pid = bb.fork.fork()
+            if pid == 0:
+                try:
+                    self._stress_write(self.data)
+                except:
+                    os._exit(1)
+                else:
+                    os._exit(0)
+            else:
+                children.append(pid)
+
+            # Create a reader
+            pid = bb.fork.fork()
+            if pid == 0:
+                try:
+                    self._stress_read(self.data)
+                except:
+                    os._exit(1)
+                else:
+                    os._exit(0)
+            else:
+                children.append(pid)
+
+        self._stress_read(self.data)
+
+        for pid in children:
+            while True:
+                try:
+                    (_, status) = os.waitpid(pid, 0)
+                    break
+                # Python < 3.5 will raise this if waitpid() is interrupted
+                except InterruptedError:
+                    pass
+                except:
+                    raise
+
+            self.assertTrue(os.WIFEXITED(status), "PID %d did not exit normally" % pid)
+            self.assertEqual(os.WEXITSTATUS(status), 0, "PID %d exited with code %d" % (pid, os.WEXITSTATUS(status)))
+
+        self._validate_stress()
+
-- 
2.17.1




* [RFC v2 08/16] bitbake: bitbake-worker: Pass taskhash as runtask parameter
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
                     ` (6 preceding siblings ...)
  2018-08-09 22:08   ` [RFC v2 07/16] bitbake: tests/persist_data: Add tests Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 09/16] bitbake: siggen: Split out stampfile hash fetch Joshua Watt
                     ` (9 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

Pass the task hash as a parameter to the 'runtask' message instead of
passing the entire dictionary of hashes when the worker is setup. This
is possibly less efficient, but prevents the worker taskhashes from
being out of sync with the runqueue in the event that the taskhashes in
the runqueue change.
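
As a minimal sketch of the new framing (placeholder values; the real tuples
are built in runqueue.py and unpacked in handle_runtask(), see the diff
below):

    import io
    import pickle

    # Placeholder values; the taskhash now rides along in every <runtask>
    # message instead of living in the one-time workerdata dictionary
    msg = pickle.dumps(("recipe.bb", "tid", "do_compile", "1a2b3c",
                        False, [], {}, False))
    stream = io.BytesIO(b"<runtask>" + msg + b"</runtask>")

    payload = stream.getvalue()[len(b"<runtask>"):-len(b"</runtask>")]
    fn, task, taskname, taskhash, quieterrors, appends, taskdepdata, dry_run_exec = \
        pickle.loads(payload)
    assert taskhash == "1a2b3c"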

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-worker |  8 ++++----
 bitbake/lib/bb/runqueue.py | 15 ++++++---------
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/bitbake/bin/bitbake-worker b/bitbake/bin/bitbake-worker
index baa1a84e6dd..41ef6d848ac 100755
--- a/bitbake/bin/bitbake-worker
+++ b/bitbake/bin/bitbake-worker
@@ -136,7 +136,7 @@ def sigterm_handler(signum, frame):
     os.killpg(0, signal.SIGTERM)
     sys.exit()
 
-def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
+def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskhash, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
     # We need to setup the environment BEFORE the fork, since
     # a fork() or exec*() activates PSEUDO...
 
@@ -234,7 +234,7 @@ def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, append
                 ret = 0
 
                 the_data = bb_cache.loadDataFull(fn, appends)
-                the_data.setVar('BB_TASKHASH', workerdata["runq_hash"][task])
+                the_data.setVar('BB_TASKHASH', taskhash)
 
                 bb.utils.set_process_name("%s:%s" % (the_data.getVar("PN"), taskname.replace("do_", "")))
 
@@ -425,10 +425,10 @@ class BitbakeWorker(object):
         sys.exit(0)
 
     def handle_runtask(self, data):
-        fn, task, taskname, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
+        fn, task, taskname, taskhash, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
         workerlog_write("Handling runtask %s %s %s\n" % (task, fn, taskname))
 
-        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
+        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, taskhash, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
 
         self.build_pids[pid] = task
         self.build_pipes[pid] = runQueueWorkerPipe(pipein, pipeout)
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 400709c1601..b173cc0a951 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -1224,17 +1224,12 @@ class RunQueue:
         bb.utils.nonblockingfd(worker.stdout)
         workerpipe = runQueuePipe(worker.stdout, None, self.cfgData, self, rqexec)
 
-        runqhash = {}
-        for tid in self.rqdata.runtaskentries:
-            runqhash[tid] = self.rqdata.runtaskentries[tid].hash
-
         workerdata = {
             "taskdeps" : self.rqdata.dataCaches[mc].task_deps,
             "fakerootenv" : self.rqdata.dataCaches[mc].fakerootenv,
             "fakerootdirs" : self.rqdata.dataCaches[mc].fakerootdirs,
             "fakerootnoenv" : self.rqdata.dataCaches[mc].fakerootnoenv,
             "sigdata" : bb.parse.siggen.get_taskdata(),
-            "runq_hash" : runqhash,
             "logdefaultdebug" : bb.msg.loggerDefaultDebugLevel,
             "logdefaultverbose" : bb.msg.loggerDefaultVerbose,
             "logdefaultverboselogs" : bb.msg.loggerVerboseLogs,
@@ -2025,6 +2020,7 @@ class RunQueueExecuteTasks(RunQueueExecute):
             taskdepdata = self.build_taskdepdata(task)
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
+            taskhash = self.rqdata.get_task_hash(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not (self.cooker.configuration.dry_run or self.rqdata.setscene_enforce):
                 if not mc in self.rq.fakeworker:
                     try:
@@ -2034,10 +2030,10 @@ class RunQueueExecuteTasks(RunQueueExecute):
                         self.rq.state = runQueueFailed
                         self.stats.taskFailed()
                         return True
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
@@ -2450,13 +2446,14 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
             taskdepdata = self.build_taskdepdata(task)
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
+            taskhash = self.rqdata.get_task_hash(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not self.cooker.configuration.dry_run:
                 if not mc in self.rq.fakeworker:
                     self.rq.start_fakeworker(self, mc)
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
-- 
2.17.1




* [RFC v2 09/16] bitbake: siggen: Split out stampfile hash fetch
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
                     ` (7 preceding siblings ...)
  2018-08-09 22:08   ` [RFC v2 08/16] bitbake: bitbake-worker: Pass taskhash as runtask parameter Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 10/16] bitbake: siggen: Split out task depend ID Joshua Watt
                     ` (8 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

The mechanism used to get the hash for a stamp file is split out so that
it can be overridden by derived classes
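
A short sketch of the kind of override this enables; the lookup helper is
hypothetical, only get_stampfile_hash() and the base class come from this
patch:

    import bb.siggen

    def lookup_equivalent(task):
        # Hypothetical external lookup; returns None here so the code below
        # always falls through to the base class behaviour
        return None

    class SignatureGeneratorExampleHash(bb.siggen.SignatureGeneratorBasicHash):
        name = "examplehash"

        def get_stampfile_hash(self, task):
            h = lookup_equivalent(task)
            if h is not None:
                return h
            return super().get_stampfile_hash(task)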

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/siggen.py | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
index e9bb51d736f..7fa5b35337d 100644
--- a/bitbake/lib/bb/siggen.py
+++ b/bitbake/lib/bb/siggen.py
@@ -340,6 +340,13 @@ class SignatureGeneratorBasic(SignatureGenerator):
 class SignatureGeneratorBasicHash(SignatureGeneratorBasic):
     name = "basichash"
 
+    def get_stampfile_hash(self, task):
+        if task in self.taskhash:
+            return self.taskhash[task]
+
+        # If task is not in basehash, then error
+        return self.basehash[task]
+
     def stampfile(self, stampbase, fn, taskname, extrainfo, clean=False):
         if taskname != "do_setscene" and taskname.endswith("_setscene"):
             k = fn + "." + taskname[:-9]
@@ -347,11 +354,9 @@ class SignatureGeneratorBasicHash(SignatureGeneratorBasic):
             k = fn + "." + taskname
         if clean:
             h = "*"
-        elif k in self.taskhash:
-            h = self.taskhash[k]
         else:
-            # If k is not in basehash, then error
-            h = self.basehash[k]
+            h = self.get_stampfile_hash(k)
+
         return ("%s.%s.%s.%s" % (stampbase, taskname, h, extrainfo)).rstrip('.')
 
     def stampcleanmask(self, stampbase, fn, taskname, extrainfo):
-- 
2.17.1




* [RFC v2 10/16] bitbake: siggen: Split out task depend ID
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
                     ` (8 preceding siblings ...)
  2018-08-09 22:08   ` [RFC v2 09/16] bitbake: siggen: Split out stampfile hash fetch Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 11/16] bitbake: runqueue: Track task dependency ID Joshua Watt
                     ` (7 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

Abstracts the function to get the dependency ID for a task so it can
return something other than the taskhash
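
Reduced to a toy example, the effect is that only dependency IDs feed a
parent's hash (sha256 stands in for the real hashing here):

    import hashlib

    def parent_hash(own_data, dep_ids):
        # Toy stand-in for siggen's taskhash computation: only the dependency
        # IDs (not the dependencies' taskhashes) enter the parent's hash
        h = hashlib.sha256(own_data.encode('utf-8'))
        for depid in sorted(dep_ids):
            h.update(depid.encode('utf-8'))
        return h.hexdigest()

    # Any number of taskhashes may map onto 'depid-A' without changing this
    print(parent_hash("do_compile", ["depid-A", "depid-B"]))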

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/siggen.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
index 7fa5b35337d..4ef175edbbf 100644
--- a/bitbake/lib/bb/siggen.py
+++ b/bitbake/lib/bb/siggen.py
@@ -41,6 +41,9 @@ class SignatureGenerator(object):
     def finalise(self, fn, d, varient):
         return
 
+    def get_depid(self, task):
+        return self.taskhash[task]
+
     def get_taskhash(self, fn, task, deps, dataCache):
         return "0"
 
@@ -215,7 +218,7 @@ class SignatureGeneratorBasic(SignatureGenerator):
                 continue
             if dep not in self.taskhash:
                 bb.fatal("%s is not in taskhash, caller isn't calling in dependency order?", dep)
-            data = data + self.taskhash[dep]
+            data = data + self.get_depid(dep)
             self.runtaskdeps[k].append(dep)
 
         if task in dataCache.file_checksums[fn]:
@@ -290,7 +293,7 @@ class SignatureGeneratorBasic(SignatureGenerator):
             data['file_checksum_values'] = [(os.path.basename(f), cs) for f,cs in self.file_checksum_values[k]]
             data['runtaskhashes'] = {}
             for dep in data['runtaskdeps']:
-                data['runtaskhashes'][dep] = self.taskhash[dep]
+                data['runtaskhashes'][dep] = self.get_depid(dep)
             data['taskhash'] = self.taskhash[k]
 
         taint = self.read_taint(fn, task, referencestamp)
-- 
2.17.1




* [RFC v2 11/16] bitbake: runqueue: Track task dependency ID
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
                     ` (9 preceding siblings ...)
  2018-08-09 22:08   ` [RFC v2 10/16] bitbake: siggen: Split out task depend ID Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 12/16] bitbake: runqueue: Pass dependency ID to task Joshua Watt
                     ` (6 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

Requests the task dependency ID from siggen and tracks it
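
Consumers of taskdepdata see one extra field per entry after this change; a
toy example with placeholder values:

    # Field order as built in build_taskdepdata() in the diff below;
    # all values here are placeholders
    entry = ["zlib", "do_compile", "/meta/recipes-core/zlib/zlib_1.2.11.bb",
             set(), ["zlib"], "1a2b3c...", "4d5e6f..."]
    pn, taskname, fn, deps, provides, taskhash, depid = entry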

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/runqueue.py | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index b173cc0a951..3ae12d8e69f 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -346,6 +346,7 @@ class RunTaskEntry(object):
         self.depends = set()
         self.revdeps = set()
         self.hash = None
+        self.depid = None
         self.task = None
         self.weight = 1
 
@@ -385,6 +386,9 @@ class RunQueueData:
     def get_task_hash(self, tid):
         return self.runtaskentries[tid].hash
 
+    def get_task_depid(self, tid):
+        return self.runtaskentries[tid].depid
+
     def get_user_idstring(self, tid, task_name_suffix = ""):
         return tid + task_name_suffix
 
@@ -1150,18 +1154,21 @@ class RunQueueData:
                 if len(self.runtaskentries[tid].depends - dealtwith) == 0:
                     dealtwith.add(tid)
                     todeal.remove(tid)
-                    procdep = []
-                    for dep in self.runtaskentries[tid].depends:
-                        procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
-                    (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
-                    self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
-                    task = self.runtaskentries[tid].task
+                    self.prepare_task_hash(tid)
 
         bb.parse.siggen.writeout_file_checksum_cache()
 
         #self.dump_data()
         return len(self.runtaskentries)
 
+    def prepare_task_hash(self, tid):
+        procdep = []
+        for dep in self.runtaskentries[tid].depends:
+            procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
+        (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
+        self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
+        self.runtaskentries[tid].depid = bb.parse.siggen.get_depid(taskfn + "." + taskname)
+
     def dump_data(self):
         """
         Dump some debug information on the internal data structures
@@ -2075,7 +2082,8 @@ class RunQueueExecuteTasks(RunQueueExecute):
                 deps = self.rqdata.runtaskentries[revdep].depends
                 provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
                 taskhash = self.rqdata.runtaskentries[revdep].hash
-                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
+                depid = self.rqdata.runtaskentries[revdep].depid
+                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, depid]
                 for revdep2 in deps:
                     if revdep2 not in taskdepdata:
                         additional.append(revdep2)
@@ -2517,7 +2525,8 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
                 deps = getsetscenedeps(revdep)
                 provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
                 taskhash = self.rqdata.runtaskentries[revdep].hash
-                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
+                depid = self.rqdata.runtaskentries[revdep].depid
+                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, depid]
                 for revdep2 in deps:
                     if revdep2 not in taskdepdata:
                         additional.append(revdep2)
-- 
2.17.1




* [RFC v2 12/16] bitbake: runqueue: Pass dependency ID to task
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
                     ` (10 preceding siblings ...)
  2018-08-09 22:08   ` [RFC v2 11/16] bitbake: runqueue: Track task dependency ID Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 13/16] bitbake: runqueue: Pass dependency ID to hash validate Joshua Watt
                     ` (5 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

The dependency ID is now passed to the task in the BB_DEPID variable
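
Task code can then read it just like BB_TASKHASH; a sketch of a hypothetical
diagnostic task in recipe or class metadata:

    python do_show_ids () {
        bb.note("taskhash=%s depid=%s" % (d.getVar('BB_TASKHASH'),
                                          d.getVar('BB_DEPID')))
    }
    addtask show_ids after do_compile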

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-worker |  7 ++++---
 bitbake/lib/bb/runqueue.py | 10 ++++++----
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/bitbake/bin/bitbake-worker b/bitbake/bin/bitbake-worker
index 41ef6d848ac..9650c954359 100755
--- a/bitbake/bin/bitbake-worker
+++ b/bitbake/bin/bitbake-worker
@@ -136,7 +136,7 @@ def sigterm_handler(signum, frame):
     os.killpg(0, signal.SIGTERM)
     sys.exit()
 
-def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskhash, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
+def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskhash, depid, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
     # We need to setup the environment BEFORE the fork, since
     # a fork() or exec*() activates PSEUDO...
 
@@ -235,6 +235,7 @@ def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskha
 
                 the_data = bb_cache.loadDataFull(fn, appends)
                 the_data.setVar('BB_TASKHASH', taskhash)
+                the_data.setVar('BB_DEPID', depid)
 
                 bb.utils.set_process_name("%s:%s" % (the_data.getVar("PN"), taskname.replace("do_", "")))
 
@@ -425,10 +426,10 @@ class BitbakeWorker(object):
         sys.exit(0)
 
     def handle_runtask(self, data):
-        fn, task, taskname, taskhash, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
+        fn, task, taskname, taskhash, depid, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
         workerlog_write("Handling runtask %s %s %s\n" % (task, fn, taskname))
 
-        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, taskhash, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
+        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, taskhash, depid, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
 
         self.build_pids[pid] = task
         self.build_pipes[pid] = runQueueWorkerPipe(pipein, pipeout)
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 3ae12d8e69f..909fa3aec22 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -2028,6 +2028,7 @@ class RunQueueExecuteTasks(RunQueueExecute):
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
             taskhash = self.rqdata.get_task_hash(task)
+            depid = self.rqdata.get_task_depid(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not (self.cooker.configuration.dry_run or self.rqdata.setscene_enforce):
                 if not mc in self.rq.fakeworker:
                     try:
@@ -2037,10 +2038,10 @@ class RunQueueExecuteTasks(RunQueueExecute):
                         self.rq.state = runQueueFailed
                         self.stats.taskFailed()
                         return True
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, depid, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, depid, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
@@ -2455,13 +2456,14 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
             taskhash = self.rqdata.get_task_hash(task)
+            depid = self.rqdata.get_task_depid(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not self.cooker.configuration.dry_run:
                 if not mc in self.rq.fakeworker:
                     self.rq.start_fakeworker(self, mc)
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, depid, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, depid, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
-- 
2.17.1




* [RFC v2 13/16] bitbake: runqueue: Pass dependency ID to hash validate
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
                     ` (11 preceding siblings ...)
  2018-08-09 22:08   ` [RFC v2 12/16] bitbake: runqueue: Pass dependency ID to task Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 14/16] classes/sstate: Handle depid in hash check Joshua Watt
                     ` (4 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

If the dependency ID is being used to track task dependencies, the hash
validation function needs to know about it in order to properly validate
the hash.

TODO: This currently isn't backward compatible with older hashvalidate
functions. Is backward compatibility necessary, and if so, are there any
suggestions for a good approach?
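
One conceivable compatibility shim on the metadata side, purely as a sketch
(not part of this series): accept both calling conventions with a variadic
signature:

    def hashvalidate_compat(sq_fn, sq_task, sq_hash, sq_hashfn, *extra,
                            siginfo=False):
        if len(extra) == 2:
            # New convention: (..., sq_depid, d)
            sq_depid, d = extra
        else:
            # Old convention: (..., d); fall back to the taskhashes
            (d,) = extra
            sq_depid = sq_hash
        # Demo only: report every setscene task as valid
        return list(range(len(sq_task)))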

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/runqueue.py | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 909fa3aec22..707baed2125 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -1548,6 +1548,7 @@ class RunQueue:
         valid = []
         sq_hash = []
         sq_hashfn = []
+        sq_depid = []
         sq_fn = []
         sq_taskname = []
         sq_task = []
@@ -1566,15 +1567,16 @@ class RunQueue:
             sq_fn.append(fn)
             sq_hashfn.append(self.rqdata.dataCaches[mc].hashfn[taskfn])
             sq_hash.append(self.rqdata.runtaskentries[tid].hash)
+            sq_depid.append(self.rqdata.runtaskentries[tid].depid)
             sq_taskname.append(taskname)
             sq_task.append(tid)
-        locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "d" : self.cooker.data }
+        locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "sq_depid" : sq_depid, "d" : self.cooker.data }
         try:
-            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=True)"
+            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=True)"
             valid = bb.utils.better_eval(call, locs)
         # Handle version with no siginfo parameter
         except TypeError:
-            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d)"
+            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d)"
             valid = bb.utils.better_eval(call, locs)
         for v in valid:
             valid_new.add(sq_task[v])
@@ -2286,6 +2288,7 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
         if self.rq.hashvalidate:
             sq_hash = []
             sq_hashfn = []
+            sq_depid = []
             sq_fn = []
             sq_taskname = []
             sq_task = []
@@ -2317,13 +2320,14 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
                 sq_fn.append(fn)
                 sq_hashfn.append(self.rqdata.dataCaches[mc].hashfn[taskfn])
                 sq_hash.append(self.rqdata.runtaskentries[tid].hash)
+                sq_depid.append(self.rqdata.runtaskentries[tid].depid)
                 sq_taskname.append(taskname)
                 sq_task.append(tid)
 
             self.cooker.data.setVar("BB_SETSCENE_STAMPCURRENT_COUNT", len(stamppresent))
 
-            call = self.rq.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d)"
-            locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "d" : self.cooker.data }
+            call = self.rq.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d)"
+            locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "sq_depid": sq_depid, "d" : self.cooker.data }
             valid = bb.utils.better_eval(call, locs)
 
             self.cooker.data.delVar("BB_SETSCENE_STAMPCURRENT_COUNT")
-- 
2.17.1




* [RFC v2 14/16] classes/sstate: Handle depid in hash check
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
                     ` (12 preceding siblings ...)
  2018-08-09 22:08   ` [RFC v2 13/16] bitbake: runqueue: Pass dependency ID to hash validate Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 15/16] bitbake: hashserv: Add hash equivalence reference server Joshua Watt
                     ` (3 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

Handles the new argument that passes task dependency IDs to the hash check
function, as bitbake now requires it

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 6743becf071..28a64315b60 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -766,7 +766,7 @@ sstate_unpack_package () {
 
 BB_HASHCHECK_FUNCTION = "sstate_checkhashes"
 
-def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False):
+def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=False):
 
     ret = []
     missed = []
-- 
2.17.1




* [RFC v2 15/16] bitbake: hashserv: Add hash equivalence reference server
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
                     ` (13 preceding siblings ...)
  2018-08-09 22:08   ` [RFC v2 14/16] classes/sstate: Handle depid in hash check Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 16/16] sstate: Implement hash equivalence sstate Joshua Watt
                     ` (2 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

Provides a reference implementation of the hash equivalence server.
This server has minimal dependencies (and no dependencies outside of the
standard Python library), and implements the minimum required to be a
conforming hash equivalence server.
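
Client-side, the v1 API is plain JSON over HTTP; a minimal sketch against a
locally running server (address and hash values are placeholders):

    import json
    import urllib.request

    base = 'http://localhost:8686/v1/equivalent'

    # Report a (taskhash, outhash, depid) triple for a method
    data = json.dumps({'taskhash': '35788e...', 'method': 'TestMethod',
                       'outhash': '2765d4...', 'depid': 'f46d3f...'}).encode('utf-8')
    req = urllib.request.Request(base, data, {'content-type': 'application/json'})
    print(json.loads(urllib.request.urlopen(req).read().decode('utf-8')))

    # Query the equivalent (oldest) entry recorded for a taskhash
    with urllib.request.urlopen(base + '?method=TestMethod&taskhash=35788e...') as rsp:
        print(json.loads(rsp.read().decode('utf-8')))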

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-hashserv     |  67 ++++++++++++++
 bitbake/bin/bitbake-selftest     |   2 +
 bitbake/lib/hashserv/__init__.py | 152 +++++++++++++++++++++++++++++++
 bitbake/lib/hashserv/tests.py    | 141 ++++++++++++++++++++++++++++
 4 files changed, 362 insertions(+)
 create mode 100755 bitbake/bin/bitbake-hashserv
 create mode 100644 bitbake/lib/hashserv/__init__.py
 create mode 100644 bitbake/lib/hashserv/tests.py

diff --git a/bitbake/bin/bitbake-hashserv b/bitbake/bin/bitbake-hashserv
new file mode 100755
index 00000000000..c49397b73a5
--- /dev/null
+++ b/bitbake/bin/bitbake-hashserv
@@ -0,0 +1,67 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+import os
+import sys
+import logging
+import argparse
+import sqlite3
+
+sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)),'lib'))
+
+import hashserv
+
+VERSION = "1.0.0"
+
+DEFAULT_HOST = ''
+DEFAULT_PORT = 8686
+
+def main():
+    parser = argparse.ArgumentParser(description='Hash Equivalence Reference Server. Version=%s' % VERSION)
+    parser.add_argument('--address', default=DEFAULT_HOST, help='Bind address (default "%(default)s")')
+    parser.add_argument('--port', type=int, default=DEFAULT_PORT, help='Bind port (default %(default)d)')
+    parser.add_argument('--prefix', default='', help='HTTP path prefix (default "%(default)s")')
+    parser.add_argument('--database', default='./hashserv.db', help='Database file (default "%(default)s")')
+    parser.add_argument('--log', default='WARNING', help='Set logging level')
+
+    args = parser.parse_args()
+
+    logger = logging.getLogger('hashserv')
+
+    level = getattr(logging, args.log.upper(), None)
+    if not isinstance(level, int):
+        raise ValueError('Invalid log level: %s' % args.log)
+
+    logger.setLevel(level)
+    console = logging.StreamHandler()
+    console.setLevel(level)
+    logger.addHandler(console)
+
+    db = sqlite3.connect(args.database)
+
+    server = hashserv.create_server((args.address, args.port), db, args.prefix)
+    server.serve_forever()
+    return 0
+
+if __name__ == '__main__':
+    try:
+        ret = main()
+    except Exception:
+        ret = 1
+        import traceback
+        traceback.print_exc()
+    sys.exit(ret)
+
diff --git a/bitbake/bin/bitbake-selftest b/bitbake/bin/bitbake-selftest
index 06a1c9a78dd..de1f8f74dda 100755
--- a/bitbake/bin/bitbake-selftest
+++ b/bitbake/bin/bitbake-selftest
@@ -22,6 +22,7 @@ sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)), 'lib
 import unittest
 try:
     import bb
+    import hashserv
     import layerindexlib
 except RuntimeError as exc:
     sys.exit(str(exc))
@@ -34,6 +35,7 @@ tests = ["bb.tests.codeparser",
          "bb.tests.parse",
          "bb.tests.persist_data",
          "bb.tests.utils",
+         "hashserv.tests",
          "layerindexlib.tests.layerindexobj",
          "layerindexlib.tests.restapi",
          "layerindexlib.tests.cooker"]
diff --git a/bitbake/lib/hashserv/__init__.py b/bitbake/lib/hashserv/__init__.py
new file mode 100644
index 00000000000..cde030cb88e
--- /dev/null
+++ b/bitbake/lib/hashserv/__init__.py
@@ -0,0 +1,152 @@
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+from http.server import BaseHTTPRequestHandler, HTTPServer
+import contextlib
+import urllib.parse
+import sqlite3
+import json
+import traceback
+import logging
+from datetime import datetime
+
+logger = logging.getLogger('hashserv')
+
+class HashEquivalenceServer(BaseHTTPRequestHandler):
+    def log_message(self, f, *args):
+        logger.debug(f, *args)
+
+    def do_GET(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            query = urllib.parse.parse_qs(p.query, strict_parsing=True)
+            method = query['method'][0]
+            taskhash = query['taskhash'][0]
+
+            d = None
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('SELECT taskhash, method, depid FROM tasks_v1 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1',
+                        {'method': method, 'taskhash': taskhash})
+
+                row = cursor.fetchone()
+
+                if row is not None:
+                    logger.debug('Found equivalent task %s', row['taskhash'])
+                    d = {k: row[k] for k in ('taskhash', 'method', 'depid')}
+
+            self.send_response(200)
+            self.send_header('Content-Type', 'application/json; charset=utf-8')
+            self.end_headers()
+            self.wfile.write(json.dumps(d).encode('utf-8'))
+        except:
+            logger.exception('Error in GET')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+    def do_POST(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            length = int(self.headers['content-length'])
+            data = json.loads(self.rfile.read(length).decode('utf-8'))
+
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('''
+                    SELECT taskhash, method, depid FROM tasks_v1 WHERE method=:method AND outhash=:outhash
+                    ORDER BY CASE WHEN taskhash=:taskhash THEN 1 ELSE 2 END,
+                        created ASC
+                    LIMIT 1
+                    ''', {k: data[k] for k in ('method', 'outhash', 'taskhash')})
+
+                row = cursor.fetchone()
+
+                if row is None or row['taskhash'] != data['taskhash']:
+                    depid = data['depid']
+                    if row is not None:
+                        depid = row['depid']
+
+                    insert_data = {
+                            'method': data['method'],
+                            'outhash': data['outhash'],
+                            'taskhash': data['taskhash'],
+                            'depid': depid,
+                            'created': datetime.now()
+                            }
+
+                    for k in ('owner', 'PN', 'PV', 'PR', 'task', 'outhash_siginfo'):
+                        if k in data:
+                            insert_data[k] = data[k]
+
+                    cursor.execute('''INSERT INTO tasks_v1 (%s) VALUES (%s)''' % (
+                            ', '.join(sorted(insert_data.keys())),
+                            ', '.join(':' + k for k in sorted(insert_data.keys()))),
+                        insert_data)
+
+                    logger.info('Adding taskhash %s with depid %s', data['taskhash'], depid)
+                    cursor.execute('SELECT taskhash, method, depid FROM tasks_v1 WHERE id=:id', {'id': cursor.lastrowid})
+                    row = cursor.fetchone()
+
+                    self.db.commit()
+
+                d = {k: row[k] for k in ('taskhash', 'method', 'depid')}
+
+                self.send_response(200)
+                self.send_header('Content-Type', 'application/json; charset=utf-8')
+                self.end_headers()
+                self.wfile.write(json.dumps(d).encode('utf-8'))
+        except:
+            logger.exception('Error in POST')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+def create_server(addr, db, prefix=''):
+    class Handler(HashEquivalenceServer):
+        pass
+
+    Handler.prefix = prefix
+    Handler.db = db
+    db.row_factory = sqlite3.Row
+
+    with contextlib.closing(db.cursor()) as cursor:
+        cursor.execute('''
+            CREATE TABLE IF NOT EXISTS tasks_v1 (
+                id INTEGER PRIMARY KEY AUTOINCREMENT,
+                method TEXT NOT NULL,
+                outhash TEXT NOT NULL,
+                taskhash TEXT NOT NULL,
+                depid TEXT NOT NULL,
+                created DATETIME,
+
+                -- Optional fields
+                owner TEXT,
+                PN TEXT,
+                PV TEXT,
+                PR TEXT,
+                task TEXT,
+                outhash_siginfo TEXT
+                )
+            ''')
+
+    logger.info('Starting server on %s', addr)
+    return HTTPServer(addr, Handler)
diff --git a/bitbake/lib/hashserv/tests.py b/bitbake/lib/hashserv/tests.py
new file mode 100644
index 00000000000..7efb1fce0bc
--- /dev/null
+++ b/bitbake/lib/hashserv/tests.py
@@ -0,0 +1,141 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+import unittest
+import threading
+import sqlite3
+import hashlib
+import urllib.request
+import json
+from . import create_server
+
+class TestHashEquivalenceServer(unittest.TestCase):
+    def setUp(self):
+        # Start an in memory hash equivalence server in the background bound to
+        # an ephemeral port
+        db = sqlite3.connect(':memory:', check_same_thread=False)
+        self.server = create_server(('localhost', 0), db)
+        self.server_addr = 'http://localhost:%d' % self.server.socket.getsockname()[1]
+        self.server_thread = threading.Thread(target=self.server.serve_forever)
+        self.server_thread.start()
+
+    def tearDown(self):
+        # Shutdown server
+        s = getattr(self, 'server', None)
+        if s is not None:
+            self.server.shutdown()
+            self.server_thread.join()
+            self.server.server_close()
+
+    def send_get(self, path):
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def send_post(self, path, data):
+        headers = {'content-type': 'application/json'}
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url, json.dumps(data).encode('utf-8'), headers)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def test_create_hash(self):
+        # Simple test that hashes can be created
+        taskhash = '35788efcb8dfb0a02659d81cf2bfd695fb30faf9'
+        outhash = '2765d4a5884be49b28601445c2760c5f21e7e5c0ee2b7e3fce98fd7e5970796f'
+        depid = 'f46d3fbb439bd9b921095da657a4de906510d2cd'
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertIsNone(d, msg='Found unexpected task, %r' % d)
+
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'depid': depid,
+            })
+        self.assertEqual(d['depid'], depid, 'Server returned bad depid')
+
+    def test_create_equivalent(self):
+        # Tests that a second reported task with the same outhash will be
+        # assigned the same depid
+        taskhash = '53b8dce672cb6d0c73170be43f540460bfc347b4'
+        outhash = '5a9cb1649625f0bf41fc7791b635cd9c2d7118c7f021ba87dcd03f72b67ce7a8'
+        depid = 'f37918cc02eb5a520b1aff86faacbc0a38124646'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'depid': depid,
+            })
+        self.assertEqual(d['depid'], depid, 'Server returned bad depid')
+
+        # Report a different task with the same outhash. The returned depid
+        # should match the first task
+        taskhash2 = '3bf6f1e89d26205aec90da04854fbdbf73afe6b4'
+        depid2 = 'af36b199320e611fbb16f1f277d3ee1d619ca58b'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash2,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'depid': depid2,
+            })
+        self.assertEqual(d['depid'], depid, 'Server returned bad depid')
+
+    def test_duplicate_taskhash(self):
+        # Tests that duplicate reports of the same taskhash with different
+        # outhash & depid always return the depid from the first reported
+        # taskhash
+        taskhash = '8aa96fcffb5831b3c2c0cb75f0431e3f8b20554a'
+        outhash = 'afe240a439959ce86f5e322f8c208e1fedefea9e813f2140c81af866cc9edf7e'
+        depid = '218e57509998197d570e2c98512d0105985dffc9'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'depid': depid,
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['depid'], depid)
+
+        outhash2 = '0904a7fe3dc712d9fd8a74a616ddca2a825a8ee97adf0bd3fc86082c7639914d'
+        depid2 = 'ae9a7d252735f0dafcdb10e2e02561ca3a47314c'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash2,
+            'depid': depid2
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['depid'], depid)
+
+        outhash3 = '77623a549b5b1a31e3732dfa8fe61d7ce5d44b3370f253c5360e136b852967b4'
+        depid3 = '9217a7d6398518e5dc002ed58f2cbbbc78696603'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash3,
+            'depid': depid3
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['depid'], depid)
+
+
-- 
2.17.1




* [RFC v2 16/16] sstate: Implement hash equivalence sstate
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
                     ` (14 preceding siblings ...)
  2018-08-09 22:08   ` [RFC v2 15/16] bitbake: hashserv: Add hash equivalence reference server Joshua Watt
@ 2018-08-09 22:08   ` Joshua Watt
  2018-12-04  3:42     ` [PATCH " Joshua Watt
  2018-12-04  4:05   ` ✗ patchtest: failure for Hash Equivalency Server Patchwork
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-08-09 22:08 UTC (permalink / raw)
  To: bitbake-devel, openembedded-core

Converts sstate so that it can use a hash equivalence server to
determine if a task really needs to be rebuilt, or if it can be restored
from a different (equivalent) sstate object.

The dependency IDs are cached persistently using persist_data (a sketch of
the caching pattern follows the list below). This has a number of
advantages:
 1) Dependency IDs can be cached between invocations of bitbake to
    prevent needing to contact the server every time (which is slow)
 2) The value of each task's dependency ID can easily be synchronized
    between different threads, which will be useful if bitbake is
    updated to do on-the-fly task re-hashing.
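
Reduced to its essentials, the caching pattern is (a sketch; the lookup
callable is hypothetical, the cache name in the comment comes from the diff
below):

    def cached_depid(cache, task, taskhash, lookup):
        # Key layout mirrors the series: '<task>:<taskhash>'
        key = '%s:%s' % (task, taskhash)
        depid = cache.get(key)
        if depid is None:
            depid = lookup(taskhash)   # e.g. ask the hash equivalence server
            if depid is not None:
                cache[key] = depid
        return depid

    # Works with a plain dict; the series itself uses
    # bb.persist_data.persist('SSTATESIG_DEPID_CACHE_v1_' + method, d)
    print(cached_depid({}, 'do_package', '1a2b3c', lambda h: 'depid-1'))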

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 100 ++++++++++++++++++++--
 meta/conf/bitbake.conf      |   4 +-
 meta/lib/oe/sstatesig.py    | 166 ++++++++++++++++++++++++++++++++++++
 3 files changed, 261 insertions(+), 9 deletions(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 28a64315b60..e956bd40b25 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -11,7 +11,7 @@ def generate_sstatefn(spec, hash, d):
 SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
 SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
 SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
-SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_TASKHASH'), d)}"
+SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_DEPID'), d)}"
 SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
 SSTATE_EXTRAPATH   = ""
 SSTATE_EXTRAPATHWILDCARD = ""
@@ -75,6 +75,23 @@ SSTATE_SIG_PASSPHRASE ?= ""
 # Whether to verify the GnUPG signatures when extracting sstate archives
 SSTATE_VERIFY_SIG ?= "0"
 
+SSTATE_HASHEQUIV_METHOD ?= "OEOuthashBasic"
+SSTATE_HASHEQUIV_METHOD[doc] = "The function used to calculate the output hash \
+    for a task, which in turn is used to determine equivalency. \
+    "
+
+SSTATE_HASHEQUIV_SERVER ?= ""
+SSTATE_HASHEQUIV_SERVER[doc] = "The hash equivalence server. For example, \
+    'http://192.168.0.1:5000'. Do not include a trailing slash \
+    "
+
+SSTATE_HASHEQUIV_REPORT_TASKDATA ?= "0"
+SSTATE_HASHEQUIV_REPORT_TASKDATA[doc] = "Report additional useful data to the \
+    hash equivalency server, such as PN, PV, taskname, etc. This information \
+    is very useful for developers looking at task data, but may leak sensitive \
+    data if the equivalence server is public. \
+    "
+
 python () {
     if bb.data.inherits_class('native', d):
         d.setVar('SSTATE_PKGARCH', d.getVar('BUILD_ARCH', False))
@@ -626,7 +643,7 @@ def sstate_package(ss, d):
         return
 
     for f in (d.getVar('SSTATECREATEFUNCS') or '').split() + \
-             ['sstate_create_package', 'sstate_sign_package'] + \
+             ['sstate_report_depid', 'sstate_create_package', 'sstate_sign_package'] + \
              (d.getVar('SSTATEPOSTCREATEFUNCS') or '').split():
         # All hooks should run in SSTATE_BUILDDIR.
         bb.build.exec_func(f, d, (sstatebuild,))
@@ -750,6 +767,73 @@ python sstate_sign_package () {
                            d.getVar('SSTATE_SIG_PASSPHRASE'), armor=False)
 }
 
+def OEOuthashBasic(path, sigfile, task, d):
+    import hashlib
+    import stat
+
+    def update_hash(s):
+        s = s.encode('utf-8')
+        h.update(s)
+        if sigfile:
+            sigfile.write(s)
+
+    h = hashlib.sha1()
+    prev_dir = os.getcwd()
+
+    try:
+        os.chdir(path)
+
+        update_hash("OEOuthashBasic\n")
+
+        # It is only currently useful to get equivalent hashes for things that
+        # can be restored from sstate. Since the sstate object is named using
+        # SSTATE_PKGSPEC and the task name, those should be included in the
+        # output hash calculation.
+        update_hash("SSTATE_PKGSPEC=%s\n" % d.getVar('SSTATE_PKGSPEC'))
+        update_hash("task=%s\n" % task)
+
+        for root, dirs, files in os.walk('.', topdown=True):
+            # Sort directories and files to ensure consistent ordering
+            dirs.sort()
+            files.sort()
+
+            for f in files:
+                path = os.path.join(root, f)
+                s = os.lstat(path)
+
+                # Hash file path
+                update_hash(path + '\n')
+
+                # Hash file mode
+                update_hash("\tmode=0x%x\n" % stat.S_IMODE(s.st_mode))
+                update_hash("\ttype=0x%x\n" % stat.S_IFMT(s.st_mode))
+
+                if stat.S_ISBLK(s.st_mode) or stat.S_ISCHR(s.st_mode):
+                    # Hash device major and minor
+                    update_hash("\tdev=%d,%d\n" % (os.major(s.st_rdev), os.minor(s.st_rdev)))
+                elif stat.S_ISLNK(s.st_mode):
+                    # Hash symbolic link
+                    update_hash("\tsymlink=%s\n" % os.readlink(path))
+                else:
+                    fh = hashlib.sha1()
+                    # Hash file contents
+                    with open(path, 'rb') as fd:
+                        for chunk in iter(lambda: fd.read(4096), b""):
+                            fh.update(chunk)
+                    update_hash("\tdigest=%s\n" % fh.hexdigest())
+    finally:
+        os.chdir(prev_dir)
+
+    return h.hexdigest()
+
+python sstate_report_depid() {
+    report_depid = getattr(bb.parse.siggen, 'report_depid', None)
+
+    if report_depid:
+        ss = sstate_state_fromvars(d)
+        report_depid(os.getcwd(), ss['task'], d)
+}
+
 #
 # Shell function to decompress and prepare a package for installation
 # Will be run from within SSTATE_INSTDIR.
@@ -796,7 +880,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=
 
         spec, extrapath, tname = getpathcomponents(task, d)
 
-        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_depid[task], d) + "_" + tname + extension)
 
         if os.path.exists(sstatefile):
             bb.debug(2, "SState: Found valid sstate file %s" % sstatefile)
@@ -858,7 +942,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=
             if task in ret:
                 continue
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_depid[task], d) + "_" + tname + extension)
             tasklist.append((task, sstatefile))
 
         if tasklist:
@@ -884,12 +968,12 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=
         evdata = {'missed': [], 'found': []};
         for task in missed:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_depid[task], d) + "_" + tname + ".tgz")
+            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_depid[task], sstatefile ) )
         for task in ret:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['found'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_depid[task], d) + "_" + tname + ".tgz")
+            evdata['found'].append( (sq_fn[task], sq_task[task], sq_depid[task], sstatefile ) )
         bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)
 
     # Print some summary statistics about the current task completion and how much sstate
diff --git a/meta/conf/bitbake.conf b/meta/conf/bitbake.conf
index f68954c511e..6c5cad60b83 100644
--- a/meta/conf/bitbake.conf
+++ b/meta/conf/bitbake.conf
@@ -865,7 +865,9 @@ BB_HASHBASE_WHITELIST ?= "TMPDIR FILE PATH PWD BB_TASKHASH BBPATH BBSERVER DL_DI
     STAMPS_DIR PRSERV_DUMPDIR PRSERV_DUMPFILE PRSERV_LOCKDOWN PARALLEL_MAKE \
     CCACHE_DIR EXTERNAL_TOOLCHAIN CCACHE CCACHE_NOHASHDIR LICENSE_PATH SDKPKGSUFFIX \
     WARN_QA ERROR_QA WORKDIR STAMPCLEAN PKGDATA_DIR BUILD_ARCH SSTATE_PKGARCH \
-    BB_WORKERCONTEXT BB_LIMITEDDEPS extend_recipe_sysroot DEPLOY_DIR"
+    BB_WORKERCONTEXT BB_LIMITEDDEPS BB_DEPID extend_recipe_sysroot DEPLOY_DIR \
+    SSTATE_HASHEQUIV_METHOD SSTATE_HASHEQUIV_SERVER SSTATE_HASHEQUIV_REPORT_TASKDATA \
+    SSTATE_HASHEQUIV_OWNER"
 BB_HASHCONFIG_WHITELIST ?= "${BB_HASHBASE_WHITELIST} DATE TIME SSH_AGENT_PID \
     SSH_AUTH_SOCK PSEUDO_BUILD BB_ENV_EXTRAWHITE DISABLE_SANITY_CHECKS \
     PARALLEL_MAKE BB_NUMBER_THREADS BB_ORIGENV BB_INVALIDCONF BBINCLUDED \
diff --git a/meta/lib/oe/sstatesig.py b/meta/lib/oe/sstatesig.py
index 18c5a353a2a..7f75de3279f 100644
--- a/meta/lib/oe/sstatesig.py
+++ b/meta/lib/oe/sstatesig.py
@@ -263,10 +263,176 @@ class SignatureGeneratorOEBasicHash(bb.siggen.SignatureGeneratorBasicHash):
         if error_msgs:
             bb.fatal("\n".join(error_msgs))
 
+class SignatureGeneratorOEEquivHash(SignatureGeneratorOEBasicHash):
+    name = "OEEquivHash"
+
+    def init_rundepcheck(self, data):
+        super().init_rundepcheck(data)
+        self.server = data.getVar('SSTATE_HASHEQUIV_SERVER')
+        self.method = data.getVar('SSTATE_HASHEQUIV_METHOD')
+        self.depids = bb.persist_data.persist('SSTATESIG_DEPID_CACHE_v1_' + self.method, data)
+
+    def get_taskdata(self):
+        return (self.server, self.method) + super().get_taskdata()
+
+    def set_taskdata(self, data):
+        self.server, self.method = data[:2]
+        super().set_taskdata(data[2:])
+
+    def __get_task_depid_key(self, task):
+        # TODO: The key only *needs* to be the taskhash; the task is just
+        # convenient
+        return '%s:%s' % (task, self.taskhash[task])
+
+    def get_stampfile_hash(self, task):
+        if task in self.taskhash:
+            # If a depid is reported, use it as the stampfile hash. This
+            # ensures that a task won't be re-run if the taskhash changes
+            # but would still result in the same output hash
+            depid = self.depids.get(self.__get_task_depid_key(task))
+            if depid is not None:
+                return depid
+
+        return super().get_stampfile_hash(task)
+
+    def get_depid(self, task):
+        import urllib
+        import json
+
+        taskhash = self.taskhash[task]
+
+        key = self.__get_task_depid_key(task)
+
+        # TODO: This cache can grow unbounded. It probably only needs to keep
+        # the most recent entry for each task
+        depid = self.depids.get(key)
+        if depid is not None:
+            return depid
+
+        # In the absence of being able to discover a dependency ID from the
+        # server, make it be equivalent to the taskhash. The dependency ID only
+        # really needs to be a unique string (not even necessarily a hash), but
+        # making it match the taskhash has a few advantages:
+        #
+        # 1) All of the sstate code that assumes hashes can be the same
+        # 2) It provides maximal compatibility with builders that don't use
+        #    an equivalency server
+        # 3) It is easy for multiple independent builders to derive the
+        #    same depid from the same input. This means that if independent
+        #    builders find the same taskhash, but it isn't reported to the server,
+        #    there is a better chance that they will agree on the dependency ID.
+        depid = taskhash
+
+        try:
+            url = '%s/v1/equivalent?%s' % (self.server,
+                    urllib.parse.urlencode({'method': self.method, 'taskhash': self.taskhash[task]}))
+
+            request = urllib.request.Request(url)
+            response = urllib.request.urlopen(request)
+            data = response.read().decode('utf-8')
+
+            json_data = json.loads(data)
+
+            if json_data:
+                depid = json_data['depid']
+                # Dependency ID equal to the taskhash is not very interesting,
+                # so it is reported at debug level 2. If they differ, that
+                # is much more interesting, so it is reported at debug level 1
+                bb.debug((1, 2)[depid == taskhash], 'Found depid %s in place of %s for %s from %s' % (depid, taskhash, task, self.server))
+            else:
+                bb.debug(2, 'No reported dependency ID for %s:%s from %s' % (task, taskhash, self.server))
+        except urllib.error.URLError as e:
+            bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+        except (KeyError, json.JSONDecodeError) as e:
+            bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+
+        self.depids[key] = depid
+        return depid
+
+    def report_depid(self, path, task, d):
+        import urllib
+        import json
+        import tempfile
+        import base64
+
+        taskhash = d.getVar('BB_TASKHASH')
+        depid = d.getVar('BB_DEPID')
+        report_taskdata = d.getVar('SSTATE_HASHEQUIV_REPORT_TASKDATA') == '1'
+        tempdir = d.getVar('T')
+        fn = d.getVar('BB_FILENAME')
+        key = fn + '.do_' + task + ':' + taskhash
+
+        # Sanity checks
+        cache_depid = self.depids.get(key)
+        if cache_depid is None:
+            bb.fatal('%s not in depid cache. Please report this error' % key)
+
+        if cache_depid != depid:
+            bb.fatal("Cache depid %s doesn't match BB_DEPID %s" % (cache_depid, depid))
+
+        sigfile = None
+        sigfile_name = "depsig.do_%s.%d" % (task, os.getpid())
+        sigfile_link = "depsig.do_%s" % task
+
+        try:
+            call = self.method + '(path, sigfile, task, d)'
+            sigfile = open(os.path.join(tempdir, sigfile_name), 'w+b')
+            locs = {'path': path, 'sigfile': sigfile, 'task': task, 'd': d}
+
+            outhash = bb.utils.better_eval(call, locs)
+
+            try:
+                url = '%s/v1/equivalent' % self.server
+                task_data = {
+                    'taskhash': taskhash,
+                    'method': self.method,
+                    'outhash': outhash,
+                    'depid': depid,
+                    'owner': d.getVar('SSTATE_HASHEQUIV_OWNER')
+                    }
+
+                if report_taskdata:
+                    sigfile.seek(0)
+
+                    task_data['PN'] = d.getVar('PN')
+                    task_data['PV'] = d.getVar('PV')
+                    task_data['PR'] = d.getVar('PR')
+                    task_data['task'] = task
+                    task_data['outhash_siginfo'] = sigfile.read().decode('utf-8')
+
+                headers = {'content-type': 'application/json'}
+
+                request = urllib.request.Request(url, json.dumps(task_data).encode('utf-8'), headers)
+                response = urllib.request.urlopen(request)
+                data = response.read().decode('utf-8')
+
+                json_data = json.loads(data)
+                new_depid = json_data['depid']
+
+                if new_depid != depid:
+                    bb.debug(1, 'Task %s depid changed %s -> %s by server %s' % (taskhash, depid, new_depid, self.server))
+                else:
+                    bb.debug(1, 'Reported task %s as depid %s to %s' % (taskhash, depid, self.server))
+            except urllib.error.URLError as e:
+                bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+            except (KeyError, json.JSONDecodeError) as e:
+                bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+        finally:
+            if sigfile:
+                sigfile.close()
+
+                sigfile_link_path = os.path.join(tempdir, sigfile_link)
+                bb.utils.remove(sigfile_link_path)
+
+                try:
+                    os.symlink(sigfile_name, sigfile_link_path)
+                except OSError:
+                    pass
 
 # Insert these classes into siggen's namespace so it can see and select them
 bb.siggen.SignatureGeneratorOEBasic = SignatureGeneratorOEBasic
 bb.siggen.SignatureGeneratorOEBasicHash = SignatureGeneratorOEBasicHash
+bb.siggen.SignatureGeneratorOEEquivHash = SignatureGeneratorOEEquivHash
 
 
 def find_siginfo(pn, taskname, taskhashlist, d):
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v3 00/17] Hash Equivalency Server
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
@ 2018-12-04  3:42     ` Joshua Watt
  2018-08-09 22:08   ` [RFC v2 02/16] bitbake: persist_data: Fix leaking cursors causing deadlock Joshua Watt
                       ` (16 subsequent siblings)
  17 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

These patches are a first pass at implementing a hash equivalence server
in bitbake & OE.

Apologies for cross-posting this to both the bitbake-devel and
openembedded-devel; this work necessarily intertwines both places, and
it is really necessary to look at both parts to get an idea of what is
going on. For convenience, the bitbake patches are listed first,
followed by the oe-core patches.

The basic premise is that any given task no longer hashes a dependent
task's taskhash to determine its own taskhash, but instead hashes the
dependent task's "dependency ID" (which doesn't strictly need to be a
hash, but is for consistency. We can have the discussion as to whether
this should be called a "dependency hash" if anyone wants). This allows
multiple taskhashes to map to the same dependency ID, meaning that
trivial changes to a recipe that would change the taskhash don't
necessarily need to change the dependency ID, and thus don't need to
cause downstream tasks to be rebuilt (with caveats, see below).

In the absence of any interaction by the user, the dependency ID for a
task is just that task's taskhash, which effectively maintains the
current behavior. However, if the user enables the "OEEquivHash"
signature generator, they can direct it to look at a hash equivalency
server (of which a reference implementation is provided). The sstate
code will provide the server with an output hash that it calculates, and
the server will record all tasks with the same output hash as
"equivalent" and report the same dependency ID for them when requested.
When initializing tasks, bitbake can ask the server about the dependency
ID for new tasks it has never seen before and potentially skip
rebuilding, or restore the task from an equivalent sstate file. To
facilitate restoring tasks from sstate, sstate objects are now named
based on the task's dependency ID instead of the taskhash (which, again,
has no effect if the server is not in use).
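
For reference, enabling this might look like the following local.conf
sketch; the server URL and the method value here are illustrative
assumptions rather than defaults mandated by this series (the variables
themselves come from these patches):

 BB_SIGNATURE_HANDLER = "OEEquivHash"
 SSTATE_HASHEQUIV_SERVER = "http://localhost:5000"
 # Python function used to compute the task output hash; this exact
 # function name is a placeholder
 SSTATE_HASHEQUIV_METHOD = "sstate_depid_hash"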

This patchset doesn't make any attempt to dynamically update task
dependency IDs after bitbake initializes the tasks, and as such there
are some cases where this isn't accelerating the build as much as it
possibly could. I think it will be possible to add support for this, but
this preliminary support needs to come first.

Some patches have additional NOTEs that indicate places where I wasn't
sure what to do.

You can also see these patches (and my first attempts at dynamic task
re-hashing) on the "jpew/hash-equivalence" branch in poky-contrib.

As always, thanks for your feedback and time

VERSION 2:

At the core, this series does the same thing as V1 with some very minor
tweaks. The main things that have changed are:
 1) Per request, the Hash Equivalence Server reference implementation is
    now based entirely on built-in Python modules and requires no
    external libraries. It also has a wrapper script to launch it
    (bitbake-hashserv) and unit tests; a usage sketch follows this list.
 2) There is a major rework of persist_data in bitbake. I
    think these patches could be submitted independently, but I doubt
    anyone is clamoring for them. The general gist of them is that there
    were a lot of strange edge cases that I found when using
    persist_data as an IPC mechanism between the main bitbake process
    and the bitbake-worker processes. I went ahead and added extensive
    unit tests for this as well.
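
As a usage sketch, the reference server's /v1/equivalent endpoint can be
exercised directly with only the Python standard library; the host,
port, hashes, and method name below are illustrative placeholders:

 import json
 import urllib.parse
 import urllib.request

 server = 'http://localhost:5000'

 # Report an output hash for a taskhash; the JSON fields mirror those
 # sent by report_depid() in oe/sstatesig.py
 payload = json.dumps({
     'taskhash': '0123abcd', 'method': 'sstate_depid_hash',
     'outhash': '4567ef89', 'depid': '0123abcd', 'owner': 'example',
 }).encode('utf-8')
 request = urllib.request.Request(server + '/v1/equivalent', payload,
                                  {'content-type': 'application/json'})
 response = urllib.request.urlopen(request)
 print(json.loads(response.read().decode('utf-8')))

 # Ask which depid is equivalent to a given taskhash
 query = urllib.parse.urlencode({'method': 'sstate_depid_hash',
                                 'taskhash': '0123abcd'})
 response = urllib.request.urlopen('%s/v1/equivalent?%s' % (server, query))
 print(json.loads(response.read().decode('utf-8')))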

VERSION 3:

Minor tweak to version 2 that should fix timeout errors seen on the
autobuilder.

Joshua Watt (17):
  bitbake: fork: Add os.fork() wrappers
  bitbake: persist_data: Fix leaking cursors causing deadlock
  bitbake: persist_data: Add key constraints
  bitbake: persist_data: Enable Write Ahead Log
  bitbake: persist_data: Disable enable_shared_cache
  bitbake: persist_data: Close databases across fork
  bitbake: tests/persist_data: Add tests
  bitbake: bitbake-worker: Pass taskhash as runtask parameter
  bitbake: siggen: Split out stampfile hash fetch
  bitbake: siggen: Split out task depend ID
  bitbake: runqueue: Track task dependency ID
  bitbake: runqueue: Pass dependency ID to task
  bitbake: runqueue: Pass dependency ID to hash validate
  classes/sstate: Handle depid in hash check
  bitbake: hashserv: Add hash equivalence reference server
  sstate: Implement hash equivalence sstate
  classes/image-buildinfo: Remove unused argument

 bitbake/bin/bitbake-hashserv         |  67 ++++++++
 bitbake/bin/bitbake-selftest         |   3 +
 bitbake/bin/bitbake-worker           |  11 +-
 bitbake/lib/bb/fork.py               |  73 +++++++++
 bitbake/lib/bb/persist_data.py       | 237 ++++++++++++++++++++-------
 bitbake/lib/bb/runqueue.py           |  56 ++++---
 bitbake/lib/bb/siggen.py             |  20 ++-
 bitbake/lib/bb/tests/persist_data.py | 188 +++++++++++++++++++++
 bitbake/lib/hashserv/__init__.py     | 152 +++++++++++++++++
 bitbake/lib/hashserv/tests.py        | 141 ++++++++++++++++
 meta/classes/image-buildinfo.bbclass |   6 +-
 meta/classes/sstate.bbclass          | 102 +++++++++++-
 meta/conf/bitbake.conf               |   4 +-
 meta/lib/oe/sstatesig.py             | 166 +++++++++++++++++++
 14 files changed, 1120 insertions(+), 106 deletions(-)
 create mode 100755 bitbake/bin/bitbake-hashserv
 create mode 100644 bitbake/lib/bb/fork.py
 create mode 100644 bitbake/lib/bb/tests/persist_data.py
 create mode 100644 bitbake/lib/hashserv/__init__.py
 create mode 100644 bitbake/lib/hashserv/tests.py

-- 
2.19.1



^ permalink raw reply	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v3 01/17] bitbake: fork: Add os.fork() wrappers
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Adds a compatibility wrapper around os.fork() that backports the ability
to register fork event handlers (os.register_at_fork()) from Python 3.7
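
A minimal usage sketch of the wrapper (the callbacks are illustrative,
and bitbake's lib directory is assumed to be on sys.path):

 import os
 import bb.fork

 bb.fork.register_at_fork(
     before=lambda: print('about to fork'),
     after_in_parent=lambda: print('in parent'),
     after_in_child=lambda: print('in child'))

 pid = bb.fork.fork()
 if pid == 0:
     os._exit(0)  # child exits immediately in this sketch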

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-worker |  2 +-
 bitbake/lib/bb/fork.py     | 73 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 1 deletion(-)
 create mode 100644 bitbake/lib/bb/fork.py

diff --git a/bitbake/bin/bitbake-worker b/bitbake/bin/bitbake-worker
index e925054b7f9..baa1a84e6dd 100755
--- a/bitbake/bin/bitbake-worker
+++ b/bitbake/bin/bitbake-worker
@@ -181,7 +181,7 @@ def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, append
         pipein, pipeout = os.pipe()
         pipein = os.fdopen(pipein, 'rb', 4096)
         pipeout = os.fdopen(pipeout, 'wb', 0)
-        pid = os.fork()
+        pid = bb.fork.fork()
     except OSError as e:
         logger.critical("fork failed: %d (%s)" % (e.errno, e.strerror))
         sys.exit(1)
diff --git a/bitbake/lib/bb/fork.py b/bitbake/lib/bb/fork.py
new file mode 100644
index 00000000000..2b2b0b73b62
--- /dev/null
+++ b/bitbake/lib/bb/fork.py
@@ -0,0 +1,73 @@
+# ex:ts=4:sw=4:sts=4:et
+# -*- tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*-
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+#
+"""
+Python wrappers for os.fork() that allow the insertion of callbacks for fork events.
+This is designed to exactly mimic os.register_at_fork() available in Python 3.7 with the
+intent that it can be removed when that version becomes standard
+"""
+
+import sys
+import os
+
+before_calls = []
+after_in_parent_calls = []
+after_in_child_calls = []
+
+def _do_calls(l, reverse=False):
+    # Make a copy in case the list is modified in the callback
+    copy = l[:]
+    if reverse:
+        copy = reversed(copy)
+
+    for f in copy:
+        # All exceptions in callbacks are ignored
+        try:
+            f()
+        except:
+            pass
+
+def fork():
+    if sys.hexversion >= 0x030700F0:
+        return os.fork()
+
+    _do_calls(before_calls, reverse=True)
+
+    ret = os.fork()
+    if ret == 0:
+        _do_calls(after_in_child_calls)
+    else:
+        _do_calls(after_in_parent_calls)
+    return ret
+
+def register_at_fork(**kwargs):
+    def add_arg_to_list(name, lst):
+        if name in kwargs:
+            arg = kwargs[name]
+            if not callable(arg):
+                raise TypeError("'%s' must be callable, not %s" % (name, type(arg)))
+            lst.append(arg)
+
+    if sys.hexversion >= 0x030700F0:
+        os.register_at_fork(**kwargs)
+        return
+
+    add_arg_to_list('before', before_calls)
+    add_arg_to_list('after_in_parent', after_in_parent_calls)
+    add_arg_to_list('after_in_child', after_in_child_calls)
+
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v3 02/17] bitbake: persist_data: Fix leaking cursors causing deadlock
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

The original implementation of persistent data executed all SQL
statements via sqlite3.Connection.execute(). Behind the scenes, this
function created a sqlite3 Cursor object, executed the statement, then
returned the cursor. However, the implementation did not account for
this and failed to close the cursor object when it was done. The cursor
would eventually be closed when the garbage collector got around to
destroying it. However, sqlite has a limit on the number of cursors that
can exist at any given time, and once this limit is reached it will
block a query to wait for a cursor to be destroyed. Under heavy database
queries, this can result in Python deadlocking with itself, since the
SQL query will block waiting for a free cursor, but Python can no longer
run garbage collection (as it is blocked) to free one.

This restructures the SQLTable class to use two decorators to aid in
performing actions correctly. The first decorator (@retry) wraps a
member function in the retry logic that automatically restarts the
function in the event that the database is locked.

The second decorator (@transaction) wraps the function so that it occurs
in a database transaction, which will automatically COMMIT the changes
on success and ROLLBACK on failure. This function additionally creates
an explicit cursor, passes it to the wrapped function, and cleans it up
when the function is finished.

Note that it is still possible to leak cursors when iterating. This is
much less frequent, but can still be mitigated by wrapping the iteration
in a `with` statement:

 with db.iteritems() as it:
     for (k, v) in it:
         ...

As a side effect, since most statements are wrapped in a transaction,
setting the isolation_level when the connection is created is no longer
necessary.
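
The commit/rollback behaviour relied on above comes from using the
sqlite3 connection itself as a context manager. A standalone sketch of
the same pattern, using an in-memory database for illustration:

 import contextlib
 import sqlite3

 connection = sqlite3.connect(':memory:')
 connection.execute('CREATE TABLE t(key TEXT, value TEXT);')

 # The connection context manager COMMITs on success and ROLLBACKs if
 # the body raises; closing() guarantees the cursor is closed either way
 with connection:
     with contextlib.closing(connection.cursor()) as cursor:
         cursor.execute('INSERT INTO t(key, value) VALUES (?, ?);',
                        ['k', 'v'])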

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/persist_data.py | 188 +++++++++++++++++++++++----------
 1 file changed, 135 insertions(+), 53 deletions(-)

diff --git a/bitbake/lib/bb/persist_data.py b/bitbake/lib/bb/persist_data.py
index bef7018614d..1a6319f9498 100644
--- a/bitbake/lib/bb/persist_data.py
+++ b/bitbake/lib/bb/persist_data.py
@@ -29,6 +29,7 @@ import warnings
 from bb.compat import total_ordering
 from collections import Mapping
 import sqlite3
+import contextlib
 
 sqlversion = sqlite3.sqlite_version_info
 if sqlversion[0] < 3 or (sqlversion[0] == 3 and sqlversion[1] < 3):
@@ -45,75 +46,154 @@ if hasattr(sqlite3, 'enable_shared_cache'):
 
 @total_ordering
 class SQLTable(collections.MutableMapping):
+    class _Decorators(object):
+        @staticmethod
+        def retry(f):
+            """
+            Decorator that restarts a function if a database locked sqlite
+            exception occurs.
+            """
+            def wrap_func(self, *args, **kwargs):
+                count = 0
+                while True:
+                    try:
+                        return f(self, *args, **kwargs)
+                    except sqlite3.OperationalError as exc:
+                        if 'is locked' in str(exc) and count < 500:
+                            count = count + 1
+                            self.connection.close()
+                            self.connection = connect(self.cachefile)
+                            continue
+                        raise
+            return wrap_func
+
+        @staticmethod
+        def transaction(f):
+            """
+            Decorator that starts a database transaction and creates a database
+            cursor for performing queries. If no exception is thrown, the
+            database results are commited. If an exception occurs, the database
+            is rolled back. In all cases, the cursor is closed after the
+            function ends.
+
+            Note that the cursor is passed as an extra argument to the function
+            after `self` and before any of the normal arguments
+            """
+            def wrap_func(self, *args, **kwargs):
+                # Context manager will COMMIT the database on success,
+                # or ROLLBACK on an exception
+                with self.connection:
+                    # Automatically close the cursor when done
+                    with contextlib.closing(self.connection.cursor()) as cursor:
+                        return f(self, cursor, *args, **kwargs)
+            return wrap_func
+
     """Object representing a table/domain in the database"""
     def __init__(self, cachefile, table):
         self.cachefile = cachefile
         self.table = table
-        self.cursor = connect(self.cachefile)
-
-        self._execute("CREATE TABLE IF NOT EXISTS %s(key TEXT, value TEXT);"
-                      % table)
-
-    def _execute(self, *query):
-        """Execute a query, waiting to acquire a lock if necessary"""
-        count = 0
-        while True:
-            try:
-                return self.cursor.execute(*query)
-            except sqlite3.OperationalError as exc:
-                if 'database is locked' in str(exc) and count < 500:
-                    count = count + 1
+        self.connection = connect(self.cachefile)
+
+        self._execute_single("CREATE TABLE IF NOT EXISTS %s(key TEXT, value TEXT);" % table)
+
+    @_Decorators.retry
+    @_Decorators.transaction
+    def _execute_single(self, cursor, *query):
+        """
+        Executes a single query and discards the results. This correctly closes
+        the database cursor when finished
+        """
+        cursor.execute(*query)
+
+    @_Decorators.retry
+    def _row_iter(self, f, *query):
+        """
+        Helper function that returns a row iterator. Each time __next__ is
+        called on the iterator, the provided function is evaluated to determine
+        the return value
+        """
+        class CursorIter(object):
+            def __init__(self, cursor):
+                self.cursor = cursor
+
+            def __iter__(self):
+                return self
+
+            def __next__(self):
+                row = self.cursor.fetchone()
+                if row is None:
                     self.cursor.close()
-                    self.cursor = connect(self.cachefile)
-                    continue
-                raise
+                    raise StopIteration
+                return f(row)
+
+            def __enter__(self):
+                return self
+
+            def __exit__(self, typ, value, traceback):
+                self.cursor.close()
+                return False
+
+        cursor = self.connection.cursor()
+        try:
+            cursor.execute(*query)
+            return CursorIter(cursor)
+        except:
+            cursor.close()
+            raise
 
     def __enter__(self):
-        self.cursor.__enter__()
+        self.connection.__enter__()
         return self
 
     def __exit__(self, *excinfo):
-        self.cursor.__exit__(*excinfo)
-
-    def __getitem__(self, key):
-        data = self._execute("SELECT * from %s where key=?;" %
-                             self.table, [key])
-        for row in data:
+        self.connection.__exit__(*excinfo)
+
+    @_Decorators.retry
+    @_Decorators.transaction
+    def __getitem__(self, cursor, key):
+        cursor.execute("SELECT * from %s where key=?;" % self.table, [key])
+        row = cursor.fetchone()
+        if row is not None:
             return row[1]
         raise KeyError(key)
 
-    def __delitem__(self, key):
+    @_Decorators.retry
+    @_Decorators.transaction
+    def __delitem__(self, cursor, key):
         if key not in self:
             raise KeyError(key)
-        self._execute("DELETE from %s where key=?;" % self.table, [key])
+        cursor.execute("DELETE from %s where key=?;" % self.table, [key])
 
-    def __setitem__(self, key, value):
+    @_Decorators.retry
+    @_Decorators.transaction
+    def __setitem__(self, cursor, key, value):
         if not isinstance(key, str):
             raise TypeError('Only string keys are supported')
         elif not isinstance(value, str):
             raise TypeError('Only string values are supported')
 
-        data = self._execute("SELECT * from %s where key=?;" %
-                                   self.table, [key])
-        exists = len(list(data))
-        if exists:
-            self._execute("UPDATE %s SET value=? WHERE key=?;" % self.table,
-                          [value, key])
+        cursor.execute("SELECT * from %s where key=?;" % self.table, [key])
+        row = cursor.fetchone()
+        if row is not None:
+            cursor.execute("UPDATE %s SET value=? WHERE key=?;" % self.table, [value, key])
         else:
-            self._execute("INSERT into %s(key, value) values (?, ?);" %
-                          self.table, [key, value])
-
-    def __contains__(self, key):
-        return key in set(self)
-
-    def __len__(self):
-        data = self._execute("SELECT COUNT(key) FROM %s;" % self.table)
-        for row in data:
+            cursor.execute("INSERT into %s(key, value) values (?, ?);" % self.table, [key, value])
+
+    @_Decorators.retry
+    @_Decorators.transaction
+    def __contains__(self, cursor, key):
+        cursor.execute('SELECT * from %s where key=?;' % self.table, [key])
+        return cursor.fetchone() is not None
+
+    @_Decorators.retry
+    @_Decorators.transaction
+    def __len__(self, cursor):
+        cursor.execute("SELECT COUNT(key) FROM %s;" % self.table)
+        row = cursor.fetchone()
+        if row is not None:
             return row[0]
 
     def __iter__(self):
-        data = self._execute("SELECT key FROM %s;" % self.table)
-        return (row[0] for row in data)
+        return self._row_iter(lambda row: row[0], "SELECT key from %s;" % self.table)
 
     def __lt__(self, other):
         if not isinstance(other, Mapping):
@@ -122,25 +202,27 @@ class SQLTable(collections.MutableMapping):
         return len(self) < len(other)
 
     def get_by_pattern(self, pattern):
-        data = self._execute("SELECT * FROM %s WHERE key LIKE ?;" %
-                             self.table, [pattern])
-        return [row[1] for row in data]
+        return self._row_iter(lambda row: row[1], "SELECT * FROM %s WHERE key LIKE ?;" %
+                              self.table, [pattern])
 
     def values(self):
         return list(self.itervalues())
 
     def itervalues(self):
-        data = self._execute("SELECT value FROM %s;" % self.table)
-        return (row[0] for row in data)
+        return self._row_iter(lambda row: row[0], "SELECT value FROM %s;" %
+                              self.table)
 
     def items(self):
         return list(self.iteritems())
 
     def iteritems(self):
-        return self._execute("SELECT * FROM %s;" % self.table)
+        return self._row_iter(lambda row: (row[0], row[1]), "SELECT * FROM %s;" %
+                              self.table)
 
-    def clear(self):
-        self._execute("DELETE FROM %s;" % self.table)
+    @_Decorators.retry
+    @_Decorators.transaction
+    def clear(self, cursor):
+        cursor.execute("DELETE FROM %s;" % self.table)
 
     def has_key(self, key):
         return key in self
@@ -195,7 +277,7 @@ class PersistData(object):
         del self.data[domain][key]
 
 def connect(database):
-    connection = sqlite3.connect(database, timeout=5, isolation_level=None)
+    connection = sqlite3.connect(database, timeout=5)
     connection.execute("pragma synchronous = off;")
     connection.text_factory = str
     return connection
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v3 03/17] bitbake: persist_data: Add key constraints
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Constructs the "key" column in the persistent database as a non-NULL
primary key. This significantly speeds up lookup operations in large
databases.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/persist_data.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bitbake/lib/bb/persist_data.py b/bitbake/lib/bb/persist_data.py
index 1a6319f9498..2bc3e766a93 100644
--- a/bitbake/lib/bb/persist_data.py
+++ b/bitbake/lib/bb/persist_data.py
@@ -94,7 +94,7 @@ class SQLTable(collections.MutableMapping):
         self.table = table
         self.connection = connect(self.cachefile)
 
-        self._execute_single("CREATE TABLE IF NOT EXISTS %s(key TEXT, value TEXT);" % table)
+        self._execute_single("CREATE TABLE IF NOT EXISTS %s(key TEXT PRIMARY KEY NOT NULL, value TEXT);" % table)
 
     @_Decorators.retry
     @_Decorators.transaction
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v3 04/17] bitbake: persist_data: Enable Write Ahead Log
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Enabling the write ahead log improves database reliability, speeds up
writes (since they mostly happen sequentially), and speeds up readers
(since they are no longer blocked by most write operations). The
persistent database is very read heavy, so the auto-checkpoint size is
reduced from the default (usually 1000) to 100 so that reads remain
fast.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/persist_data.py | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/bitbake/lib/bb/persist_data.py b/bitbake/lib/bb/persist_data.py
index 2bc3e766a93..14927920908 100644
--- a/bitbake/lib/bb/persist_data.py
+++ b/bitbake/lib/bb/persist_data.py
@@ -279,6 +279,11 @@ class PersistData(object):
 def connect(database):
     connection = sqlite3.connect(database, timeout=5)
     connection.execute("pragma synchronous = off;")
+    # Enable WAL and keep the autocheckpoint length small (the default is
+    # usually 1000). Persistent caches are usually read-mostly, so keeping
+    # this short will keep readers running quickly
+    connection.execute("pragma journal_mode = WAL;")
+    connection.execute("pragma wal_autocheckpoint = 100;")
     connection.text_factory = str
     return connection
 
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v3 05/17] bitbake: persist_data: Disable enable_shared_cache
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Turns off the shared cache. It isn't a significant factor in performance
(now that WAL is enabled), and it is a really bad idea to have enabled in
processes that fork() (as bitbake is prone to do).

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/persist_data.py | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/bitbake/lib/bb/persist_data.py b/bitbake/lib/bb/persist_data.py
index 14927920908..41fcf2a41c4 100644
--- a/bitbake/lib/bb/persist_data.py
+++ b/bitbake/lib/bb/persist_data.py
@@ -37,12 +37,6 @@ if sqlversion[0] < 3 or (sqlversion[0] == 3 and sqlversion[1] < 3):
 
 
 logger = logging.getLogger("BitBake.PersistData")
-if hasattr(sqlite3, 'enable_shared_cache'):
-    try:
-        sqlite3.enable_shared_cache(True)
-    except sqlite3.OperationalError:
-        pass
-
 
 @total_ordering
 class SQLTable(collections.MutableMapping):
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread
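
For reference, the behaviour being removed boiled down to a process-wide
toggle; a sketch of the equivalent explicit disable (already the default in
modern sqlite3 modules, where enable_shared_cache is deprecated or absent):

    import sqlite3

    # enable_shared_cache() is process-global: later connections in this process
    # share cached pages, and that shared state is inherited across fork(),
    # which sqlite does not support. Leaving it off is the safe default.
    if hasattr(sqlite3, "enable_shared_cache"):
        try:
            sqlite3.enable_shared_cache(False)
        except sqlite3.OperationalError:
            pass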

* [OE-core][PATCH v3 06/17] bitbake: persist_data: Close databases across fork
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

sqlite gets really angry if a database is open across a fork() call,
and will give all sorts of messages ranging from I/O errors to database
corruption errors. To deal with this, close all database connections
before forking, and reopen them (lazily) on the other side.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/persist_data.py | 44 +++++++++++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/bitbake/lib/bb/persist_data.py b/bitbake/lib/bb/persist_data.py
index 41fcf2a41c4..29feb78203b 100644
--- a/bitbake/lib/bb/persist_data.py
+++ b/bitbake/lib/bb/persist_data.py
@@ -30,6 +30,8 @@ from bb.compat import total_ordering
 from collections import Mapping
 import sqlite3
 import contextlib
+import bb.fork
+import weakref
 
 sqlversion = sqlite3.sqlite_version_info
 if sqlversion[0] < 3 or (sqlversion[0] == 3 and sqlversion[1] < 3):
@@ -38,6 +40,28 @@ if sqlversion[0] < 3 or (sqlversion[0] == 3 and sqlversion[1] < 3):
 
 logger = logging.getLogger("BitBake.PersistData")
 
+# Carrying an open database connection across a fork() confuses sqlite and
+# results in fun errors like 'database disk image is malformed'.
+# To remedy this, close all connections before forking, then (lazily) reopen
+# them on the other side. This will cause a lot of problems if
+# there are threads running and trying to access the database at the same time,
+# but if you are mixing threads and fork() you have no one to blame but
+# yourself. If that is discovered to be a problem in the future, some sort of
+# per-table reader-writer lock could be used to block the fork() until all
+# pending transactions complete
+sql_table_weakrefs = []
+def _fork_before_handler():
+    for ref in sql_table_weakrefs:
+        t = ref()
+        if t is not None and t.connection is not None:
+            t.connection.close()
+            t.connection = None
+
+bb.fork.register_at_fork(before=_fork_before_handler)
+
+def _remove_table_weakref(ref):
+    sql_table_weakrefs.remove(ref)
+
 @total_ordering
 class SQLTable(collections.MutableMapping):
     class _Decorators(object):
@@ -48,6 +72,10 @@ class SQLTable(collections.MutableMapping):
             exception occurs.
             """
             def wrap_func(self, *args, **kwargs):
+                # Reconnect if necessary
+                if self.connection is None:
+                    self.reconnect()
+
                 count = 0
                 while True:
                     try:
@@ -55,8 +83,7 @@ class SQLTable(collections.MutableMapping):
                     except sqlite3.OperationalError as exc:
                         if 'is locked' in str(exc) and count < 500:
                             count = count + 1
-                            self.connection.close()
-                            self.connection = connect(self.cachefile)
+                            self.reconnect()
                             continue
                         raise
             return wrap_func
@@ -90,6 +117,11 @@ class SQLTable(collections.MutableMapping):
 
         self._execute_single("CREATE TABLE IF NOT EXISTS %s(key TEXT PRIMARY KEY NOT NULL, value TEXT);" % table)
 
+    def reconnect(self):
+        if self.connection is not None:
+            self.connection.close()
+        self.connection = connect(self.cachefile)
+
     @_Decorators.retry
     @_Decorators.transaction
     def _execute_single(self, cursor, *query):
@@ -292,4 +324,10 @@ def persist(domain, d):
 
     bb.utils.mkdirhier(cachedir)
     cachefile = os.path.join(cachedir, "bb_persist_data.sqlite3")
-    return SQLTable(cachefile, domain)
+    t = SQLTable(cachefile, domain)
+
+    # Add a weak reference to the table list. The weak reference will not keep
+    # the object alive by itself, so it prevents circular reference counts
+    sql_table_weakrefs.append(weakref.ref(t, _remove_table_weakref))
+
+    return t
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread
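
The same close-before-fork pattern can be sketched without bitbake, using
os.register_at_fork() from Python 3.7+ in place of the bb.fork helper this
patch relies on:

    import os
    import sqlite3

    class Table:
        def __init__(self, cachefile):
            self.cachefile = cachefile
            self.connection = sqlite3.connect(cachefile)

        def reconnect(self):
            if self.connection is not None:
                self.connection.close()
            self.connection = sqlite3.connect(self.cachefile)

    tables = []

    def _close_before_fork():
        # No sqlite connection may be carried across fork()
        for t in tables:
            if t.connection is not None:
                t.connection.close()
                t.connection = None

    os.register_at_fork(before=_close_before_fork)

    t = Table("example.sqlite3")
    tables.append(t)
    pid = os.fork()
    if pid == 0:
        t.reconnect()  # the child lazily reopens its own connection
        os._exit(0)
    os.waitpid(pid, 0)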

* [OE-core][PATCH v3 07/17] bitbake: tests/persist_data: Add tests
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Adds a test suite for testing the persistent data cache

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-selftest         |   1 +
 bitbake/lib/bb/tests/persist_data.py | 189 +++++++++++++++++++++++++++
 2 files changed, 190 insertions(+)
 create mode 100644 bitbake/lib/bb/tests/persist_data.py

diff --git a/bitbake/bin/bitbake-selftest b/bitbake/bin/bitbake-selftest
index cfa7ac5391b..c970dcae90c 100755
--- a/bitbake/bin/bitbake-selftest
+++ b/bitbake/bin/bitbake-selftest
@@ -33,6 +33,7 @@ tests = ["bb.tests.codeparser",
          "bb.tests.event",
          "bb.tests.fetch",
          "bb.tests.parse",
+         "bb.tests.persist_data",
          "bb.tests.utils",
          "layerindexlib.tests.layerindexobj",
          "layerindexlib.tests.restapi",
diff --git a/bitbake/lib/bb/tests/persist_data.py b/bitbake/lib/bb/tests/persist_data.py
new file mode 100644
index 00000000000..055f1d9ce47
--- /dev/null
+++ b/bitbake/lib/bb/tests/persist_data.py
@@ -0,0 +1,189 @@
+# ex:ts=4:sw=4:sts=4:et
+# -*- tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*-
+#
+# BitBake Test for lib/bb/persist_data/
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+#
+
+import unittest
+import os
+import bb.data
+import bb.persist_data
+import bb.fork
+import tempfile
+import threading
+
+class PersistDataTest(unittest.TestCase):
+    def _create_data(self):
+        return bb.persist_data.persist('TEST_PERSIST_DATA', self.d)
+
+    def setUp(self):
+        self.d = bb.data.init()
+        self.tempdir = tempfile.TemporaryDirectory()
+        self.d['PERSISTENT_DIR'] = self.tempdir.name
+        self.data = self._create_data()
+        self.items = {
+                'A1': '1',
+                'B1': '2',
+                'C2': '3'
+                }
+        self.stress_count = 10000
+        self.thread_count = 5
+
+        for k,v in self.items.items():
+            self.data[k] = v
+
+    def tearDown(self):
+        self.tempdir.cleanup()
+
+    def _iter_helper(self, seen, iterator):
+        with iter(iterator):
+            for v in iterator:
+                self.assertTrue(v in seen)
+                seen.remove(v)
+        self.assertEqual(len(seen), 0, '%s not seen' % seen)
+
+    def test_get(self):
+        for k, v in self.items.items():
+            self.assertEqual(self.data[k], v)
+
+        self.assertIsNone(self.data.get('D'))
+        with self.assertRaises(KeyError):
+            self.data['D']
+
+    def test_set(self):
+        for k, v in self.items.items():
+            self.data[k] += '-foo'
+
+        for k, v in self.items.items():
+            self.assertEqual(self.data[k], v + '-foo')
+
+    def test_delete(self):
+        self.data['D'] = '4'
+        self.assertEqual(self.data['D'], '4')
+        del self.data['D']
+        self.assertIsNone(self.data.get('D'))
+        with self.assertRaises(KeyError):
+            self.data['D']
+
+    def test_contains(self):
+        for k in self.items:
+            self.assertTrue(k in self.data)
+            self.assertTrue(self.data.has_key(k))
+        self.assertFalse('NotFound' in self.data)
+        self.assertFalse(self.data.has_key('NotFound'))
+
+    def test_len(self):
+        self.assertEqual(len(self.data), len(self.items))
+
+    def test_iter(self):
+        self._iter_helper(set(self.items.keys()), self.data)
+
+    def test_itervalues(self):
+        self._iter_helper(set(self.items.values()), self.data.itervalues())
+
+    def test_iteritems(self):
+        self._iter_helper(set(self.items.items()), self.data.iteritems())
+
+    def test_get_by_pattern(self):
+        self._iter_helper({'1', '2'}, self.data.get_by_pattern('_1'))
+
+    def _stress_read(self, data):
+        for i in range(self.stress_count):
+            for k in self.items:
+                data[k]
+
+    def _stress_write(self, data):
+        for i in range(self.stress_count):
+            for k, v in self.items.items():
+                data[k] = v + str(i)
+
+    def _validate_stress(self):
+        for k, v in self.items.items():
+            self.assertEqual(self.data[k], v + str(self.stress_count - 1))
+
+    def test_stress(self):
+        self._stress_read(self.data)
+        self._stress_write(self.data)
+        self._validate_stress()
+
+    def test_stress_threads(self):
+        def read_thread():
+            data = self._create_data()
+            self._stress_read(data)
+
+        def write_thread():
+            data = self._create_data()
+            self._stress_write(data)
+
+        threads = []
+        for i in range(self.thread_count):
+            threads.append(threading.Thread(target=read_thread))
+            threads.append(threading.Thread(target=write_thread))
+
+        for t in threads:
+            t.start()
+        self._stress_read(self.data)
+        for t in threads:
+            t.join()
+        self._validate_stress()
+
+    def test_stress_fork(self):
+        children = []
+        for i in range(self.thread_count):
+            # Create a writer
+            pid = bb.fork.fork()
+            if pid == 0:
+                try:
+                    self._stress_write(self.data)
+                except:
+                    os._exit(1)
+                else:
+                    os._exit(0)
+            else:
+                children.append(pid)
+
+            # Create a reader
+            pid = bb.fork.fork()
+            if pid == 0:
+                try:
+                    self._stress_read(self.data)
+                except:
+                    os._exit(1)
+                else:
+                    os._exit(0)
+            else:
+                children.append(pid)
+
+        self._stress_read(self.data)
+
+        for pid in children:
+            while True:
+                try:
+                    (_, status) = os.waitpid(pid, 0)
+                    break
+                # Python < 3.5 will raise this if waitpid() is interrupted
+                except InterruptedError:
+                    pass
+                except:
+                    raise
+
+            self.assertTrue(os.WIFEXITED(status), "PID %d did not exit normally" % pid)
+            self.assertEqual(os.WEXITSTATUS(status), 0, "PID %d exited with code %d" % (pid, os.WEXITSTATUS(status)))
+
+        self._validate_stress()
+
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread
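
The suite can also be driven directly with unittest, assuming bitbake's lib/
directory is on sys.path (a hypothetical invocation; bitbake-selftest is the
supported entry point):

    import sys
    import unittest

    sys.path.insert(0, "bitbake/lib")  # adjust to your checkout
    suite = unittest.defaultTestLoader.loadTestsFromName("bb.tests.persist_data")
    unittest.TextTestRunner(verbosity=2).run(suite)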

* [OE-core][PATCH v3 08/17] bitbake: bitbake-worker: Pass taskhash as runtask parameter
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Pass the task hash as a parameter to the 'runtask' message instead of
passing the entire dictionary of hashes when the worker is set up. This
is possibly less efficient, but prevents the worker taskhashes from
being out of sync with the runqueue in the event that the taskhashes in
the runqueue change.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-worker |  8 ++++----
 bitbake/lib/bb/runqueue.py | 15 ++++++---------
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/bitbake/bin/bitbake-worker b/bitbake/bin/bitbake-worker
index baa1a84e6dd..41ef6d848ac 100755
--- a/bitbake/bin/bitbake-worker
+++ b/bitbake/bin/bitbake-worker
@@ -136,7 +136,7 @@ def sigterm_handler(signum, frame):
     os.killpg(0, signal.SIGTERM)
     sys.exit()
 
-def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
+def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskhash, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
     # We need to setup the environment BEFORE the fork, since
     # a fork() or exec*() activates PSEUDO...
 
@@ -234,7 +234,7 @@ def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, append
                 ret = 0
 
                 the_data = bb_cache.loadDataFull(fn, appends)
-                the_data.setVar('BB_TASKHASH', workerdata["runq_hash"][task])
+                the_data.setVar('BB_TASKHASH', taskhash)
 
                 bb.utils.set_process_name("%s:%s" % (the_data.getVar("PN"), taskname.replace("do_", "")))
 
@@ -425,10 +425,10 @@ class BitbakeWorker(object):
         sys.exit(0)
 
     def handle_runtask(self, data):
-        fn, task, taskname, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
+        fn, task, taskname, taskhash, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
         workerlog_write("Handling runtask %s %s %s\n" % (task, fn, taskname))
 
-        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
+        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, taskhash, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
 
         self.build_pids[pid] = task
         self.build_pipes[pid] = runQueueWorkerPipe(pipein, pipeout)
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 4d5d8767973..f2b95a9829b 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -1224,17 +1224,12 @@ class RunQueue:
         bb.utils.nonblockingfd(worker.stdout)
         workerpipe = runQueuePipe(worker.stdout, None, self.cfgData, self, rqexec)
 
-        runqhash = {}
-        for tid in self.rqdata.runtaskentries:
-            runqhash[tid] = self.rqdata.runtaskentries[tid].hash
-
         workerdata = {
             "taskdeps" : self.rqdata.dataCaches[mc].task_deps,
             "fakerootenv" : self.rqdata.dataCaches[mc].fakerootenv,
             "fakerootdirs" : self.rqdata.dataCaches[mc].fakerootdirs,
             "fakerootnoenv" : self.rqdata.dataCaches[mc].fakerootnoenv,
             "sigdata" : bb.parse.siggen.get_taskdata(),
-            "runq_hash" : runqhash,
             "logdefaultdebug" : bb.msg.loggerDefaultDebugLevel,
             "logdefaultverbose" : bb.msg.loggerDefaultVerbose,
             "logdefaultverboselogs" : bb.msg.loggerVerboseLogs,
@@ -2031,6 +2026,7 @@ class RunQueueExecuteTasks(RunQueueExecute):
             taskdepdata = self.build_taskdepdata(task)
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
+            taskhash = self.rqdata.get_task_hash(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not (self.cooker.configuration.dry_run or self.rqdata.setscene_enforce):
                 if not mc in self.rq.fakeworker:
                     try:
@@ -2040,10 +2036,10 @@ class RunQueueExecuteTasks(RunQueueExecute):
                         self.rq.state = runQueueFailed
                         self.stats.taskFailed()
                         return True
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
@@ -2457,13 +2453,14 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
             taskdepdata = self.build_taskdepdata(task)
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
+            taskhash = self.rqdata.get_task_hash(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not self.cooker.configuration.dry_run:
                 if not mc in self.rq.fakeworker:
                     self.rq.start_fakeworker(self, mc)
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread
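
A self-contained sketch (stand-in values, not bitbake objects) of the widened
'runtask' message this patch introduces; the hash now travels with each task
rather than sitting in a workerdata table:

    import pickle

    # Hypothetical stand-in values for one task
    taskfn, task, taskname, taskhash = "/path/to/recipe.bb", 0, "do_compile", "deadbeef"
    appends, taskdepdata = [], {}

    msg = b"<runtask>" + pickle.dumps(
        (taskfn, task, taskname, taskhash, False, appends, taskdepdata, False)
    ) + b"</runtask>"

    # The worker strips the framing and unpacks the same tuple shape
    body = msg[len(b"<runtask>"):-len(b"</runtask>")]
    fn, tid, name, h, quiet, app, depdata, enforce = pickle.loads(body)
    assert h == taskhash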

* [OE-core][PATCH v3 09/17] bitbake: siggen: Split out stampfile hash fetch
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

The mechanism used to get the hash for a stamp file is split out so that
it can be overridden by derived classes

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/siggen.py | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
index fdbb2a39988..ab6df7603c8 100644
--- a/bitbake/lib/bb/siggen.py
+++ b/bitbake/lib/bb/siggen.py
@@ -311,6 +311,13 @@ class SignatureGeneratorBasic(SignatureGenerator):
 class SignatureGeneratorBasicHash(SignatureGeneratorBasic):
     name = "basichash"
 
+    def get_stampfile_hash(self, task):
+        if task in self.taskhash:
+            return self.taskhash[task]
+
+        # If task is not in basehash, then error
+        return self.basehash[task]
+
     def stampfile(self, stampbase, fn, taskname, extrainfo, clean=False):
         if taskname != "do_setscene" and taskname.endswith("_setscene"):
             k = fn + "." + taskname[:-9]
@@ -318,11 +325,9 @@ class SignatureGeneratorBasicHash(SignatureGeneratorBasic):
             k = fn + "." + taskname
         if clean:
             h = "*"
-        elif k in self.taskhash:
-            h = self.taskhash[k]
         else:
-            # If k is not in basehash, then error
-            h = self.basehash[k]
+            h = self.get_stampfile_hash(k)
+
         return ("%s.%s.%s.%s" % (stampbase, taskname, h, extrainfo)).rstrip('.')
 
     def stampcleanmask(self, stampbase, fn, taskname, extrainfo):
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread
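
A minimal sketch (standalone classes, not the bitbake ones) of the kind of
override this split makes possible:

    class Basic:
        def __init__(self):
            self.taskhash = {}
            self.basehash = {}

        def get_stampfile_hash(self, task):
            if task in self.taskhash:
                return self.taskhash[task]
            # raises KeyError if the task is in neither dict
            return self.basehash[task]

    class Equiv(Basic):
        def __init__(self, lookup):
            super().__init__()
            self.lookup = lookup  # hypothetical callable: task -> hash or None

        def get_stampfile_hash(self, task):
            h = self.lookup(task)
            return h if h is not None else super().get_stampfile_hash(task)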

* [OE-core][PATCH v3 10/17] bitbake: siggen: Split out task depend ID
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Abstracts the function to get the dependency ID for a task so it can
return something other than the taskhash

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/siggen.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
index ab6df7603c8..2daca70538a 100644
--- a/bitbake/lib/bb/siggen.py
+++ b/bitbake/lib/bb/siggen.py
@@ -41,6 +41,9 @@ class SignatureGenerator(object):
     def finalise(self, fn, d, varient):
         return
 
+    def get_depid(self, task):
+        return self.taskhash[task]
+
     def get_taskhash(self, fn, task, deps, dataCache):
         return "0"
 
@@ -186,7 +189,7 @@ class SignatureGeneratorBasic(SignatureGenerator):
                 continue
             if dep not in self.taskhash:
                 bb.fatal("%s is not in taskhash, caller isn't calling in dependency order?" % dep)
-            data = data + self.taskhash[dep]
+            data = data + self.get_depid(dep)
             self.runtaskdeps[k].append(dep)
 
         if task in dataCache.file_checksums[fn]:
@@ -261,7 +264,7 @@ class SignatureGeneratorBasic(SignatureGenerator):
             data['file_checksum_values'] = [(os.path.basename(f), cs) for f,cs in self.file_checksum_values[k]]
             data['runtaskhashes'] = {}
             for dep in data['runtaskdeps']:
-                data['runtaskhashes'][dep] = self.taskhash[dep]
+                data['runtaskhashes'][dep] = self.get_depid(dep)
             data['taskhash'] = self.taskhash[k]
 
         taint = self.read_taint(fn, task, referencestamp)
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread
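
Conceptually (a standalone sketch, not the bitbake classes), the hook lets a
derived generator collapse several taskhashes onto one dependency ID:

    class SignatureGenerator:
        def __init__(self):
            self.taskhash = {}

        def get_depid(self, task):
            # Default behaviour: the dependency ID is just the taskhash
            return self.taskhash[task]

    class EquivGenerator(SignatureGenerator):
        def __init__(self, equivalences):
            super().__init__()
            self.equivalences = equivalences  # hypothetical taskhash -> depid map

        def get_depid(self, task):
            h = self.taskhash[task]
            return self.equivalences.get(h, h)

    g = EquivGenerator({"aaaa": "1111", "bbbb": "1111"})
    g.taskhash = {"x.do_compile": "aaaa", "y.do_compile": "bbbb"}
    assert g.get_depid("x.do_compile") == g.get_depid("y.do_compile")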

* [OE-core][PATCH v3 11/17] bitbake: runqueue: Track task dependency ID
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Requests the task dependency ID from siggen and tracks it

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/runqueue.py | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index f2b95a9829b..32dcd84e175 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -346,6 +346,7 @@ class RunTaskEntry(object):
         self.depends = set()
         self.revdeps = set()
         self.hash = None
+        self.depid = None
         self.task = None
         self.weight = 1
 
@@ -385,6 +386,9 @@ class RunQueueData:
     def get_task_hash(self, tid):
         return self.runtaskentries[tid].hash
 
+    def get_task_depid(self, tid):
+        return self.runtaskentries[tid].depid
+
     def get_user_idstring(self, tid, task_name_suffix = ""):
         return tid + task_name_suffix
 
@@ -1150,18 +1154,21 @@ class RunQueueData:
                 if len(self.runtaskentries[tid].depends - dealtwith) == 0:
                     dealtwith.add(tid)
                     todeal.remove(tid)
-                    procdep = []
-                    for dep in self.runtaskentries[tid].depends:
-                        procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
-                    (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
-                    self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
-                    task = self.runtaskentries[tid].task
+                    self.prepare_task_hash(tid)
 
         bb.parse.siggen.writeout_file_checksum_cache()
 
         #self.dump_data()
         return len(self.runtaskentries)
 
+    def prepare_task_hash(self, tid):
+        procdep = []
+        for dep in self.runtaskentries[tid].depends:
+            procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
+        (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
+        self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
+        self.runtaskentries[tid].depid = bb.parse.siggen.get_depid(fn + "." + taskname)
+
     def dump_data(self):
         """
         Dump some debug information on the internal data structures
@@ -2081,7 +2088,8 @@ class RunQueueExecuteTasks(RunQueueExecute):
                 deps = self.rqdata.runtaskentries[revdep].depends
                 provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
                 taskhash = self.rqdata.runtaskentries[revdep].hash
-                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
+                depid = self.rqdata.runtaskentries[revdep].depid
+                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, depid]
                 for revdep2 in deps:
                     if revdep2 not in taskdepdata:
                         additional.append(revdep2)
@@ -2524,7 +2532,8 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
                 deps = getsetscenedeps(revdep)
                 provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
                 taskhash = self.rqdata.runtaskentries[revdep].hash
-                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
+                depid = self.rqdata.runtaskentries[revdep].depid
+                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, depid]
                 for revdep2 in deps:
                     if revdep2 not in taskdepdata:
                         additional.append(revdep2)
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread
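
Consumers of taskdepdata see one extra field per entry; a short sketch with
hypothetical values:

    # Entries grow from six fields to seven: the depid is appended last
    entry = ["zlib", "do_populate_sysroot", "/path/to/zlib.bb",
             set(), ["zlib"], "aabbcc", "ddeeff"]
    pn, taskname, fn, deps, provides, taskhash, depid = entry
    assert depid == entry[6]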

* [OE-core][PATCH v3 12/17] bitbake: runqueue: Pass dependency ID to task
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

The dependency ID is now passed to the task in the BB_DEPID variable
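
Purely as an illustration (this fragment is not part of the patch), a
recipe task could read the new variable the same way BB_TASKHASH is read
today:

    # Hypothetical task logging the dependency ID it was given
    python do_show_depid () {
        depid = d.getVar('BB_DEPID')
        bb.note('Dependency ID for this task: %s' % depid)
    }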

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-worker |  7 ++++---
 bitbake/lib/bb/runqueue.py | 10 ++++++----
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/bitbake/bin/bitbake-worker b/bitbake/bin/bitbake-worker
index 41ef6d848ac..9650c954359 100755
--- a/bitbake/bin/bitbake-worker
+++ b/bitbake/bin/bitbake-worker
@@ -136,7 +136,7 @@ def sigterm_handler(signum, frame):
     os.killpg(0, signal.SIGTERM)
     sys.exit()
 
-def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskhash, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
+def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskhash, depid, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
     # We need to setup the environment BEFORE the fork, since
     # a fork() or exec*() activates PSEUDO...
 
@@ -235,6 +235,7 @@ def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskha
 
                 the_data = bb_cache.loadDataFull(fn, appends)
                 the_data.setVar('BB_TASKHASH', taskhash)
+                the_data.setVar('BB_DEPID', depid)
 
                 bb.utils.set_process_name("%s:%s" % (the_data.getVar("PN"), taskname.replace("do_", "")))
 
@@ -425,10 +426,10 @@ class BitbakeWorker(object):
         sys.exit(0)
 
     def handle_runtask(self, data):
-        fn, task, taskname, taskhash, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
+        fn, task, taskname, taskhash, depid, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
         workerlog_write("Handling runtask %s %s %s\n" % (task, fn, taskname))
 
-        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, taskhash, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
+        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, taskhash, depid, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
 
         self.build_pids[pid] = task
         self.build_pipes[pid] = runQueueWorkerPipe(pipein, pipeout)
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 32dcd84e175..c5e4573b278 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -2034,6 +2034,7 @@ class RunQueueExecuteTasks(RunQueueExecute):
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
             taskhash = self.rqdata.get_task_hash(task)
+            depid = self.rqdata.get_task_depid(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not (self.cooker.configuration.dry_run or self.rqdata.setscene_enforce):
                 if not mc in self.rq.fakeworker:
                     try:
@@ -2043,10 +2044,10 @@ class RunQueueExecuteTasks(RunQueueExecute):
                         self.rq.state = runQueueFailed
                         self.stats.taskFailed()
                         return True
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, depid, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, depid, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
@@ -2462,13 +2463,14 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
             taskhash = self.rqdata.get_task_hash(task)
+            depid = self.rqdata.get_task_depid(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not self.cooker.configuration.dry_run:
                 if not mc in self.rq.fakeworker:
                     self.rq.start_fakeworker(self, mc)
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, depid, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, depid, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v3 13/17] bitbake: runqueue: Pass dependency ID to hash validate
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

If the dependency ID is being used to track task dependencies, the hash
validation function needs to know about it in order to properly validate
the hash.

TODO: This change is not currently backward compatible with older
hashvalidate functions. Is backward compatibility necessary, and if so,
are there any suggestions for a good approach?
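
One possible answer, sketched here purely as a suggestion (untested, not
part of the patch), is to mirror the siginfo fallback this patch already
uses and retry with the old calling convention on TypeError:

    # Hypothetical backward-compatibility fallback for runqueue.py
    try:
        call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d)"
        valid = bb.utils.better_eval(call, locs)
    except TypeError:
        # Older hashvalidate functions do not accept sq_depid
        call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d)"
        valid = bb.utils.better_eval(call, locs)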

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/runqueue.py | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index c5e4573b278..dd6c0208daf 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -1549,6 +1549,7 @@ class RunQueue:
         valid = []
         sq_hash = []
         sq_hashfn = []
+        sq_depid = []
         sq_fn = []
         sq_taskname = []
         sq_task = []
@@ -1567,15 +1568,16 @@ class RunQueue:
             sq_fn.append(fn)
             sq_hashfn.append(self.rqdata.dataCaches[mc].hashfn[taskfn])
             sq_hash.append(self.rqdata.runtaskentries[tid].hash)
+            sq_depid.append(self.rqdata.runtaskentries[tid].depid)
             sq_taskname.append(taskname)
             sq_task.append(tid)
-        locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "d" : self.cooker.data }
+        locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "sq_depid" : sq_depid, "d" : self.cooker.data }
         try:
-            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=True)"
+            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=True)"
             valid = bb.utils.better_eval(call, locs)
         # Handle version with no siginfo parameter
         except TypeError:
-            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d)"
+            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d)"
             valid = bb.utils.better_eval(call, locs)
         for v in valid:
             valid_new.add(sq_task[v])
@@ -2293,6 +2295,7 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
         if self.rq.hashvalidate:
             sq_hash = []
             sq_hashfn = []
+            sq_depid = []
             sq_fn = []
             sq_taskname = []
             sq_task = []
@@ -2324,13 +2327,14 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
                 sq_fn.append(fn)
                 sq_hashfn.append(self.rqdata.dataCaches[mc].hashfn[taskfn])
                 sq_hash.append(self.rqdata.runtaskentries[tid].hash)
+                sq_depid.append(self.rqdata.runtaskentries[tid].depid)
                 sq_taskname.append(taskname)
                 sq_task.append(tid)
 
             self.cooker.data.setVar("BB_SETSCENE_STAMPCURRENT_COUNT", len(stamppresent))
 
-            call = self.rq.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d)"
-            locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "d" : self.cooker.data }
+            call = self.rq.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d)"
+            locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "sq_depid": sq_depid, "d" : self.cooker.data }
             valid = bb.utils.better_eval(call, locs)
 
             self.cooker.data.delVar("BB_SETSCENE_STAMPCURRENT_COUNT")
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v3 14/17] classes/sstate: Handle depid in hash check
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Handle the new argument that passes task dependency IDs to the hash
check function, since bitbake now requires it

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 8b48ab465fd..4b91ff472d2 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -774,7 +774,7 @@ sstate_unpack_package () {
 
 BB_HASHCHECK_FUNCTION = "sstate_checkhashes"
 
-def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False):
+def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=False):
 
     ret = []
     missed = []
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v3 15/17] bitbake: hashserv: Add hash equivalence reference server
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Adds a reference implementation of the hash equivalence server. The
server has no dependencies outside of the standard Python library and
implements the minimum required to be a conforming hash equivalence
server.
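
For reference, a minimal client interaction with the server (a sketch
only; the hashes are borrowed from the tests below and the address
assumes the script's default port) might look like:

    import json
    import urllib.request

    server = 'http://localhost:8686'

    # Report a task's output hash (POST) and read back the canonical depid
    body = json.dumps({
        'taskhash': '35788efcb8dfb0a02659d81cf2bfd695fb30faf9',
        'method': 'TestMethod',
        'outhash': '2765d4a5884be49b28601445c2760c5f21e7e5c0ee2b7e3fce98fd7e5970796f',
        'depid': 'f46d3fbb439bd9b921095da657a4de906510d2cd',
    }).encode('utf-8')
    request = urllib.request.Request(server + '/v1/equivalent', body,
                                     {'content-type': 'application/json'})
    d = json.loads(urllib.request.urlopen(request).read().decode('utf-8'))
    print(d['depid'])

The server itself is started with e.g. "bitbake-hashserv --log INFO";
see the argparse options in the script below.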

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-hashserv     |  67 ++++++++++++++
 bitbake/bin/bitbake-selftest     |   2 +
 bitbake/lib/hashserv/__init__.py | 152 +++++++++++++++++++++++++++++++
 bitbake/lib/hashserv/tests.py    | 141 ++++++++++++++++++++++++++++
 4 files changed, 362 insertions(+)
 create mode 100755 bitbake/bin/bitbake-hashserv
 create mode 100644 bitbake/lib/hashserv/__init__.py
 create mode 100644 bitbake/lib/hashserv/tests.py

diff --git a/bitbake/bin/bitbake-hashserv b/bitbake/bin/bitbake-hashserv
new file mode 100755
index 00000000000..c49397b73a5
--- /dev/null
+++ b/bitbake/bin/bitbake-hashserv
@@ -0,0 +1,67 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+import os
+import sys
+import logging
+import argparse
+import sqlite3
+
+sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)),'lib'))
+
+import hashserv
+
+VERSION = "1.0.0"
+
+DEFAULT_HOST = ''
+DEFAULT_PORT = 8686
+
+def main():
+    parser = argparse.ArgumentParser(description='Hash Equivalence Reference Server. Version=%s' % VERSION)
+    parser.add_argument('--address', default=DEFAULT_HOST, help='Bind address (default "%(default)s")')
+    parser.add_argument('--port', type=int, default=DEFAULT_PORT, help='Bind port (default %(default)d)')
+    parser.add_argument('--prefix', default='', help='HTTP path prefix (default "%(default)s")')
+    parser.add_argument('--database', default='./hashserv.db', help='Database file (default "%(default)s")')
+    parser.add_argument('--log', default='WARNING', help='Set logging level')
+
+    args = parser.parse_args()
+
+    logger = logging.getLogger('hashserv')
+
+    level = getattr(logging, args.log.upper(), None)
+    if not isinstance(level, int):
+        raise ValueError('Invalid log level: %s' % args.log)
+
+    logger.setLevel(level)
+    console = logging.StreamHandler()
+    console.setLevel(level)
+    logger.addHandler(console)
+
+    db = sqlite3.connect(args.database)
+
+    server = hashserv.create_server((args.address, args.port), db, args.prefix)
+    server.serve_forever()
+    return 0
+
+if __name__ == '__main__':
+    try:
+        ret = main()
+    except Exception:
+        ret = 1
+        import traceback
+        traceback.print_exc()
+    sys.exit(ret)
+
diff --git a/bitbake/bin/bitbake-selftest b/bitbake/bin/bitbake-selftest
index c970dcae90c..99f1af910f4 100755
--- a/bitbake/bin/bitbake-selftest
+++ b/bitbake/bin/bitbake-selftest
@@ -22,6 +22,7 @@ sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)), 'lib
 import unittest
 try:
     import bb
+    import hashserv
     import layerindexlib
 except RuntimeError as exc:
     sys.exit(str(exc))
@@ -35,6 +36,7 @@ tests = ["bb.tests.codeparser",
          "bb.tests.parse",
          "bb.tests.persist_data",
          "bb.tests.utils",
+         "hashserv.tests",
          "layerindexlib.tests.layerindexobj",
          "layerindexlib.tests.restapi",
          "layerindexlib.tests.cooker"]
diff --git a/bitbake/lib/hashserv/__init__.py b/bitbake/lib/hashserv/__init__.py
new file mode 100644
index 00000000000..cde030cb88e
--- /dev/null
+++ b/bitbake/lib/hashserv/__init__.py
@@ -0,0 +1,152 @@
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+from http.server import BaseHTTPRequestHandler, HTTPServer
+import contextlib
+import urllib.parse
+import sqlite3
+import json
+import traceback
+import logging
+from datetime import datetime
+
+logger = logging.getLogger('hashserv')
+
+class HashEquivalenceServer(BaseHTTPRequestHandler):
+    def log_message(self, f, *args):
+        logger.debug(f, *args)
+
+    def do_GET(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            query = urllib.parse.parse_qs(p.query, strict_parsing=True)
+            method = query['method'][0]
+            taskhash = query['taskhash'][0]
+
+            d = None
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('SELECT taskhash, method, depid FROM tasks_v1 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1',
+                        {'method': method, 'taskhash': taskhash})
+
+                row = cursor.fetchone()
+
+                if row is not None:
+                    logger.debug('Found equivalent task %s', row['taskhash'])
+                    d = {k: row[k] for k in ('taskhash', 'method', 'depid')}
+
+            self.send_response(200)
+            self.send_header('Content-Type', 'application/json; charset=utf-8')
+            self.end_headers()
+            self.wfile.write(json.dumps(d).encode('utf-8'))
+        except:
+            logger.exception('Error in GET')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+    def do_POST(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            length = int(self.headers['content-length'])
+            data = json.loads(self.rfile.read(length).decode('utf-8'))
+
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('''
+                    SELECT taskhash, method, depid FROM tasks_v1 WHERE method=:method AND outhash=:outhash
+                    ORDER BY CASE WHEN taskhash=:taskhash THEN 1 ELSE 2 END,
+                        created ASC
+                    LIMIT 1
+                    ''', {k: data[k] for k in ('method', 'outhash', 'taskhash')})
+
+                row = cursor.fetchone()
+
+                if row is None or row['taskhash'] != data['taskhash']:
+                    depid = data['depid']
+                    if row is not None:
+                        depid = row['depid']
+
+                    insert_data = {
+                            'method': data['method'],
+                            'outhash': data['outhash'],
+                            'taskhash': data['taskhash'],
+                            'depid': depid,
+                            'created': datetime.now()
+                            }
+
+                    for k in ('owner', 'PN', 'PV', 'PR', 'task', 'outhash_siginfo'):
+                        if k in data:
+                            insert_data[k] = data[k]
+
+                    cursor.execute('''INSERT INTO tasks_v1 (%s) VALUES (%s)''' % (
+                            ', '.join(sorted(insert_data.keys())),
+                            ', '.join(':' + k for k in sorted(insert_data.keys()))),
+                        insert_data)
+
+                    logger.info('Adding taskhash %s with depid %s', data['taskhash'], depid)
+                    cursor.execute('SELECT taskhash, method, depid FROM tasks_v1 WHERE id=:id', {'id': cursor.lastrowid})
+                    row = cursor.fetchone()
+
+                    self.db.commit()
+
+                d = {k: row[k] for k in ('taskhash', 'method', 'depid')}
+
+                self.send_response(200)
+                self.send_header('Content-Type', 'application/json; charset=utf-8')
+                self.end_headers()
+                self.wfile.write(json.dumps(d).encode('utf-8'))
+        except:
+            logger.exception('Error in POST')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+def create_server(addr, db, prefix=''):
+    class Handler(HashEquivalenceServer):
+        pass
+
+    Handler.prefix = prefix
+    Handler.db = db
+    db.row_factory = sqlite3.Row
+
+    with contextlib.closing(db.cursor()) as cursor:
+        cursor.execute('''
+            CREATE TABLE IF NOT EXISTS tasks_v1 (
+                id INTEGER PRIMARY KEY AUTOINCREMENT,
+                method TEXT NOT NULL,
+                outhash TEXT NOT NULL,
+                taskhash TEXT NOT NULL,
+                depid TEXT NOT NULL,
+                created DATETIME,
+
+                -- Optional fields
+                owner TEXT,
+                PN TEXT,
+                PV TEXT,
+                PR TEXT,
+                task TEXT,
+                outhash_siginfo TEXT
+                )
+            ''')
+
+    logger.info('Starting server on %s', addr)
+    return HTTPServer(addr, Handler)
diff --git a/bitbake/lib/hashserv/tests.py b/bitbake/lib/hashserv/tests.py
new file mode 100644
index 00000000000..7efb1fce0bc
--- /dev/null
+++ b/bitbake/lib/hashserv/tests.py
@@ -0,0 +1,141 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+import unittest
+import threading
+import sqlite3
+import hashlib
+import urllib.request
+import json
+from . import create_server
+
+class TestHashEquivalenceServer(unittest.TestCase):
+    def setUp(self):
+        # Start an in memory hash equivalence server in the background bound to
+        # an ephemeral port
+        db = sqlite3.connect(':memory:', check_same_thread=False)
+        self.server = create_server(('localhost', 0), db)
+        self.server_addr = 'http://localhost:%d' % self.server.socket.getsockname()[1]
+        self.server_thread = threading.Thread(target=self.server.serve_forever)
+        self.server_thread.start()
+
+    def tearDown(self):
+        # Shutdown server
+        s = getattr(self, 'server', None)
+        if s is not None:
+            self.server.shutdown()
+            self.server_thread.join()
+            self.server.server_close()
+
+    def send_get(self, path):
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def send_post(self, path, data):
+        headers = {'content-type': 'application/json'}
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url, json.dumps(data).encode('utf-8'), headers)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def test_create_hash(self):
+        # Simple test that hashes can be created
+        taskhash = '35788efcb8dfb0a02659d81cf2bfd695fb30faf9'
+        outhash = '2765d4a5884be49b28601445c2760c5f21e7e5c0ee2b7e3fce98fd7e5970796f'
+        depid = 'f46d3fbb439bd9b921095da657a4de906510d2cd'
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertIsNone(d, msg='Found unexpected task, %r' % d)
+
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'depid': depid,
+            })
+        self.assertEqual(d['depid'], depid, 'Server returned bad depid')
+
+    def test_create_equivalent(self):
+        # Tests that a second reported task with the same outhash will be
+        # assigned the same depid
+        taskhash = '53b8dce672cb6d0c73170be43f540460bfc347b4'
+        outhash = '5a9cb1649625f0bf41fc7791b635cd9c2d7118c7f021ba87dcd03f72b67ce7a8'
+        depid = 'f37918cc02eb5a520b1aff86faacbc0a38124646'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'depid': depid,
+            })
+        self.assertEqual(d['depid'], depid, 'Server returned bad depid')
+
+        # Report a different task with the same outhash. The returned depid
+        # should match the first task
+        taskhash2 = '3bf6f1e89d26205aec90da04854fbdbf73afe6b4'
+        depid2 = 'af36b199320e611fbb16f1f277d3ee1d619ca58b'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash2,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'depid': depid2,
+            })
+        self.assertEqual(d['depid'], depid, 'Server returned bad depid')
+
+    def test_duplicate_taskhash(self):
+        # Tests that duplicate reports of the same taskhash with different
+        # outhash & depid always return the depid from the first reported
+        # taskhash
+        taskhash = '8aa96fcffb5831b3c2c0cb75f0431e3f8b20554a'
+        outhash = 'afe240a439959ce86f5e322f8c208e1fedefea9e813f2140c81af866cc9edf7e'
+        depid = '218e57509998197d570e2c98512d0105985dffc9'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'depid': depid,
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['depid'], depid)
+
+        outhash2 = '0904a7fe3dc712d9fd8a74a616ddca2a825a8ee97adf0bd3fc86082c7639914d'
+        depid2 = 'ae9a7d252735f0dafcdb10e2e02561ca3a47314c'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash2,
+            'depid': depid2
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['depid'], depid)
+
+        outhash3 = '77623a549b5b1a31e3732dfa8fe61d7ce5d44b3370f253c5360e136b852967b4'
+        depid3 = '9217a7d6398518e5dc002ed58f2cbbbc78696603'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash3,
+            'depid': depid3
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['depid'], depid)
+
+
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v3 16/17] sstate: Implement hash equivalence sstate
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Converts sstate so that it can use a hash equivalence server to
determine if a task really needs to be rebuilt, or if it can be restored
from a different (equivalent) sstate object.

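A hypothetical local.conf fragment enabling the server lookup (the
variable names are the ones added by this patch; the address is an
example only):

    SSTATE_HASHEQUIV_SERVER = "http://localhost:8686"
    SSTATE_HASHEQUIV_REPORT_TASKDATA = "1"
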
The dependency IDs are cached persistently using persist_data (see the
sketch below). This has a number of advantages:
 1) Dependency IDs can be cached between invocations of bitbake to
    prevent needing to contact the server every time (which is slow)
 2) The value of each task's dependency ID can easily be synchronized
    between different threads, which will be useful if bitbake is
    updated to do on-the-fly task re-hashing.
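
A minimal sketch of how this cache is used (the domain name and key
layout mirror the sstatesig.py changes below; lookup_depid itself is an
illustrative helper, not part of the patch):

    import bb.persist_data

    def lookup_depid(task, taskhash, d):
        # Dict-like, SQLite-backed store; entries survive across bitbake
        # invocations and are visible to other bitbake processes
        cache = bb.persist_data.persist(
            'SSTATESIG_DEPID_CACHE_v1_OEOuthashBasic', d)
        key = '%s:%s' % (task, taskhash)
        depid = cache.get(key)
        if depid is None:
            # Nothing cached for this taskhash yet; fall back to the
            # taskhash itself, as the signature generator does
            depid = taskhash
            cache[key] = depid
        return depid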

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 100 ++++++++++++++++++++--
 meta/conf/bitbake.conf      |   4 +-
 meta/lib/oe/sstatesig.py    | 166 ++++++++++++++++++++++++++++++++++++
 3 files changed, 261 insertions(+), 9 deletions(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 4b91ff472d2..3d37ad2f5af 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -11,7 +11,7 @@ def generate_sstatefn(spec, hash, d):
 SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
 SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
 SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
-SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_TASKHASH'), d)}"
+SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_DEPID'), d)}"
 SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
 SSTATE_EXTRAPATH   = ""
 SSTATE_EXTRAPATHWILDCARD = ""
@@ -82,6 +82,23 @@ SSTATE_SIG_PASSPHRASE ?= ""
 # Whether to verify the GnUPG signatures when extracting sstate archives
 SSTATE_VERIFY_SIG ?= "0"
 
+SSTATE_HASHEQUIV_METHOD ?= "OEOuthashBasic"
+SSTATE_HASHEQUIV_METHOD[doc] = "The function used to calculate the output hash \
+    for a task, which in turn is used to determine equivalency. \
+    "
+
+SSTATE_HASHEQUIV_SERVER ?= ""
+SSTATE_HASHEQUIV_SERVER[doc] = "The hash equivalence server. For example, \
+    'http://192.168.0.1:5000'. Do not include a trailing slash. \
+    "
+
+SSTATE_HASHEQUIV_REPORT_TASKDATA ?= "0"
+SSTATE_HASHEQUIV_REPORT_TASKDATA[doc] = "Report additional useful data to the \
+    hash equivalency server, such as PN, PV, taskname, etc. This information \
+    is very useful for developers looking at task data, but may leak sensitive \
+    data if the equivalence server is public. \
+    "
+
 python () {
     if bb.data.inherits_class('native', d):
         d.setVar('SSTATE_PKGARCH', d.getVar('BUILD_ARCH', False))
@@ -634,7 +651,7 @@ def sstate_package(ss, d):
         return
 
     for f in (d.getVar('SSTATECREATEFUNCS') or '').split() + \
-             ['sstate_create_package', 'sstate_sign_package'] + \
+             ['sstate_report_depid', 'sstate_create_package', 'sstate_sign_package'] + \
              (d.getVar('SSTATEPOSTCREATEFUNCS') or '').split():
         # All hooks should run in SSTATE_BUILDDIR.
         bb.build.exec_func(f, d, (sstatebuild,))
@@ -758,6 +775,73 @@ python sstate_sign_package () {
                            d.getVar('SSTATE_SIG_PASSPHRASE'), armor=False)
 }
 
+def OEOuthashBasic(path, sigfile, task, d):
+    import hashlib
+    import stat
+
+    def update_hash(s):
+        s = s.encode('utf-8')
+        h.update(s)
+        if sigfile:
+            sigfile.write(s)
+
+    h = hashlib.sha1()
+    prev_dir = os.getcwd()
+
+    try:
+        os.chdir(path)
+
+        update_hash("OEOuthashBasic\n")
+
+        # It is only currently useful to get equivalent hashes for things that
+        # can be restored from sstate. Since the sstate object is named using
+        # SSTATE_PKGSPEC and the task name, those should be included in the
+        # output hash calculation.
+        update_hash("SSTATE_PKGSPEC=%s\n" % d.getVar('SSTATE_PKGSPEC'))
+        update_hash("task=%s\n" % task)
+
+        for root, dirs, files in os.walk('.', topdown=True):
+            # Sort directories and files to ensure consistent ordering
+            dirs.sort()
+            files.sort()
+
+            for f in files:
+                path = os.path.join(root, f)
+                s = os.lstat(path)
+
+                # Hash file path
+                update_hash(path + '\n')
+
+                # Hash file mode
+                update_hash("\tmode=0x%x\n" % stat.S_IMODE(s.st_mode))
+                update_hash("\ttype=0x%x\n" % stat.S_IFMT(s.st_mode))
+
+                if stat.S_ISBLK(s.st_mode) or stat.S_ISCHR(s.st_mode):
+                    # Hash device major and minor
+                    update_hash("\tdev=%d,%d\n" % (os.major(s.st_rdev), os.minor(s.st_rdev)))
+                elif stat.S_ISLNK(s.st_mode):
+                    # Hash symbolic link
+                    update_hash("\tsymlink=%s\n" % os.readlink(path))
+                else:
+                    fh = hashlib.sha1()
+                    # Hash file contents
+                    with open(path, 'rb') as d:
+                        for chunk in iter(lambda: d.read(4096), b""):
+                            fh.update(chunk)
+                    update_hash("\tdigest=%s\n" % fh.hexdigest())
+    finally:
+        os.chdir(prev_dir)
+
+    return h.hexdigest()
+
+python sstate_report_depid() {
+    report_depid = getattr(bb.parse.siggen, 'report_depid', None)
+
+    if report_depid:
+        ss = sstate_state_fromvars(d)
+        report_depid(os.getcwd(), ss['task'], d)
+}
+
 #
 # Shell function to decompress and prepare a package for installation
 # Will be run from within SSTATE_INSTDIR.
@@ -804,7 +888,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=
 
         spec, extrapath, tname = getpathcomponents(task, d)
 
-        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_depid[task], d) + "_" + tname + extension)
 
         if os.path.exists(sstatefile):
             bb.debug(2, "SState: Found valid sstate file %s" % sstatefile)
@@ -866,7 +950,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=
             if task in ret:
                 continue
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_depid[task], d) + "_" + tname + extension)
             tasklist.append((task, sstatefile))
 
         if tasklist:
@@ -892,12 +976,12 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=
         evdata = {'missed': [], 'found': []};
         for task in missed:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_depid[task], d) + "_" + tname + ".tgz")
+            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_depid[task], sstatefile ) )
         for task in ret:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['found'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_depid[task], d) + "_" + tname + ".tgz")
+            evdata['found'].append( (sq_fn[task], sq_task[task], sq_depid[task], sstatefile ) )
         bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)
 
     # Print some summary statistics about the current task completion and how much sstate
diff --git a/meta/conf/bitbake.conf b/meta/conf/bitbake.conf
index dcf20078831..7e17c6222eb 100644
--- a/meta/conf/bitbake.conf
+++ b/meta/conf/bitbake.conf
@@ -867,7 +867,9 @@ BB_HASHBASE_WHITELIST ?= "TMPDIR FILE PATH PWD BB_TASKHASH BBPATH BBSERVER DL_DI
     STAMPS_DIR PRSERV_DUMPDIR PRSERV_DUMPFILE PRSERV_LOCKDOWN PARALLEL_MAKE \
     CCACHE_DIR EXTERNAL_TOOLCHAIN CCACHE CCACHE_NOHASHDIR LICENSE_PATH SDKPKGSUFFIX \
     WARN_QA ERROR_QA WORKDIR STAMPCLEAN PKGDATA_DIR BUILD_ARCH SSTATE_PKGARCH \
-    BB_WORKERCONTEXT BB_LIMITEDDEPS extend_recipe_sysroot DEPLOY_DIR"
+    BB_WORKERCONTEXT BB_LIMITEDDEPS BB_DEPID extend_recipe_sysroot DEPLOY_DIR \
+    SSTATE_HASHEQUIV_METHOD SSTATE_HASHEQUIV_SERVER SSTATE_HASHEQUIV_REPORT_TASKDATA \
+    SSTATE_HASHEQUIV_OWNER"
 BB_HASHCONFIG_WHITELIST ?= "${BB_HASHBASE_WHITELIST} DATE TIME SSH_AGENT_PID \
     SSH_AUTH_SOCK PSEUDO_BUILD BB_ENV_EXTRAWHITE DISABLE_SANITY_CHECKS \
     PARALLEL_MAKE BB_NUMBER_THREADS BB_ORIGENV BB_INVALIDCONF BBINCLUDED \
diff --git a/meta/lib/oe/sstatesig.py b/meta/lib/oe/sstatesig.py
index 18c5a353a2a..7f75de3279f 100644
--- a/meta/lib/oe/sstatesig.py
+++ b/meta/lib/oe/sstatesig.py
@@ -263,10 +263,176 @@ class SignatureGeneratorOEBasicHash(bb.siggen.SignatureGeneratorBasicHash):
         if error_msgs:
             bb.fatal("\n".join(error_msgs))
 
+class SignatureGeneratorOEEquivHash(SignatureGeneratorOEBasicHash):
+    name = "OEEquivHash"
+
+    def init_rundepcheck(self, data):
+        super().init_rundepcheck(data)
+        self.server = data.getVar('SSTATE_HASHEQUIV_SERVER')
+        self.method = data.getVar('SSTATE_HASHEQUIV_METHOD')
+        self.depids = bb.persist_data.persist('SSTATESIG_DEPID_CACHE_v1_' + self.method, data)
+
+    def get_taskdata(self):
+        return (self.server, self.method) + super().get_taskdata()
+
+    def set_taskdata(self, data):
+        self.server, self.method = data[:2]
+        super().set_taskdata(data[2:])
+
+    def __get_task_depid_key(self, task):
+        # TODO: The key only *needs* to be the taskhash; the task name is
+        # just convenient
+        return '%s:%s' % (task, self.taskhash[task])
+
+    def get_stampfile_hash(self, task):
+        if task in self.taskhash:
+            # If a depid is reported, use it as the stampfile hash. This
+            # ensures that a task won't be re-run if the taskhash changes
+            # but the output hash stays the same
+            depid = self.depids.get(self.__get_task_depid_key(task))
+            if depid is not None:
+                return depid
+
+        return super().get_stampfile_hash(task)
+
+    def get_depid(self, task):
+        import urllib
+        import json
+
+        taskhash = self.taskhash[task]
+
+        key = self.__get_task_depid_key(task)
+
+        # TODO: This cache can grow unbounded. It probably only needs to
+        # keep the most recent entry for each task
+        depid = self.depids.get(key)
+        if depid is not None:
+            return depid
+
+        # If a dependency ID cannot be discovered from the server, make it
+        # equivalent to the taskhash. The dependency ID only really needs to
+        # be a unique string (not even necessarily a hash), but making it
+        # match the taskhash has a few advantages:
+        #
+        # 1) All of the sstate code that assumes hashes can stay the same
+        # 2) It provides maximal compatibility with builders that don't use
+        #    an equivalency server
+        # 3) Multiple independent builders can easily derive the same depid
+        #    from the same input. This means that if independent builders
+        #    find the same taskhash, but it isn't reported to the server,
+        #    there is a better chance that they will agree on the dependency ID.
+        depid = taskhash
+
+        try:
+            url = '%s/v1/equivalent?%s' % (self.server,
+                    urllib.parse.urlencode({'method': self.method, 'taskhash': self.taskhash[task]}))
+
+            request = urllib.request.Request(url)
+            response = urllib.request.urlopen(request)
+            data = response.read().decode('utf-8')
+
+            json_data = json.loads(data)
+
+            if json_data:
+                depid = json_data['depid']
+                # A dependency ID equal to the taskhash is not very
+                # interesting, so it is reported at debug level 2. If they
+                # differ, that is much more interesting, so it is reported
+                # at debug level 1
+                bb.debug((1, 2)[depid == taskhash], 'Found depid %s in place of %s for %s from %s' % (depid, taskhash, task, self.server))
+            else:
+                bb.debug(2, 'No reported dependency ID for %s:%s from %s' % (task, taskhash, self.server))
+        except urllib.error.URLError as e:
+            bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+        except (KeyError, json.JSONDecodeError) as e:
+            bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+
+        self.depids[key] = depid
+        return depid
+
+    def report_depid(self, path, task, d):
+        import urllib
+        import json
+
+        taskhash = d.getVar('BB_TASKHASH')
+        depid = d.getVar('BB_DEPID')
+        report_taskdata = d.getVar('SSTATE_HASHEQUIV_REPORT_TASKDATA') == '1'
+        tempdir = d.getVar('T')
+        fn = d.getVar('BB_FILENAME')
+        key = fn + '.do_' + task + ':' + taskhash
+
+        # Sanity checks
+        cache_depid = self.depids.get(key)
+        if cache_depid is None:
+            bb.fatal('%s not in depid cache. Please report this error' % key)
+
+        if cache_depid != depid:
+            bb.fatal("Cache depid %s doesn't match BB_DEPID %s" % (cache_depid, depid))
+
+        sigfile = None
+        sigfile_name = "depsig.do_%s.%d" % (task, os.getpid())
+        sigfile_link = "depsig.do_%s" % task
+
+        try:
+            call = self.method + '(path, sigfile, task, d)'
+            sigfile = open(os.path.join(tempdir, sigfile_name), 'w+b')
+            locs = {'path': path, 'sigfile': sigfile, 'task': task, 'd': d}
+
+            outhash = bb.utils.better_eval(call, locs)
+
+            try:
+                url = '%s/v1/equivalent' % self.server
+                task_data = {
+                    'taskhash': taskhash,
+                    'method': self.method,
+                    'outhash': outhash,
+                    'depid': depid,
+                    'owner': d.getVar('SSTATE_HASHEQUIV_OWNER')
+                    }
+
+                if report_taskdata:
+                    sigfile.seek(0)
+
+                    task_data['PN'] = d.getVar('PN')
+                    task_data['PV'] = d.getVar('PV')
+                    task_data['PR'] = d.getVar('PR')
+                    task_data['task'] = task
+                    task_data['outhash_siginfo'] = sigfile.read().decode('utf-8')
+
+                headers = {'content-type': 'application/json'}
+
+                request = urllib.request.Request(url, json.dumps(task_data).encode('utf-8'), headers)
+                response = urllib.request.urlopen(request)
+                data = response.read().decode('utf-8')
+
+                json_data = json.loads(data)
+                new_depid = json_data['depid']
+
+                if new_depid != depid:
+                    bb.debug(1, 'Task %s depid changed %s -> %s by server %s' % (taskhash, depid, new_depid, self.server))
+                else:
+                    bb.debug(1, 'Reported task %s as depid %s to %s' % (taskhash, depid, self.server))
+            except urllib.error.URLError as e:
+                bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+            except (KeyError, json.JSONDecodeError) as e:
+                bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+        finally:
+            if sigfile:
+                sigfile.close()
+
+                sigfile_link_path = os.path.join(tempdir, sigfile_link)
+                bb.utils.remove(sigfile_link_path)
+
+                try:
+                    os.symlink(sigfile_name, sigfile_link_path)
+                except OSError:
+                    pass
 
 # Insert these classes into siggen's namespace so it can see and select them
 bb.siggen.SignatureGeneratorOEBasic = SignatureGeneratorOEBasic
 bb.siggen.SignatureGeneratorOEBasicHash = SignatureGeneratorOEBasicHash
+bb.siggen.SignatureGeneratorOEEquivHash = SignatureGeneratorOEEquivHash
 
 
 def find_siginfo(pn, taskname, taskhashlist, d):
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v3 17/17] classes/image-buildinfo: Remove unused argument
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  3:42       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-04  3:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Removes the listvars argument to image_buildinfo_outputvars(). It
doesn't appear that this argument ever did anything.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/image-buildinfo.bbclass | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/meta/classes/image-buildinfo.bbclass b/meta/classes/image-buildinfo.bbclass
index 87a6a1a4893..94c585d4cd9 100644
--- a/meta/classes/image-buildinfo.bbclass
+++ b/meta/classes/image-buildinfo.bbclass
@@ -16,9 +16,8 @@ IMAGE_BUILDINFO_VARS ?= "DISTRO DISTRO_VERSION"
 IMAGE_BUILDINFO_FILE ??= "${sysconfdir}/build"
 
 # From buildhistory.bbclass
-def image_buildinfo_outputvars(vars, listvars, d): 
+def image_buildinfo_outputvars(vars, d):
     vars = vars.split()
-    listvars = listvars.split()
     ret = ""
     for var in vars:
         value = d.getVar(var) or ""
@@ -59,8 +58,7 @@ def buildinfo_target(d):
                 return ""
         # Single and list variables to be read
         vars = (d.getVar("IMAGE_BUILDINFO_VARS") or "")
-        listvars = (d.getVar("IMAGE_BUILDINFO_LVARS") or "")
-        return image_buildinfo_outputvars(vars, listvars, d)
+        return image_buildinfo_outputvars(vars, d)
 
 # Write build information to target filesystem
 python buildinfo () {
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* ✗ patchtest: failure for Hash Equivalency Server
  2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
                     ` (16 preceding siblings ...)
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-04  4:05   ` Patchwork
  17 siblings, 0 replies; 158+ messages in thread
From: Patchwork @ 2018-12-04  4:05 UTC (permalink / raw)
  To: Joshua Watt; +Cc: openembedded-core

== Series Details ==

Series: Hash Equivalency Server
Revision: 1
URL   : https://patchwork.openembedded.org/series/15190/
State : failure

== Summary ==


Thank you for submitting this patch series to OpenEmbedded Core. This is
an automated response. Several tests have been executed on the proposed
series by patchtest resulting in the following failures:



* Issue             Series sent to the wrong mailing list or some patches from the series correspond to different mailing lists [test_target_mailing_list] 
  Suggested fix    Send the series again to the correct mailing list (ML)
  Suggested ML     bitbake-devel@lists.openembedded.org [http://git.openembedded.org/bitbake/]
  Patch's path:    bitbake/bin/bitbake-worker

* Issue             Series does not apply on top of target branch [test_series_merge_on_head] 
  Suggested fix    Rebase your series on top of targeted branch
  Targeted branch  master (currently at 9f5f4b31df)



If you believe any of these test results are incorrect, please reply to the
mailing list (openembedded-core@lists.openembedded.org) raising your concerns.
Otherwise we would appreciate you correcting the issues and submitting a new
version of the patchset if applicable. Please ensure you add/increment the
version number when sending the new version (i.e. [PATCH] -> [PATCH v2] ->
[PATCH v3] -> ...).

---
Guidelines:     https://www.openembedded.org/wiki/Commit_Patch_Message_Guidelines
Test framework: http://git.yoctoproject.org/cgit/cgit.cgi/patchtest
Test suite:     http://git.yoctoproject.org/cgit/cgit.cgi/patchtest-oe



^ permalink raw reply	[flat|nested] 158+ messages in thread

* Re: [OE-core][PATCH v3 10/17] bitbake: siggen: Split out task depend ID
  2018-12-04  3:42       ` [PATCH " Joshua Watt
@ 2018-12-05 22:50         ` Richard Purdie
  -1 siblings, 0 replies; 158+ messages in thread
From: Richard Purdie @ 2018-12-05 22:50 UTC (permalink / raw)
  To: Joshua Watt, openembedded-core, bitbake-devel

On Mon, 2018-12-03 at 21:42 -0600, Joshua Watt wrote:
> Abstracts the function to get the dependency ID for a task so it can
> return something other than the taskhash
> 
> [YOCTO #13030]
> 
> Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> ---
>  bitbake/lib/bb/siggen.py | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
> index ab6df7603c8..2daca70538a 100644
> --- a/bitbake/lib/bb/siggen.py
> +++ b/bitbake/lib/bb/siggen.py
> @@ -41,6 +41,9 @@ class SignatureGenerator(object):
>      def finalise(self, fn, d, varient):
>          return
>  
> +    def get_depid(self, task):
> +        return self.taskhash[task]
> +
>      def get_taskhash(self, fn, task, deps, dataCache):
>          return "0"

I spent a while wondering why we still had "IDs" in the runqueue code
when I thought I'd removed them all. Once I'd gotten over that I
somehow thought this related to the task's dependencies, and then
wondered how it could only have one of them.

I therefore suspect calling this "depid" is going to be confusing and
we need a better name for it. I'm wondering about taskresid?
taskresolvid? taskresolvedid? taskreshash?

I appreciate why you're calling it an 'ID'; hash may be clearer
though, not sure...

Cheers,

Richard

^ permalink raw reply	[flat|nested] 158+ messages in thread

* Re: [OE-core][PATCH v3 13/17] bitbake: runqueue: Pass dependency ID to hash validate
  2018-12-04  3:42       ` [PATCH " Joshua Watt
@ 2018-12-05 22:52         ` Richard Purdie
  -1 siblings, 0 replies; 158+ messages in thread
From: Richard Purdie @ 2018-12-05 22:52 UTC (permalink / raw)
  To: Joshua Watt, openembedded-core, bitbake-devel

On Mon, 2018-12-03 at 21:42 -0600, Joshua Watt wrote:
> If the dependency ID is being used to track task dependencies, the
> hash
> validation function needs to know about it in order to properly
> validate
> the hash.
> 
> TODO: This currently isn't going to be backward compatible with older
> hashvalidate functions. Is that necessary, and if so are there any
> suggestions for a good approach?

That is necessary or bitbake and the metadata need to be in lock step.

> @@ -1567,15 +1568,16 @@ class RunQueue:
>              sq_fn.append(fn)
>              sq_hashfn.append(self.rqdata.dataCaches[mc].hashfn[taskfn])
>              sq_hash.append(self.rqdata.runtaskentries[tid].hash)
> +            sq_depid.append(self.rqdata.runtaskentries[tid].depid)
>              sq_taskname.append(taskname)
>              sq_task.append(tid)
> -        locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "d" : self.cooker.data }
> +        locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "sq_depid" : sq_depid, "d" : self.cooker.data }
>          try:
> -            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=True)"
> +            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d, siginfo=True)"
>              valid = bb.utils.better_eval(call, locs)
>          # Handle version with no siginfo parameter
>          except TypeError:
> -            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d)"
> +            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d)"
>              valid = bb.utils.better_eval(call, locs)
>          for v in valid:
> 

The hint on how we've handled this before is in the code: the TypeError
handler was the fallback path for older metadata (which you just
broke!). We may need to do something similar here.
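
For illustration, one possible shape for that fallback, reusing the
TypeError pattern from the quoted runqueue code (an untested sketch,
not a tested fix):

    locs = {"sq_fn": sq_fn, "sq_task": sq_taskname, "sq_hash": sq_hash,
            "sq_hashfn": sq_hashfn, "sq_depid": sq_depid,
            "d": self.cooker.data}
    try:
        # Metadata that accepts the new sq_depid parameter
        call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_depid, d)"
        valid = bb.utils.better_eval(call, locs)
    except TypeError:
        # Older metadata without sq_depid
        call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d)"
        valid = bb.utils.better_eval(call, locs)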

Cheers,

Richard



^ permalink raw reply	[flat|nested] 158+ messages in thread

* Re: [OE-core][PATCH v3 10/17] bitbake: siggen: Split out task depend ID
  2018-12-05 22:50         ` [bitbake-devel] [PATCH " Richard Purdie
@ 2018-12-06 14:58           ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-06 14:58 UTC (permalink / raw)
  To: Richard Purdie, openembedded-core, bitbake-devel

On Wed, 2018-12-05 at 22:50 +0000, Richard Purdie wrote:
> On Mon, 2018-12-03 at 21:42 -0600, Joshua Watt wrote:
> > Abstracts the function to get the dependency ID for a task so it
> > can
> > return something other than the taskhash
> > 
> > [YOCTO #13030]
> > 
> > Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> > ---
> >  bitbake/lib/bb/siggen.py | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
> > index ab6df7603c8..2daca70538a 100644
> > --- a/bitbake/lib/bb/siggen.py
> > +++ b/bitbake/lib/bb/siggen.py
> > @@ -41,6 +41,9 @@ class SignatureGenerator(object):
> >      def finalise(self, fn, d, varient):
> >          return
> >  
> > +    def get_depid(self, task):
> > +        return self.taskhash[task]
> > +
> >      def get_taskhash(self, fn, task, deps, dataCache):
> >          return "0"
> 
> I spent a while wondering why we still had "IDs" in the runqueue code
> when I thought I'd removed them all. Once I'd gotten over that I

I suppose I'm not familiar enough with bitbake's history to have
understood that these "IDs" were problematic in the past :)

> somehow thought this related to the task's dependencies and then how
> could it only have one of them?
> 
> I therefore suspect calling this "depid" is going to be confusing and
> we need a better name for it. I'm wondering about taskresid?
> taskresolvid? taskresolvedid? taskreshash?
> 
> I appreciate why you're calling it an 'ID', hash may be clearer
> thought, not sure...

Ya, I struggled with the naming. There is no reason it has to be a
hash, so I went with a (probably too generic) "ID"... in practice using
a hash is reasonable so I don't have a problem using "hash" in the
name. My runner-up names were "dephash" or "taskdephash". I think
indicating that it is involved in the dependency calculations is
important, although I can see how there might be some confusion about
where it is involved. I'm not too keen on the "resolved" names... but
maybe I'm missing where they are stemming from?

Another option if you want to go more of the graph theory approach
might be "taskedgehash", "edgehash", "taskedgeid", "edgeid", etc.

> 
> Cheers,
> 
> Richard
-- 
Joshua Watt <JPEWhacker@gmail.com>



^ permalink raw reply	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v4 00/10] Hash Equivalency Server
  2018-12-04  3:42     ` [PATCH " Joshua Watt
@ 2018-12-18 15:30       ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-18 15:30 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Apologies for cross-posting this to both the bitbake-devel and
openembedded-devel; this work necessarily intertwines both places, and
it is really necessary to look at both parts to get an idea of what is
going on. For convenience, the bitbake patches are listed first,
followed by the oe-core patches.

The basic premise is that any given task no longer hashes a dependent
task's taskhash to determine its own taskhash, but instead hashes the
dependent task's "unique hash" (which doesn't strictly need to be a
hash, but is for consistency). This allows multiple taskhashes to map
to the same unique hash, meaning that trivial changes to a recipe that
would change the taskhash don't necessarily need to change the unique
hash, and thus don't need to cause downstream tasks to be rebuilt (with
caveats, see below).

In the absence of any interaction by the user, the unique hash for a
task is just that task's taskhash, which effectively maintains the
current behavior. However, if the user enables the "OEEquivHash"
signature generator, they can direct it to look at a hash equivalency
server (of which a reference implementation is provided). The sstate
code will provide the server with an output hash that it calculates, and
the server will record all tasks with the same output hash as
"equivalent" and report the same unique hash for them when requested.
When initializing tasks, bitbake can ask the server about the unique
hash for new tasks it has never seen before and potentially skip
rebuilding, or restore the task from an equivalent sstate file. To
facilitate restoring tasks from sstate, sstate objects are now named
based on the task's unique hash instead of the taskhash (which, again,
has no effect if the server is in use).
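
To make this concrete, enabling the feature looks roughly like the
following local.conf fragment (variable names from this series):

    BB_SIGNATURE_HANDLER = "OEEquivHash"
    SSTATE_HASHEQUIV_SERVER = "http://127.0.0.1:5000"

and the client side of the lookup is, in sketch form (endpoint and
field names taken from the v3 reference server earlier in this thread;
the v4 API may name the field differently):

    import json
    import urllib.parse
    import urllib.request

    def query_unihash(server, method, taskhash):
        # Ask the equivalence server for the unique hash recorded against
        # this taskhash; fall back to the taskhash itself if none is known
        query = urllib.parse.urlencode({'method': method, 'taskhash': taskhash})
        with urllib.request.urlopen('%s/v1/equivalent?%s' % (server, query)) as rsp:
            data = json.loads(rsp.read().decode('utf-8'))
        return data['depid'] if data else taskhash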

This patchset doesn't make any attempt to dynamically update task unique
hashes after bitbake initializes the tasks, and as such there are some
cases where this isn't accelerating the build as much as it possibly
could. I think it will be possible to add support for this, but this
preliminary support needs to come first.

You can also see these patches (and my first attempts at dynamic task
re-hashing) on the "jpew/hash-equivalence" branch in poky-contrib.

As always, thanks for your feedback and time

VERSION 2:

At the core, this patch does the same thing as V1 with some very minor
tweaks. The main things that have changed are:
 1) Per request, the Hash Equivalence Server reference implementation is
    now based entirely on built in Python modules and requires no
    external libraries. It also has a wrapper script to launch it
    (bitbake-hashserv) and unittests.
 2) There is a major rework of persist_data in bitbake. I
    think these patches could be submitted independently, but I doubt
    anyone is clamoring for them. The general gist of them is that there
    were a lot of strange edge cases that I found when using
    persist_data as an IPC mechanism between the main bitbake process
    and the bitbake-worker processes. I went ahead and added extensive
    unit tests for this as well.

VERSION 3:

Minor tweak to version 2 that should fix timeout errors seen on the
autobuilder

VERSION 4:

Based on discussion, the term "dependency ID" was dropped in favor of
"unique hash" (unihash).

The hash validation checks were updated to properly fall back to the old
function signatures (that don't pass the unihashes) for compatibility
with older implementations.

Joshua Watt (10):
  bitbake: fork: Add os.fork() wrappers
  bitbake: persist_data: Close databases across fork
  bitbake: tests/persist_data: Add tests
  bitbake: siggen: Split out task unique hash
  bitbake: runqueue: Track task unique hash
  bitbake: runqueue: Pass unique hash to task
  bitbake: runqueue: Pass unique hash to hash validate
  classes/sstate: Handle unihash in hash check
  bitbake: hashserv: Add hash equivalence reference server
  sstate: Implement hash equivalence sstate

 bitbake/bin/bitbake-hashserv         |  67 ++++++++++
 bitbake/bin/bitbake-selftest         |   3 +
 bitbake/bin/bitbake-worker           |   9 +-
 bitbake/lib/bb/fork.py               |  73 +++++++++++
 bitbake/lib/bb/persist_data.py       |  32 ++++-
 bitbake/lib/bb/runqueue.py           |  73 +++++++----
 bitbake/lib/bb/siggen.py             |   7 +-
 bitbake/lib/bb/tests/persist_data.py | 188 +++++++++++++++++++++++++++
 bitbake/lib/hashserv/__init__.py     | 152 ++++++++++++++++++++++
 bitbake/lib/hashserv/tests.py        | 141 ++++++++++++++++++++
 meta/classes/sstate.bbclass          | 102 +++++++++++++--
 meta/conf/bitbake.conf               |   4 +-
 meta/lib/oe/sstatesig.py             | 167 ++++++++++++++++++++++++
 13 files changed, 978 insertions(+), 40 deletions(-)
 create mode 100755 bitbake/bin/bitbake-hashserv
 create mode 100644 bitbake/lib/bb/fork.py
 create mode 100644 bitbake/lib/bb/tests/persist_data.py
 create mode 100644 bitbake/lib/hashserv/__init__.py
 create mode 100644 bitbake/lib/hashserv/tests.py

-- 
2.19.2



^ permalink raw reply	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v4 01/10] bitbake: fork: Add os.fork() wrappers
  2018-12-18 15:30       ` [PATCH " Joshua Watt
@ 2018-12-18 15:30         ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-18 15:30 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Adds a compatibility wrapper around os.fork() that backports the ability
to register fork event handlers (os.register_at_fork()) from Python 3.7

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
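A minimal usage sketch for reviewers (illustrative, not part of the
patch): callbacks registered here run around bb.fork.fork() on Python
< 3.7 and are forwarded to os.register_at_fork() on Python >= 3.7.

    import bb.fork

    # Register handlers for each fork phase
    bb.fork.register_at_fork(before=lambda: print("before fork"),
                             after_in_parent=lambda: print("in parent"),
                             after_in_child=lambda: print("in child"))
    pid = bb.fork.fork()
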
 bitbake/bin/bitbake-worker |  2 +-
 bitbake/lib/bb/fork.py     | 73 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 1 deletion(-)
 create mode 100644 bitbake/lib/bb/fork.py

diff --git a/bitbake/bin/bitbake-worker b/bitbake/bin/bitbake-worker
index cd687e6e433..41ef6d848ac 100755
--- a/bitbake/bin/bitbake-worker
+++ b/bitbake/bin/bitbake-worker
@@ -181,7 +181,7 @@ def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskha
         pipein, pipeout = os.pipe()
         pipein = os.fdopen(pipein, 'rb', 4096)
         pipeout = os.fdopen(pipeout, 'wb', 0)
-        pid = os.fork()
+        pid = bb.fork.fork()
     except OSError as e:
         logger.critical("fork failed: %d (%s)" % (e.errno, e.strerror))
         sys.exit(1)
diff --git a/bitbake/lib/bb/fork.py b/bitbake/lib/bb/fork.py
new file mode 100644
index 00000000000..2b2b0b73b62
--- /dev/null
+++ b/bitbake/lib/bb/fork.py
@@ -0,0 +1,73 @@
+# ex:ts=4:sw=4:sts=4:et
+# -*- tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*-
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+#
+"""
+Python wrappers for os.fork() that allow the insertion of callbacks for fork events.
+This is designed to mimic os.register_at_fork(), available in Python 3.7, with the
+intent that it can be removed when that version becomes standard.
+"""
+
+import sys
+import os
+
+before_calls = []
+after_in_parent_calls = []
+after_in_child_calls = []
+
+def _do_calls(l, reverse=False):
+    # Make a copy in case the list is modified in the callback
+    copy = l[:]
+    if reverse:
+        copy = reversed(copy)
+
+    for f in copy:
+        # All exceptions raised by the callbacks are ignored
+        try:
+            f()
+        except:
+            pass
+
+def fork():
+    if sys.hexversion >= 0x030700F0:
+        return os.fork()
+
+    _do_calls(before_calls, reverse=True)
+
+    ret = os.fork()
+    if ret == 0:
+        _do_calls(after_in_child_calls)
+    else:
+        _do_calls(after_in_parent_calls)
+    return ret
+
+def register_at_fork(**kwargs):
+    def add_arg_to_list(name, lst):
+        if name in kwargs:
+            arg = kwargs[name]
+            if not callable(arg):
+                raise TypeError("'%s' must be callable, not %s" % (name, type(arg)))
+            lst.append(arg)
+
+    if sys.hexversion >= 0x030700F0:
+        os.register_at_fork(**kwargs)
+        return
+
+    add_arg_to_list('before', before_calls)
+    add_arg_to_list('after_in_parent', after_in_parent_calls)
+    add_arg_to_list('after_in_child', after_in_child_calls)
+
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v4 02/10] bitbake: persist_data: Close databases across fork
  2018-12-18 15:30       ` [PATCH " Joshua Watt
@ 2018-12-18 15:30         ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-18 15:30 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

sqlite gets really angry if a database is open across a fork() call,
and will give all sorts of messages ranging from I/O errors to database
corruption errors. To deal with this, close all database connections
before forking, and reopen them (lazily) on the other side.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
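For context, the failure mode being avoided looks roughly like this
(illustrative sketch, not part of the patch):

    import os
    import sqlite3

    conn = sqlite3.connect("cache.sqlite3")
    pid = os.fork()
    # Parent and child now share the same underlying connection state
    # and file descriptors; using the connection from both sides can
    # yield anything from I/O errors to "database disk image is
    # malformed".
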
 bitbake/lib/bb/persist_data.py | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/bitbake/lib/bb/persist_data.py b/bitbake/lib/bb/persist_data.py
index 4468facd18f..27ffb1ddaa4 100644
--- a/bitbake/lib/bb/persist_data.py
+++ b/bitbake/lib/bb/persist_data.py
@@ -30,6 +30,8 @@ from bb.compat import total_ordering
 from collections import Mapping
 import sqlite3
 import contextlib
+import bb.fork
+import weakref
 
 sqlversion = sqlite3.sqlite_version_info
 if sqlversion[0] < 3 or (sqlversion[0] == 3 and sqlversion[1] < 3):
@@ -38,6 +40,28 @@ if sqlversion[0] < 3 or (sqlversion[0] == 3 and sqlversion[1] < 3):
 
 logger = logging.getLogger("BitBake.PersistData")
 
+# Carrying an open database connection across a fork() confuses sqlite and
+# results in fun errors like 'database disk image is malformed'.
+# To remedy this, close all connections before forking; they will be
+# (lazily) reopened on the other side. This will cause a lot of problems if
+# there are threads running and trying to access the database at the same time,
+# but if you are mixing threads and fork() you have no one to blame but
+# yourself. If that is discovered to be a problem in the future, some sort of
+# per-table reader-writer lock could be used to block the fork() until all
+# pending transactions complete
+sql_table_weakrefs = []
+def _fork_before_handler():
+    for ref in sql_table_weakrefs:
+        t = ref()
+        if t is not None and t.connection is not None:
+            t.connection.close()
+            t.connection = None
+
+bb.fork.register_at_fork(before=_fork_before_handler)
+
+def _remove_table_weakref(ref):
+    sql_table_weakrefs.remove(ref)
+
 @total_ordering
 class SQLTable(collections.MutableMapping):
     class _Decorators(object):
@@ -305,4 +329,10 @@ def persist(domain, d):
 
     bb.utils.mkdirhier(cachedir)
     cachefile = os.path.join(cachedir, "bb_persist_data.sqlite3")
-    return SQLTable(cachefile, domain)
+    t = SQLTable(cachefile, domain)
+
+    # Add a weak reference to the table list. The weak reference will not keep
+    # the object alive by itself, so it avoids creating circular references
+    sql_table_weakrefs.append(weakref.ref(t, _remove_table_weakref))
+
+    return t
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v4 03/10] bitbake: tests/persist_data: Add tests
  2018-12-18 15:30       ` [PATCH " Joshua Watt
@ 2018-12-18 15:30         ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-18 15:30 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Adds a test suite for the persistent data cache.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
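The suite is hooked into bitbake-selftest below; it can also be run
directly with the standard unittest runner (assuming bitbake/lib is on
the Python path):

    $ cd bitbake/lib
    $ python3 -m unittest bb.tests.persist_data
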
 bitbake/bin/bitbake-selftest         |   1 +
 bitbake/lib/bb/tests/persist_data.py | 188 +++++++++++++++++++++++++++
 2 files changed, 189 insertions(+)
 create mode 100644 bitbake/lib/bb/tests/persist_data.py

diff --git a/bitbake/bin/bitbake-selftest b/bitbake/bin/bitbake-selftest
index cfa7ac5391b..c970dcae90c 100755
--- a/bitbake/bin/bitbake-selftest
+++ b/bitbake/bin/bitbake-selftest
@@ -33,6 +33,7 @@ tests = ["bb.tests.codeparser",
          "bb.tests.event",
          "bb.tests.fetch",
          "bb.tests.parse",
+         "bb.tests.persist_data",
          "bb.tests.utils",
          "layerindexlib.tests.layerindexobj",
          "layerindexlib.tests.restapi",
diff --git a/bitbake/lib/bb/tests/persist_data.py b/bitbake/lib/bb/tests/persist_data.py
new file mode 100644
index 00000000000..055f1d9ce47
--- /dev/null
+++ b/bitbake/lib/bb/tests/persist_data.py
@@ -0,0 +1,188 @@
+# ex:ts=4:sw=4:sts=4:et
+# -*- tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*-
+#
+# BitBake Test for lib/bb/persist_data/
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+#
+
+import unittest
+import os
+import bb.data
+import bb.persist_data
+import bb.fork
+import tempfile
+import threading
+
+class PersistDataTest(unittest.TestCase):
+    def _create_data(self):
+        return bb.persist_data.persist('TEST_PERSIST_DATA', self.d)
+
+    def setUp(self):
+        self.d = bb.data.init()
+        self.tempdir = tempfile.TemporaryDirectory()
+        self.d['PERSISTENT_DIR'] = self.tempdir.name
+        self.data = self._create_data()
+        self.items = {
+                'A1': '1',
+                'B1': '2',
+                'C2': '3'
+                }
+        self.stress_count = 10000
+        self.thread_count = 5
+
+        for k,v in self.items.items():
+            self.data[k] = v
+
+    def tearDown(self):
+        self.tempdir.cleanup()
+
+    def _iter_helper(self, seen, iterator):
+        with iter(iterator):
+            for v in iterator:
+                self.assertTrue(v in seen)
+                seen.remove(v)
+        self.assertEqual(len(seen), 0, '%s not seen' % seen)
+
+    def test_get(self):
+        for k, v in self.items.items():
+            self.assertEqual(self.data[k], v)
+
+        self.assertIsNone(self.data.get('D'))
+        with self.assertRaises(KeyError):
+            self.data['D']
+
+    def test_set(self):
+        for k, v in self.items.items():
+            self.data[k] += '-foo'
+
+        for k, v in self.items.items():
+            self.assertEqual(self.data[k], v + '-foo')
+
+    def test_delete(self):
+        self.data['D'] = '4'
+        self.assertEqual(self.data['D'], '4')
+        del self.data['D']
+        self.assertIsNone(self.data.get('D'))
+        with self.assertRaises(KeyError):
+            self.data['D']
+
+    def test_contains(self):
+        for k in self.items:
+            self.assertTrue(k in self.data)
+            self.assertTrue(self.data.has_key(k))
+        self.assertFalse('NotFound' in self.data)
+        self.assertFalse(self.data.has_key('NotFound'))
+
+    def test_len(self):
+        self.assertEqual(len(self.data), len(self.items))
+
+    def test_iter(self):
+        self._iter_helper(set(self.items.keys()), self.data)
+
+    def test_itervalues(self):
+        self._iter_helper(set(self.items.values()), self.data.itervalues())
+
+    def test_iteritems(self):
+        self._iter_helper(set(self.items.items()), self.data.iteritems())
+
+    def test_get_by_pattern(self):
+        self._iter_helper({'1', '2'}, self.data.get_by_pattern('_1'))
+
+    def _stress_read(self, data):
+        for i in range(self.stress_count):
+            for k in self.items:
+                data[k]
+
+    def _stress_write(self, data):
+        for i in range(self.stress_count):
+            for k, v in self.items.items():
+                data[k] = v + str(i)
+
+    def _validate_stress(self):
+        for k, v in self.items.items():
+            self.assertEqual(self.data[k], v + str(self.stress_count - 1))
+
+    def test_stress(self):
+        self._stress_read(self.data)
+        self._stress_write(self.data)
+        self._validate_stress()
+
+    def test_stress_threads(self):
+        def read_thread():
+            data = self._create_data()
+            self._stress_read(data)
+
+        def write_thread():
+            data = self._create_data()
+            self._stress_write(data)
+
+        threads = []
+        for i in range(self.thread_count):
+            threads.append(threading.Thread(target=read_thread))
+            threads.append(threading.Thread(target=write_thread))
+
+        for t in threads:
+            t.start()
+        self._stress_read(self.data)
+        for t in threads:
+            t.join()
+        self._validate_stress()
+
+    def test_stress_fork(self):
+        children = []
+        for i in range(self.thread_count):
+            # Create a writer
+            pid = bb.fork.fork()
+            if pid == 0:
+                try:
+                    self._stress_write(self.data)
+                except:
+                    os._exit(1)
+                else:
+                    os._exit(0)
+            else:
+                children.append(pid)
+
+            # Create a reader
+            pid = bb.fork.fork()
+            if pid == 0:
+                try:
+                    self._stress_read(self.data)
+                except:
+                    os._exit(1)
+                else:
+                    os._exit(0)
+            else:
+                children.append(pid)
+
+        self._stress_read(self.data)
+
+        for pid in children:
+            while True:
+                try:
+                    (_, status) = os.waitpid(pid, 0)
+                    break
+                # Python < 3.5 will raise this if waitpid() is interrupted
+                except InterruptedError:
+                    pass
+                except:
+                    raise
+
+            self.assertTrue(os.WIFEXITED(status), "PID %d did not exit normally" % pid)
+            self.assertEqual(os.WEXITSTATUS(status), 0, "PID %d exited with code %d" % (pid, os.WEXITSTATUS(status)))
+
+        self._validate_stress()
+
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v4 04/10] bitbake: siggen: Split out task unique hash
  2018-12-18 15:30       ` [PATCH " Joshua Watt
@ 2018-12-18 15:30         ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-18 15:30 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Abstracts the function that gets the unique hash for a task. This hash
is used in place of the taskhash to determine how other tasks depend on
this one. Unless overridden, the unique hash is the same as the
taskhash, preserving the original behavior.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
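As a usage note (a sketch, not code from this series), a signature
generator can now override this hook to substitute an externally
computed hash; lookup_equivalent() below is a hypothetical helper
standing in for a hash equivalence server query:

    class ExampleSigGen(SignatureGeneratorBasic):
        def get_unihash(self, task):
            # Hypothetical server lookup; fall back to the taskhash
            # when no equivalent is known
            return lookup_equivalent(self.taskhash[task]) or self.taskhash[task]
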
 bitbake/lib/bb/siggen.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
index ab6df7603c8..5508523f2da 100644
--- a/bitbake/lib/bb/siggen.py
+++ b/bitbake/lib/bb/siggen.py
@@ -41,6 +41,9 @@ class SignatureGenerator(object):
     def finalise(self, fn, d, varient):
         return
 
+    def get_unihash(self, task):
+        return self.taskhash[task]
+
     def get_taskhash(self, fn, task, deps, dataCache):
         return "0"
 
@@ -186,7 +189,7 @@ class SignatureGeneratorBasic(SignatureGenerator):
                 continue
             if dep not in self.taskhash:
                 bb.fatal("%s is not in taskhash, caller isn't calling in dependency order?" % dep)
-            data = data + self.taskhash[dep]
+            data = data + self.get_unihash(dep)
             self.runtaskdeps[k].append(dep)
 
         if task in dataCache.file_checksums[fn]:
@@ -261,7 +264,7 @@ class SignatureGeneratorBasic(SignatureGenerator):
             data['file_checksum_values'] = [(os.path.basename(f), cs) for f,cs in self.file_checksum_values[k]]
             data['runtaskhashes'] = {}
             for dep in data['runtaskdeps']:
-                data['runtaskhashes'][dep] = self.taskhash[dep]
+                data['runtaskhashes'][dep] = self.get_unihash(dep)
             data['taskhash'] = self.taskhash[k]
 
         taint = self.read_taint(fn, task, referencestamp)
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v4 05/10] bitbake: runqueue: Track task unique hash
  2018-12-18 15:30       ` [PATCH " Joshua Watt
@ 2018-12-18 15:30         ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-18 15:30 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Requests the task's unique hash from siggen and tracks it in the runqueue.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
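Note for reviewers: this changes the shape of the taskdepdata entries
handed to the workers from

    [pn, taskname, fn, deps, provides, taskhash]

to

    [pn, taskname, fn, deps, provides, taskhash, unihash]

so anything unpacking these lists by index must be updated in lockstep.
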
 bitbake/lib/bb/runqueue.py | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index f2b95a9829b..27b188256dd 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -346,6 +346,7 @@ class RunTaskEntry(object):
         self.depends = set()
         self.revdeps = set()
         self.hash = None
+        self.unihash = None
         self.task = None
         self.weight = 1
 
@@ -385,6 +386,9 @@ class RunQueueData:
     def get_task_hash(self, tid):
         return self.runtaskentries[tid].hash
 
+    def get_task_unihash(self, tid):
+        return self.runtaskentries[tid].unihash
+
     def get_user_idstring(self, tid, task_name_suffix = ""):
         return tid + task_name_suffix
 
@@ -1150,18 +1154,21 @@ class RunQueueData:
                 if len(self.runtaskentries[tid].depends - dealtwith) == 0:
                     dealtwith.add(tid)
                     todeal.remove(tid)
-                    procdep = []
-                    for dep in self.runtaskentries[tid].depends:
-                        procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
-                    (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
-                    self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
-                    task = self.runtaskentries[tid].task
+                    self.prepare_task_hash(tid)
 
         bb.parse.siggen.writeout_file_checksum_cache()
 
         #self.dump_data()
         return len(self.runtaskentries)
 
+    def prepare_task_hash(self, tid):
+        procdep = []
+        for dep in self.runtaskentries[tid].depends:
+            procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
+        (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
+        self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
+        self.runtaskentries[tid].unihash = bb.parse.siggen.get_unihash(fn + "." + taskname)
+
     def dump_data(self):
         """
         Dump some debug information on the internal data structures
@@ -2081,7 +2088,8 @@ class RunQueueExecuteTasks(RunQueueExecute):
                 deps = self.rqdata.runtaskentries[revdep].depends
                 provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
                 taskhash = self.rqdata.runtaskentries[revdep].hash
-                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
+                unihash = self.rqdata.runtaskentries[revdep].unihash
+                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, unihash]
                 for revdep2 in deps:
                     if revdep2 not in taskdepdata:
                         additional.append(revdep2)
@@ -2524,7 +2532,8 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
                 deps = getsetscenedeps(revdep)
                 provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
                 taskhash = self.rqdata.runtaskentries[revdep].hash
-                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
+                unihash = self.rqdata.runtaskentries[revdep].unihash
+                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, unihash]
                 for revdep2 in deps:
                     if revdep2 not in taskdepdata:
                         additional.append(revdep2)
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v4 06/10] bitbake: runqueue: Pass unique hash to task
  2018-12-18 15:30       ` [PATCH " Joshua Watt
@ 2018-12-18 15:30         ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-18 15:30 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

The unique hash is now passed to the task in the BB_UNIHASH variable.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
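For illustration (not part of the patch), task code can read the new
variable just like BB_TASKHASH, e.g. in a Python task:

    python do_show_hashes() {
        # Both variables are set by bitbake-worker before the task runs
        bb.note("taskhash: %s" % d.getVar('BB_TASKHASH'))
        bb.note("unihash:  %s" % d.getVar('BB_UNIHASH'))
    }
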
 bitbake/bin/bitbake-worker |  7 ++++---
 bitbake/lib/bb/runqueue.py | 10 ++++++----
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/bitbake/bin/bitbake-worker b/bitbake/bin/bitbake-worker
index 41ef6d848ac..7b1884a7f88 100755
--- a/bitbake/bin/bitbake-worker
+++ b/bitbake/bin/bitbake-worker
@@ -136,7 +136,7 @@ def sigterm_handler(signum, frame):
     os.killpg(0, signal.SIGTERM)
     sys.exit()
 
-def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskhash, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
+def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskhash, unihash, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
     # We need to setup the environment BEFORE the fork, since
     # a fork() or exec*() activates PSEUDO...
 
@@ -235,6 +235,7 @@ def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskha
 
                 the_data = bb_cache.loadDataFull(fn, appends)
                 the_data.setVar('BB_TASKHASH', taskhash)
+                the_data.setVar('BB_UNIHASH', unihash)
 
                 bb.utils.set_process_name("%s:%s" % (the_data.getVar("PN"), taskname.replace("do_", "")))
 
@@ -425,10 +426,10 @@ class BitbakeWorker(object):
         sys.exit(0)
 
     def handle_runtask(self, data):
-        fn, task, taskname, taskhash, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
+        fn, task, taskname, taskhash, unihash, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
         workerlog_write("Handling runtask %s %s %s\n" % (task, fn, taskname))
 
-        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, taskhash, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
+        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, taskhash, unihash, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
 
         self.build_pids[pid] = task
         self.build_pipes[pid] = runQueueWorkerPipe(pipein, pipeout)
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 27b188256dd..de57dcb37b8 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -2034,6 +2034,7 @@ class RunQueueExecuteTasks(RunQueueExecute):
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
             taskhash = self.rqdata.get_task_hash(task)
+            unihash = self.rqdata.get_task_unihash(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not (self.cooker.configuration.dry_run or self.rqdata.setscene_enforce):
                 if not mc in self.rq.fakeworker:
                     try:
@@ -2043,10 +2044,10 @@ class RunQueueExecuteTasks(RunQueueExecute):
                         self.rq.state = runQueueFailed
                         self.stats.taskFailed()
                         return True
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, unihash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, unihash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
@@ -2462,13 +2463,14 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
             taskhash = self.rqdata.get_task_hash(task)
+            unihash = self.rqdata.get_task_unihash(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not self.cooker.configuration.dry_run:
                 if not mc in self.rq.fakeworker:
                     self.rq.start_fakeworker(self, mc)
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, unihash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, unihash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v4 07/10] bitbake: runqueue: Pass unique hash to hash validate
  2018-12-18 15:30       ` [PATCH " Joshua Watt
@ 2018-12-18 15:30         ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-18 15:30 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

If the unique hash is being used to track task dependencies, the hash
validation function needs to know about it in order to properly validate
the hash.

[YOCTO #13030]
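
For illustration, a BB_HASHCHECK_FUNCTION that understands the new
keyword would look roughly like this (a sketch only; the function name
and body are illustrative, not part of this patch):

    def my_hashvalidate(sq_fn, sq_task, sq_hash, sq_hashfn, d,
                        siginfo=False, *, sq_unihash):
        # Return indices into the parallel sq_* lists for tasks whose
        # (equivalent) sstate objects are available
        valid = []
        for i in range(len(sq_task)):
            # A real implementation would look for an sstate object
            # named after sq_unihash[i] here; this sketch accepts all
            valid.append(i)
        return valid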

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/runqueue.py | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index de57dcb37b8..161f53c7cb1 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -1378,6 +1378,23 @@ class RunQueue:
             cache[tid] = iscurrent
         return iscurrent
 
+    def validate_hash(self, *, sq_fn, sq_task, sq_hash, sq_hashfn, siginfo, sq_unihash, d):
+        locs = {"sq_fn" : sq_fn, "sq_task" : sq_task, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn,
+                "sq_unihash" : sq_unihash, "siginfo" : siginfo, "d" : d}
+
+        for hashvalidate_args in ("(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=siginfo, sq_unihash=sq_unihash)",
+                                  "(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=siginfo)"):
+            try:
+                call = self.hashvalidate + hashvalidate_args
+                return bb.utils.better_eval(call, locs)
+            except TypeError:
+                continue
+
+        # If none of the hash validate functions worked, try one more time
+        # with the oldest type
+        call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_unihash, d)"
+        return bb.utils.better_eval(call, locs)
+
     def _execute_runqueue(self):
         """
         Run the tasks in a queue prepared by rqdata.prepare()
@@ -1549,6 +1566,7 @@ class RunQueue:
         valid = []
         sq_hash = []
         sq_hashfn = []
+        sq_unihash = []
         sq_fn = []
         sq_taskname = []
         sq_task = []
@@ -1567,16 +1585,13 @@ class RunQueue:
             sq_fn.append(fn)
             sq_hashfn.append(self.rqdata.dataCaches[mc].hashfn[taskfn])
             sq_hash.append(self.rqdata.runtaskentries[tid].hash)
+            sq_unihash.append(self.rqdata.runtaskentries[tid].unihash)
             sq_taskname.append(taskname)
             sq_task.append(tid)
-        locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "d" : self.cooker.data }
-        try:
-            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=True)"
-            valid = bb.utils.better_eval(call, locs)
-        # Handle version with no siginfo parameter
-        except TypeError:
-            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d)"
-            valid = bb.utils.better_eval(call, locs)
+
+        valid = self.validate_hash(sq_fn=sq_fn, sq_task=sq_taskname, sq_hash=sq_hash, sq_hashfn=sq_hashfn,
+                siginfo=True, sq_unihash=sq_unihash, d=self.cooker.data)
+
         for v in valid:
             valid_new.add(sq_task[v])
 
@@ -2293,6 +2308,7 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
         if self.rq.hashvalidate:
             sq_hash = []
             sq_hashfn = []
+            sq_unihash = []
             sq_fn = []
             sq_taskname = []
             sq_task = []
@@ -2324,14 +2340,14 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
                 sq_fn.append(fn)
                 sq_hashfn.append(self.rqdata.dataCaches[mc].hashfn[taskfn])
                 sq_hash.append(self.rqdata.runtaskentries[tid].hash)
+                sq_unihash.append(self.rqdata.runtaskentries[tid].unihash)
                 sq_taskname.append(taskname)
                 sq_task.append(tid)
 
             self.cooker.data.setVar("BB_SETSCENE_STAMPCURRENT_COUNT", len(stamppresent))
 
-            call = self.rq.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d)"
-            locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "d" : self.cooker.data }
-            valid = bb.utils.better_eval(call, locs)
+            valid = self.rq.validate_hash(sq_fn=sq_fn, sq_task=sq_taskname, sq_hash=sq_hash, sq_hashfn=sq_hashfn,
+                    siginfo=False, sq_unihash=sq_unihash, d=self.cooker.data)
 
             self.cooker.data.delVar("BB_SETSCENE_STAMPCURRENT_COUNT")
 
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v4 08/10] classes/sstate: Handle unihash in hash check
  2018-12-18 15:30       ` [PATCH " Joshua Watt
@ 2018-12-18 15:30         ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-18 15:30 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Handles the new argument that passes the task's unique hash to the
hash check function, as bitbake now requires it

[YOCTO #13030]
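
The bare '*' in the new signature makes sq_unihash keyword-only, so an
older bitbake that still calls the function positionally fails fast
with a TypeError instead of silently binding arguments to the wrong
parameters. A minimal sketch of the effect:

    def f(a, b, siginfo=False, *, sq_unihash):
        return sq_unihash

    f(1, 2, sq_unihash=['x'])   # OK
    f(1, 2, False, ['x'])       # TypeError: too many positional args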

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 8b48ab465fd..41a2f9b7b77 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -774,7 +774,7 @@ sstate_unpack_package () {
 
 BB_HASHCHECK_FUNCTION = "sstate_checkhashes"
 
-def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False):
+def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *, sq_unihash):
 
     ret = []
     missed = []
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v4 09/10] bitbake: hashserv: Add hash equivalence reference server
  2018-12-18 15:30       ` [PATCH " Joshua Watt
@ 2018-12-18 15:31         ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-18 15:31 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Adds a reference implementation of the hash equivalence server. It has
no dependencies outside of the standard Python library and implements
the minimum required to be a conforming hash equivalence server.

[YOCTO #13030]
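
As a quick smoke test against a locally running server (the port below
is the bitbake-hashserv default; the hash values are made up):

    import json, urllib.request

    base = 'http://localhost:8686'
    data = {'method': 'TestMethod', 'taskhash': 'aa' * 20,
            'outhash': 'bb' * 32, 'unihash': 'aa' * 20}
    req = urllib.request.Request(base + '/v1/equivalent',
            json.dumps(data).encode('utf-8'),
            {'content-type': 'application/json'})
    print(json.loads(urllib.request.urlopen(req).read().decode('utf-8')))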

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-hashserv     |  67 ++++++++++++++
 bitbake/bin/bitbake-selftest     |   2 +
 bitbake/lib/hashserv/__init__.py | 152 +++++++++++++++++++++++++++++++
 bitbake/lib/hashserv/tests.py    | 141 ++++++++++++++++++++++++++++
 4 files changed, 362 insertions(+)
 create mode 100755 bitbake/bin/bitbake-hashserv
 create mode 100644 bitbake/lib/hashserv/__init__.py
 create mode 100644 bitbake/lib/hashserv/tests.py

diff --git a/bitbake/bin/bitbake-hashserv b/bitbake/bin/bitbake-hashserv
new file mode 100755
index 00000000000..c49397b73a5
--- /dev/null
+++ b/bitbake/bin/bitbake-hashserv
@@ -0,0 +1,67 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+import os
+import sys
+import logging
+import argparse
+import sqlite3
+
+sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)),'lib'))
+
+import hashserv
+
+VERSION = "1.0.0"
+
+DEFAULT_HOST = ''
+DEFAULT_PORT = 8686
+
+def main():
+    parser = argparse.ArgumentParser(description='Hash Equivalence Reference Server. Version=%s' % VERSION)
+    parser.add_argument('--address', default=DEFAULT_HOST, help='Bind address (default "%(default)s")')
+    parser.add_argument('--port', type=int, default=DEFAULT_PORT, help='Bind port (default %(default)d)')
+    parser.add_argument('--prefix', default='', help='HTTP path prefix (default "%(default)s")')
+    parser.add_argument('--database', default='./hashserv.db', help='Database file (default "%(default)s")')
+    parser.add_argument('--log', default='WARNING', help='Set logging level')
+
+    args = parser.parse_args()
+
+    logger = logging.getLogger('hashserv')
+
+    level = getattr(logging, args.log.upper(), None)
+    if not isinstance(level, int):
+        raise ValueError('Invalid log level: %s' % args.log)
+
+    logger.setLevel(level)
+    console = logging.StreamHandler()
+    console.setLevel(level)
+    logger.addHandler(console)
+
+    db = sqlite3.connect(args.database)
+
+    server = hashserv.create_server((args.address, args.port), db, args.prefix)
+    server.serve_forever()
+    return 0
+
+if __name__ == '__main__':
+    try:
+        ret = main()
+    except Exception:
+        ret = 1
+        import traceback
+        traceback.print_exc()
+    sys.exit(ret)
+
diff --git a/bitbake/bin/bitbake-selftest b/bitbake/bin/bitbake-selftest
index c970dcae90c..99f1af910f4 100755
--- a/bitbake/bin/bitbake-selftest
+++ b/bitbake/bin/bitbake-selftest
@@ -22,6 +22,7 @@ sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)), 'lib
 import unittest
 try:
     import bb
+    import hashserv
     import layerindexlib
 except RuntimeError as exc:
     sys.exit(str(exc))
@@ -35,6 +36,7 @@ tests = ["bb.tests.codeparser",
          "bb.tests.parse",
          "bb.tests.persist_data",
          "bb.tests.utils",
+         "hashserv.tests",
          "layerindexlib.tests.layerindexobj",
          "layerindexlib.tests.restapi",
          "layerindexlib.tests.cooker"]
diff --git a/bitbake/lib/hashserv/__init__.py b/bitbake/lib/hashserv/__init__.py
new file mode 100644
index 00000000000..46bca7cab32
--- /dev/null
+++ b/bitbake/lib/hashserv/__init__.py
@@ -0,0 +1,152 @@
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+from http.server import BaseHTTPRequestHandler, HTTPServer
+import contextlib
+import urllib.parse
+import sqlite3
+import json
+import traceback
+import logging
+from datetime import datetime
+
+logger = logging.getLogger('hashserv')
+
+class HashEquivalenceServer(BaseHTTPRequestHandler):
+    def log_message(self, f, *args):
+        logger.debug(f, *args)
+
+    def do_GET(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            query = urllib.parse.parse_qs(p.query, strict_parsing=True)
+            method = query['method'][0]
+            taskhash = query['taskhash'][0]
+
+            d = None
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('SELECT taskhash, method, unihash FROM tasks_v1 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1',
+                        {'method': method, 'taskhash': taskhash})
+
+                row = cursor.fetchone()
+
+                if row is not None:
+                    logger.debug('Found equivalent task %s', row['taskhash'])
+                    d = {k: row[k] for k in ('taskhash', 'method', 'unihash')}
+
+            self.send_response(200)
+            self.send_header('Content-Type', 'application/json; charset=utf-8')
+            self.end_headers()
+            self.wfile.write(json.dumps(d).encode('utf-8'))
+        except:
+            logger.exception('Error in GET')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+    def do_POST(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            length = int(self.headers['content-length'])
+            data = json.loads(self.rfile.read(length).decode('utf-8'))
+
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('''
+                    SELECT taskhash, method, unihash FROM tasks_v1 WHERE method=:method AND outhash=:outhash
+                    ORDER BY CASE WHEN taskhash=:taskhash THEN 1 ELSE 2 END,
+                        created ASC
+                    LIMIT 1
+                    ''', {k: data[k] for k in ('method', 'outhash', 'taskhash')})
+
+                row = cursor.fetchone()
+
+                if row is None or row['taskhash'] != data['taskhash']:
+                    unihash = data['unihash']
+                    if row is not None:
+                        unihash = row['unihash']
+
+                    insert_data = {
+                            'method': data['method'],
+                            'outhash': data['outhash'],
+                            'taskhash': data['taskhash'],
+                            'unihash': unihash,
+                            'created': datetime.now()
+                            }
+
+                    for k in ('owner', 'PN', 'PV', 'PR', 'task', 'outhash_siginfo'):
+                        if k in data:
+                            insert_data[k] = data[k]
+
+                    cursor.execute('''INSERT INTO tasks_v1 (%s) VALUES (%s)''' % (
+                            ', '.join(sorted(insert_data.keys())),
+                            ', '.join(':' + k for k in sorted(insert_data.keys()))),
+                        insert_data)
+
+                    logger.info('Adding taskhash %s with unihash %s', data['taskhash'], unihash)
+                    cursor.execute('SELECT taskhash, method, unihash FROM tasks_v1 WHERE id=:id', {'id': cursor.lastrowid})
+                    row = cursor.fetchone()
+
+                    self.db.commit()
+
+                d = {k: row[k] for k in ('taskhash', 'method', 'unihash')}
+
+                self.send_response(200)
+                self.send_header('Content-Type', 'application/json; charset=utf-8')
+                self.end_headers()
+                self.wfile.write(json.dumps(d).encode('utf-8'))
+        except:
+            logger.exception('Error in POST')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+def create_server(addr, db, prefix=''):
+    class Handler(HashEquivalenceServer):
+        pass
+
+    Handler.prefix = prefix
+    Handler.db = db
+    db.row_factory = sqlite3.Row
+
+    with contextlib.closing(db.cursor()) as cursor:
+        cursor.execute('''
+            CREATE TABLE IF NOT EXISTS tasks_v1 (
+                id INTEGER PRIMARY KEY AUTOINCREMENT,
+                method TEXT NOT NULL,
+                outhash TEXT NOT NULL,
+                taskhash TEXT NOT NULL,
+                unihash TEXT NOT NULL,
+                created DATETIME,
+
+                -- Optional fields
+                owner TEXT,
+                PN TEXT,
+                PV TEXT,
+                PR TEXT,
+                task TEXT,
+                outhash_siginfo TEXT
+                )
+            ''')
+
+    logger.info('Starting server on %s', addr)
+    return HTTPServer(addr, Handler)
diff --git a/bitbake/lib/hashserv/tests.py b/bitbake/lib/hashserv/tests.py
new file mode 100644
index 00000000000..806b54c5ebd
--- /dev/null
+++ b/bitbake/lib/hashserv/tests.py
@@ -0,0 +1,141 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+import unittest
+import threading
+import sqlite3
+import hashlib
+import urllib.request
+import json
+from . import create_server
+
+class TestHashEquivalenceServer(unittest.TestCase):
+    def setUp(self):
+        # Start an in memory hash equivalence server in the background bound to
+        # an ephemeral port
+        db = sqlite3.connect(':memory:', check_same_thread=False)
+        self.server = create_server(('localhost', 0), db)
+        self.server_addr = 'http://localhost:%d' % self.server.socket.getsockname()[1]
+        self.server_thread = threading.Thread(target=self.server.serve_forever)
+        self.server_thread.start()
+
+    def tearDown(self):
+        # Shutdown server
+        s = getattr(self, 'server', None)
+        if s is not None:
+            self.server.shutdown()
+            self.server_thread.join()
+            self.server.server_close()
+
+    def send_get(self, path):
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def send_post(self, path, data):
+        headers = {'content-type': 'application/json'}
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url, json.dumps(data).encode('utf-8'), headers)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def test_create_hash(self):
+        # Simple test that hashes can be created
+        taskhash = '35788efcb8dfb0a02659d81cf2bfd695fb30faf9'
+        outhash = '2765d4a5884be49b28601445c2760c5f21e7e5c0ee2b7e3fce98fd7e5970796f'
+        unihash = 'f46d3fbb439bd9b921095da657a4de906510d2cd'
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertIsNone(d, msg='Found unexpected task, %r' % d)
+
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+    def test_create_equivalent(self):
+        # Tests that a second reported task with the same outhash will be
+        # assigned the same unihash
+        taskhash = '53b8dce672cb6d0c73170be43f540460bfc347b4'
+        outhash = '5a9cb1649625f0bf41fc7791b635cd9c2d7118c7f021ba87dcd03f72b67ce7a8'
+        unihash = 'f37918cc02eb5a520b1aff86faacbc0a38124646'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+        # Report a different task with the same outhash. The returned unihash
+        # should match the first task
+        taskhash2 = '3bf6f1e89d26205aec90da04854fbdbf73afe6b4'
+        unihash2 = 'af36b199320e611fbb16f1f277d3ee1d619ca58b'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash2,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash2,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+    def test_duplicate_taskhash(self):
+        # Tests that duplicate reports of the same taskhash with different
+        # outhash & unihash always return the unihash from the first reported
+        # taskhash
+        taskhash = '8aa96fcffb5831b3c2c0cb75f0431e3f8b20554a'
+        outhash = 'afe240a439959ce86f5e322f8c208e1fedefea9e813f2140c81af866cc9edf7e'
+        unihash = '218e57509998197d570e2c98512d0105985dffc9'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+        outhash2 = '0904a7fe3dc712d9fd8a74a616ddca2a825a8ee97adf0bd3fc86082c7639914d'
+        unihash2 = 'ae9a7d252735f0dafcdb10e2e02561ca3a47314c'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash2,
+            'unihash': unihash2
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+        outhash3 = '77623a549b5b1a31e3732dfa8fe61d7ce5d44b3370f253c5360e136b852967b4'
+        unihash3 = '9217a7d6398518e5dc002ed58f2cbbbc78696603'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash3,
+            'unihash': unihash3
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v4 10/10] sstate: Implement hash equivalence sstate
  2018-12-18 15:30       ` [PATCH " Joshua Watt
@ 2018-12-18 15:31         ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-18 15:31 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Converts sstate so that it can use a hash equivalence server to
determine if a task really needs to be rebuilt, or if it can be restored
from a different (equivalent) sstate object.

The unique hashes are cached persistently using persist_data. This has
a number of advantages:
 1) Unique hashes can be cached between invocations of bitbake to
    prevent needing to contact the server every time (which is slow)
 2) The value of each task's unique hash can easily be synchronized
    between different threads, which will be useful if bitbake is
    updated to do on the fly task re-hashing.

[YOCTO #13030]
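
For reference, the persistent cache introduced below is used roughly as
follows (a sketch collecting the calls made later in this patch; 'data'
is the usual datastore):

    self.unihashes = bb.persist_data.persist(
            'SSTATESIG_UNIHASH_CACHE_v1_' + self.method, data)
    key = '%s:%s' % (task, self.taskhash[task])
    unihash = self.unihashes.get(key)   # None if not yet cached
    self.unihashes[key] = unihash       # survives across bitbake runs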

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 100 +++++++++++++++++++--
 meta/conf/bitbake.conf      |   4 +-
 meta/lib/oe/sstatesig.py    | 167 ++++++++++++++++++++++++++++++++++++
 3 files changed, 262 insertions(+), 9 deletions(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 41a2f9b7b77..8ffefd68344 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -11,7 +11,7 @@ def generate_sstatefn(spec, hash, d):
 SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
 SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
 SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
-SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_TASKHASH'), d)}"
+SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d)}"
 SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
 SSTATE_EXTRAPATH   = ""
 SSTATE_EXTRAPATHWILDCARD = ""
@@ -82,6 +82,23 @@ SSTATE_SIG_PASSPHRASE ?= ""
 # Whether to verify the GnUPG signatures when extracting sstate archives
 SSTATE_VERIFY_SIG ?= "0"
 
+SSTATE_HASHEQUIV_METHOD ?= "OEOuthashBasic"
+SSTATE_HASHEQUIV_METHOD[doc] = "The function used to calculate the output hash \
+    for a task, which in turn is used to determine equivalency. \
+    "
+
+SSTATE_HASHEQUIV_SERVER ?= ""
+SSTATE_HASHEQUIV_SERVER[doc] = "The hash equivalence server. For example, \
+    'http://192.168.0.1:5000'. Do not include a trailing slash \
+    "
+
+SSTATE_HASHEQUIV_REPORT_TASKDATA ?= "0"
+SSTATE_HASHEQUIV_REPORT_TASKDATA[doc] = "Report additional useful data to the \
+    hash equivalency server, such as PN, PV, taskname, etc. This information \
+    is very useful for developers looking at task data, but may leak sensitive \
+    data if the equivalence server is public. \
+    "
+
 python () {
     if bb.data.inherits_class('native', d):
         d.setVar('SSTATE_PKGARCH', d.getVar('BUILD_ARCH', False))
@@ -634,7 +651,7 @@ def sstate_package(ss, d):
         return
 
     for f in (d.getVar('SSTATECREATEFUNCS') or '').split() + \
-             ['sstate_create_package', 'sstate_sign_package'] + \
+             ['sstate_report_unihash', 'sstate_create_package', 'sstate_sign_package'] + \
              (d.getVar('SSTATEPOSTCREATEFUNCS') or '').split():
         # All hooks should run in SSTATE_BUILDDIR.
         bb.build.exec_func(f, d, (sstatebuild,))
@@ -758,6 +775,73 @@ python sstate_sign_package () {
                            d.getVar('SSTATE_SIG_PASSPHRASE'), armor=False)
 }
 
+def OEOuthashBasic(path, sigfile, task, d):
+    import hashlib
+    import stat
+
+    def update_hash(s):
+        s = s.encode('utf-8')
+        h.update(s)
+        if sigfile:
+            sigfile.write(s)
+
+    h = hashlib.sha256()
+    prev_dir = os.getcwd()
+
+    try:
+        os.chdir(path)
+
+        update_hash("OEOuthashBasic\n")
+
+        # It is only currently useful to get equivalent hashes for things that
+        # can be restored from sstate. Since the sstate object is named using
+        # SSTATE_PKGSPEC and the task name, those should be included in the
+        # output hash calculation.
+        update_hash("SSTATE_PKGSPEC=%s\n" % d.getVar('SSTATE_PKGSPEC'))
+        update_hash("task=%s\n" % task)
+
+        for root, dirs, files in os.walk('.', topdown=True):
+            # Sort directories and files to ensure consistent ordering
+            dirs.sort()
+            files.sort()
+
+            for f in files:
+                path = os.path.join(root, f)
+                s = os.lstat(path)
+
+                # Hash file path
+                update_hash(path + '\n')
+
+                # Hash file mode
+                update_hash("\tmode=0x%x\n" % stat.S_IMODE(s.st_mode))
+                update_hash("\ttype=0x%x\n" % stat.S_IFMT(s.st_mode))
+
+                if stat.S_ISBLK(s.st_mode) or stat.S_ISCHR(s.st_mode):
+                    # Hash device major and minor
+                    update_hash("\tdev=%d,%d\n" % (os.major(s.st_rdev), os.minor(s.st_rdev)))
+                elif stat.S_ISLNK(s.st_mode):
+                    # Hash symbolic link
+                    update_hash("\tsymlink=%s\n" % os.readlink(path))
+                else:
+                    fh = hashlib.sha256()
+                    # Hash file contents
+                    with open(path, 'rb') as d:
+                        for chunk in iter(lambda: d.read(4096), b""):
+                            fh.update(chunk)
+                    update_hash("\tdigest=%s\n" % fh.hexdigest())
+    finally:
+        os.chdir(prev_dir)
+
+    return h.hexdigest()
+
+python sstate_report_unihash() {
+    report_unihash = getattr(bb.parse.siggen, 'report_unihash', None)
+
+    if report_unihash:
+        ss = sstate_state_fromvars(d)
+        report_unihash(os.getcwd(), ss['task'], d)
+}
+
 #
 # Shell function to decompress and prepare a package for installation
 # Will be run from within SSTATE_INSTDIR.
@@ -804,7 +888,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
 
         spec, extrapath, tname = getpathcomponents(task, d)
 
-        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + extension)
 
         if os.path.exists(sstatefile):
             bb.debug(2, "SState: Found valid sstate file %s" % sstatefile)
@@ -866,7 +950,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
             if task in ret:
                 continue
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + extension)
             tasklist.append((task, sstatefile))
 
         if tasklist:
@@ -892,12 +976,12 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
         evdata = {'missed': [], 'found': []};
         for task in missed:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + ".tgz")
+            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_unihash[task], sstatefile ) )
         for task in ret:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['found'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + ".tgz")
+            evdata['found'].append( (sq_fn[task], sq_task[task], sq_unihash[task], sstatefile ) )
         bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)
 
     # Print some summary statistics about the current task completion and how much sstate
diff --git a/meta/conf/bitbake.conf b/meta/conf/bitbake.conf
index 64800623545..e64ce6a6dab 100644
--- a/meta/conf/bitbake.conf
+++ b/meta/conf/bitbake.conf
@@ -867,7 +867,9 @@ BB_HASHBASE_WHITELIST ?= "TMPDIR FILE PATH PWD BB_TASKHASH BBPATH BBSERVER DL_DI
     STAMPS_DIR PRSERV_DUMPDIR PRSERV_DUMPFILE PRSERV_LOCKDOWN PARALLEL_MAKE \
     CCACHE_DIR EXTERNAL_TOOLCHAIN CCACHE CCACHE_NOHASHDIR LICENSE_PATH SDKPKGSUFFIX \
     WARN_QA ERROR_QA WORKDIR STAMPCLEAN PKGDATA_DIR BUILD_ARCH SSTATE_PKGARCH \
-    BB_WORKERCONTEXT BB_LIMITEDDEPS extend_recipe_sysroot DEPLOY_DIR"
+    BB_WORKERCONTEXT BB_LIMITEDDEPS BB_UNIHASH extend_recipe_sysroot DEPLOY_DIR \
+    SSTATE_HASHEQUIV_METHOD SSTATE_HASHEQUIV_SERVER SSTATE_HASHEQUIV_REPORT_TASKDATA \
+    SSTATE_HASHEQUIV_OWNER"
 BB_HASHCONFIG_WHITELIST ?= "${BB_HASHBASE_WHITELIST} DATE TIME SSH_AGENT_PID \
     SSH_AUTH_SOCK PSEUDO_BUILD BB_ENV_EXTRAWHITE DISABLE_SANITY_CHECKS \
     PARALLEL_MAKE BB_NUMBER_THREADS BB_ORIGENV BB_INVALIDCONF BBINCLUDED \
diff --git a/meta/lib/oe/sstatesig.py b/meta/lib/oe/sstatesig.py
index 18c5a353a2a..503f2452807 100644
--- a/meta/lib/oe/sstatesig.py
+++ b/meta/lib/oe/sstatesig.py
@@ -263,10 +263,177 @@ class SignatureGeneratorOEBasicHash(bb.siggen.SignatureGeneratorBasicHash):
         if error_msgs:
             bb.fatal("\n".join(error_msgs))
 
+class SignatureGeneratorOEEquivHash(SignatureGeneratorOEBasicHash):
+    name = "OEEquivHash"
+
+    def init_rundepcheck(self, data):
+        super().init_rundepcheck(data)
+        self.server = data.getVar('SSTATE_HASHEQUIV_SERVER')
+        self.method = data.getVar('SSTATE_HASHEQUIV_METHOD')
+        self.unihashes = bb.persist_data.persist('SSTATESIG_UNIHASH_CACHE_v1_' + self.method, data)
+
+    def get_taskdata(self):
+        return (self.server, self.method) + super().get_taskdata()
+
+    def set_taskdata(self, data):
+        self.server, self.method = data[:2]
+        super().set_taskdata(data[2:])
+
+    def __get_task_unihash_key(self, task):
+        # TODO: The key only *needs* to be the taskhash, the task is just
+        # convenient
+        return '%s:%s' % (task, self.taskhash[task])
+
+    def get_stampfile_hash(self, task):
+        if task in self.taskhash:
+            # If a unique hash is reported, use it as the stampfile hash. This
+        # ensures that a task won't be re-run if the taskhash changes but
+        # it would result in the same output hash
+            unihash = self.unihashes.get(self.__get_task_unihash_key(task))
+            if unihash is not None:
+                return unihash
+
+        return super().get_stampfile_hash(task)
+
+    def get_unihash(self, task):
+        import urllib
+        import json
+
+        taskhash = self.taskhash[task]
+
+        key = self.__get_task_unihash_key(task)
+
+        # TODO: This cache can grow unbounded. It probably only needs to keep
+        # the unihash for the most recent taskhash of each task
+        unihash = self.unihashes.get(key)
+        if unihash is not None:
+            return unihash
+
+        # If no unique hash can be discovered from the server, make it
+        # equivalent to the taskhash. The unique "hash" only really needs to
+        # be a unique string (not even necessarily a hash), but making it
+        # match the taskhash has a few advantages:
+        #
+        # 1) All of the sstate code that assumes hashes can be the same works
+        # 2) It provides maximal compatibility with builders that don't use
+        #    an equivalency server
+        # 3) It is easy for multiple independent builders to derive the same
+        #    unique hash from the same input. This means that if the
+        #    independent builders find the same taskhash, but it isn't
+        #    reported to the server, there is a better chance that they will
+        #    agree on the unique hash.
+        unihash = taskhash
+
+        try:
+            url = '%s/v1/equivalent?%s' % (self.server,
+                    urllib.parse.urlencode({'method': self.method, 'taskhash': self.taskhash[task]}))
+
+            request = urllib.request.Request(url)
+            response = urllib.request.urlopen(request)
+            data = response.read().decode('utf-8')
+
+            json_data = json.loads(data)
+
+            if json_data:
+                unihash = json_data['unihash']
+                # A unique hash equal to the taskhash is not very interesting,
+                # so it is reported at debug level 2. If they differ, that
+                # is much more interesting, so it is reported at debug level 1
+                bb.debug((1, 2)[unihash == taskhash], 'Found unihash %s in place of %s for %s from %s' % (unihash, taskhash, task, self.server))
+            else:
+                bb.debug(2, 'No reported unihash for %s:%s from %s' % (task, taskhash, self.server))
+        except urllib.error.URLError as e:
+            bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+        except (KeyError, json.JSONDecodeError) as e:
+            bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+
+        self.unihashes[key] = unihash
+        return unihash
+
+    def report_unihash(self, path, task, d):
+        import urllib
+        import json
+        import tempfile
+        import base64
+
+        taskhash = d.getVar('BB_TASKHASH')
+        unihash = d.getVar('BB_UNIHASH')
+        report_taskdata = d.getVar('SSTATE_HASHEQUIV_REPORT_TASKDATA') == '1'
+        tempdir = d.getVar('T')
+        fn = d.getVar('BB_FILENAME')
+        key = fn + '.do_' + task + ':' + taskhash
+
+        # Sanity checks
+        cache_unihash = self.unihashes.get(key)
+        if cache_unihash is None:
+            bb.fatal('%s not in unihash cache. Please report this error' % key)
+
+        if cache_unihash != unihash:
+            bb.fatal("Cache unihash %s doesn't match BB_UNIHASH %s" % (cache_unihash, unihash))
+
+        sigfile = None
+        sigfile_name = "depsig.do_%s.%d" % (task, os.getpid())
+        sigfile_link = "depsig.do_%s" % task
+
+        try:
+            call = self.method + '(path, sigfile, task, d)'
+            sigfile = open(os.path.join(tempdir, sigfile_name), 'w+b')
+            locs = {'path': path, 'sigfile': sigfile, 'task': task, 'd': d}
+
+            outhash = bb.utils.better_eval(call, locs)
+
+            try:
+                url = '%s/v1/equivalent' % self.server
+                task_data = {
+                    'taskhash': taskhash,
+                    'method': self.method,
+                    'outhash': outhash,
+                    'unihash': unihash,
+                    'owner': d.getVar('SSTATE_HASHEQUIV_OWNER')
+                    }
+
+                if report_taskdata:
+                    sigfile.seek(0)
+
+                    task_data['PN'] = d.getVar('PN')
+                    task_data['PV'] = d.getVar('PV')
+                    task_data['PR'] = d.getVar('PR')
+                    task_data['task'] = task
+                    task_data['outhash_siginfo'] = sigfile.read().decode('utf-8')
+
+                headers = {'content-type': 'application/json'}
+
+                request = urllib.request.Request(url, json.dumps(task_data).encode('utf-8'), headers)
+                response = urllib.request.urlopen(request)
+                data = response.read().decode('utf-8')
+
+                json_data = json.loads(data)
+                new_unihash = json_data['unihash']
+
+                if new_unihash != unihash:
+                    bb.debug(1, 'Task %s unihash changed %s -> %s by server %s' % (taskhash, unihash, new_unihash, self.server))
+                else:
+                    bb.debug(1, 'Reported task %s as unihash %s to %s' % (taskhash, unihash, self.server))
+            except urllib.error.URLError as e:
+                bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+            except (KeyError, json.JSONDecodeError) as e:
+                bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+        finally:
+            if sigfile:
+                sigfile.close()
+
+                sigfile_link_path = os.path.join(tempdir, sigfile_link)
+                bb.utils.remove(sigfile_link_path)
+
+                try:
+                    os.symlink(sigfile_name, sigfile_link_path)
+                except OSError:
+                    pass
 
 # Insert these classes into siggen's namespace so it can see and select them
 bb.siggen.SignatureGeneratorOEBasic = SignatureGeneratorOEBasic
 bb.siggen.SignatureGeneratorOEBasicHash = SignatureGeneratorOEBasicHash
+bb.siggen.SignatureGeneratorOEEquivHash = SignatureGeneratorOEEquivHash
 
 
 def find_siginfo(pn, taskname, taskhashlist, d):
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread
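
Stripped of the bitbake plumbing, the client side of the patch above
amounts to two HTTP operations against the reference server. A minimal
standalone sketch (the server URL and function names here are
illustrative, not part of the patch):

    import json
    import urllib.parse
    import urllib.request

    SERVER = 'http://localhost:5000'  # example server, no trailing slash

    def query_unihash(method, taskhash):
        # GET /v1/equivalent returns a JSON value that is empty when no
        # equivalence is known, else an object containing 'unihash'
        url = '%s/v1/equivalent?%s' % (SERVER, urllib.parse.urlencode(
                {'method': method, 'taskhash': taskhash}))
        with urllib.request.urlopen(url) as response:
            data = json.loads(response.read().decode('utf-8'))
        return data['unihash'] if data else None

    def report_outhash(method, taskhash, outhash, unihash):
        # POST the computed output hash; the server records the
        # equivalence and replies with the canonical unihash
        body = json.dumps({'method': method, 'taskhash': taskhash,
                           'outhash': outhash, 'unihash': unihash}).encode('utf-8')
        request = urllib.request.Request('%s/v1/equivalent' % SERVER, body,
                                         {'content-type': 'application/json'})
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode('utf-8'))['unihash']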


* ✗ patchtest: failure for Hash Equivalency Server (rev2)
  2018-12-04  3:42     ` [PATCH " Joshua Watt
                       ` (18 preceding siblings ...)
  (?)
@ 2018-12-18 16:03     ` Patchwork
  -1 siblings, 0 replies; 158+ messages in thread
From: Patchwork @ 2018-12-18 16:03 UTC (permalink / raw)
  To: Joshua Watt; +Cc: openembedded-core

== Series Details ==

Series: Hash Equivalency Server (rev2)
Revision: 2
URL   : https://patchwork.openembedded.org/series/15190/
State : failure

== Summary ==


Thank you for submitting this patch series to OpenEmbedded Core. This is
an automated response. Several tests have been executed on the proposed
series by patchtest resulting in the following failures:



* Issue             Series sent to the wrong mailing list or some patches from the series correspond to different mailing lists [test_target_mailing_list] 
  Suggested fix    Send the series again to the correct mailing list (ML)
  Suggested ML     bitbake-devel@lists.openembedded.org [http://git.openembedded.org/bitbake/]
  Patch's path:    bitbake/bin/bitbake-worker

* Issue             Series does not apply on top of target branch [test_series_merge_on_head] 
  Suggested fix    Rebase your series on top of targeted branch
  Targeted branch  master (currently at 20aea61385)



If you believe any of these test results are incorrect, please reply to the
mailing list (openembedded-core@lists.openembedded.org) raising your concerns.
Otherwise we would appreciate you correcting the issues and submitting a new
version of the patchset if applicable. Please ensure you add/increment the
version number when sending the new version (i.e. [PATCH] -> [PATCH v2] ->
[PATCH v3] -> ...).

---
Guidelines:     https://www.openembedded.org/wiki/Commit_Patch_Message_Guidelines
Test framework: http://git.yoctoproject.org/cgit/cgit.cgi/patchtest
Test suite:     http://git.yoctoproject.org/cgit/cgit.cgi/patchtest-oe



^ permalink raw reply	[flat|nested] 158+ messages in thread

* Re: [OE-core] [PATCH v4 07/10] bitbake: runqueue: Pass unique hash to hash validate
  2018-12-18 15:30         ` [PATCH " Joshua Watt
@ 2018-12-18 16:24           ` Richard Purdie
  -1 siblings, 0 replies; 158+ messages in thread
From: Richard Purdie @ 2018-12-18 16:24 UTC (permalink / raw)
  To: Joshua Watt, openembedded-core, bitbake-devel

On Tue, 2018-12-18 at 09:30 -0600, Joshua Watt wrote:
> If the unique hash is being used to track task dependencies, the hash
> validation function needs to know about it in order to properly validate
> the hash.
> 
> [YOCTO #13030]
> 
> Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> ---
>  bitbake/lib/bb/runqueue.py | 38 +++++++++++++++++++++++++++-----------
>  1 file changed, 27 insertions(+), 11 deletions(-)
> 
> diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
> index de57dcb37b8..161f53c7cb1 100644
> --- a/bitbake/lib/bb/runqueue.py
> +++ b/bitbake/lib/bb/runqueue.py
> @@ -1378,6 +1378,23 @@ class RunQueue:
>              cache[tid] = iscurrent
>          return iscurrent
>  
> +    def validate_hash(self, *, sq_fn, sq_task, sq_hash, sq_hashfn, siginfo, sq_unihash, d):
> +        locs = {"sq_fn" : sq_fn, "sq_task" : sq_task, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn,
> +                "sq_unihash" : sq_unihash, "siginfo" : siginfo, "d" : d}
> +
> +        for hashvalidate_args in ("(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=siginfo, sq_unihash=sq_unihash)",
> +                                  "(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=siginfo)"):
> +            try:
> +                call = self.hashvalidate + hashvalidate_args
> +                return bb.utils.better_eval(call, locs)
> +            except TypeError:
> +                continue
> +
> +        # If none of the hash validate functions worked, try one more time
> +        # with the oldest type
> +        call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, sq_unihash, d)"
> +        return bb.utils.better_eval(call, locs)


I think sq_unihash shouldn't be in this last fallback option. You copied
and pasted from your updated code rather than the original (pre
unihash?). You could add (sq_fn, sq_task, sq_hash, sq_hashfn) to the
args list?

Cheers,

Richard



^ permalink raw reply	[flat|nested] 158+ messages in thread


* Re: [OE-core] [PATCH v4 07/10] bitbake: runqueue: Pass unique hash to hash validate
  2018-12-18 16:24           ` Richard Purdie
@ 2018-12-18 16:31             ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-18 16:31 UTC (permalink / raw)
  To: Richard Purdie, openembedded-core, bitbake-devel

On Tue, 2018-12-18 at 16:24 +0000, Richard Purdie wrote:
> On Tue, 2018-12-18 at 09:30 -0600, Joshua Watt wrote:
> > If the unique hash is being used to track task dependencies, the
> > hash
> > validation function needs to know about it in order to properly
> > validate
> > the hash.
> > 
> > [YOCTO #13030]
> > 
> > Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> > ---
> >  bitbake/lib/bb/runqueue.py | 38 +++++++++++++++++++++++++++-------
> > ----
> >  1 file changed, 27 insertions(+), 11 deletions(-)
> > 
> > diff --git a/bitbake/lib/bb/runqueue.py
> > b/bitbake/lib/bb/runqueue.py
> > index de57dcb37b8..161f53c7cb1 100644
> > --- a/bitbake/lib/bb/runqueue.py
> > +++ b/bitbake/lib/bb/runqueue.py
> > @@ -1378,6 +1378,23 @@ class RunQueue:
> >              cache[tid] = iscurrent
> >          return iscurrent
> >  
> > +    def validate_hash(self, *, sq_fn, sq_task, sq_hash, sq_hashfn,
> > siginfo, sq_unihash, d):
> > +        locs = {"sq_fn" : sq_fn, "sq_task" : sq_task, "sq_hash" :
> > sq_hash, "sq_hashfn" : sq_hashfn,
> > +                "sq_unihash" : sq_unihash, "siginfo" : siginfo,
> > "d" : d}
> > +
> > +        for hashvalidate_args in ("(sq_fn, sq_task, sq_hash,
> > sq_hashfn, d, siginfo=siginfo, sq_unihash=sq_unihash)",
> > +                                  "(sq_fn, sq_task, sq_hash,
> > sq_hashfn, d, siginfo=siginfo)"):
> > +            try:
> > +                call = self.hashvalidate + hashvalidate_args
> > +                return bb.utils.better_eval(call, locs)
> > +            except TypeError:
> > +                continue
> > +
> > +        # If none of the hash validate functions worked, try one
> > more time
> > +        # with the oldest type
> > +        call = self.hashvalidate + "(sq_fn, sq_task, sq_hash,
> > sq_hashfn, sq_unihash, d)"
> > +        return bb.utils.better_eval(call, locs)
> 
> I think sq_unihash shouldn't be in this last fallback option. You
> copy
> and pasted from your updated code rather than the original (pre
> unihash?) You could add (sq_fn, sq_task, sq_hash, sq_hashfn) to the
> args list?

Oops, yep I'll fix that. 

The reason I did it like this was to avoid suppressing the final
TypeError if the last attempt failed... I wasn't sure if there was some
other error I should raise instead if we exit the loop when they all
fail?

> 
> Cheers,
> 
> Richard
> 
-- 
Joshua Watt <JPEWhacker@gmail.com>



^ permalink raw reply	[flat|nested] 158+ messages in thread
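
For reference, the shape this discussion converges on, as a sketch
(assuming it lives in bb.runqueue where bb.utils is available; per the
review, the oldest-signature fallback drops sq_unihash and sits outside
the try/except so its TypeError is not swallowed):

    def validate_hash(self, *, sq_fn, sq_task, sq_hash, sq_hashfn, siginfo, sq_unihash, d):
        locs = {"sq_fn": sq_fn, "sq_task": sq_task, "sq_hash": sq_hash,
                "sq_hashfn": sq_hashfn, "sq_unihash": sq_unihash,
                "siginfo": siginfo, "d": d}

        # Try the newest calling conventions first
        for args in ("(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=siginfo, sq_unihash=sq_unihash)",
                     "(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=siginfo)"):
            try:
                return bb.utils.better_eval(self.hashvalidate + args, locs)
            except TypeError:
                continue

        # Oldest signature last, outside the try/except, so a TypeError
        # raised inside the function itself still propagates
        call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d)"
        return bb.utils.better_eval(call, locs)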


* [OE-core][PATCH v5 0/8] Hash Equivalency Server
  2018-12-18 15:30       ` [PATCH " Joshua Watt
@ 2018-12-19  3:10         ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-19  3:10 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Apologies for cross-posting this to both the bitbake-devel and
openembedded-devel; this work necessarily intertwines both places, and
it is really necessary to look at both parts to get an idea of what is
going on. For convenience, the bitbake patches are listed first,
followed by the oe-core patches.

The basic premise is that any given task no longer hashes a dependent
task's taskhash to determine its own taskhash, but instead hashes the
dependent task's "unique hash" (which doesn't strictly need to be a
hash, but is for consistency). This allows multiple taskhashes to map to
the same unique hash, meaning that trivial changes to a recipe that
would change the taskhash don't necessarily need to change the unique
hash, and thus don't need to cause downstream tasks to be rebuilt (with
caveats, see below).
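
In toy form, the change to hash composition looks roughly like this
(illustrative only; sha256 and the mixing order stand in for bitbake's
actual signature code):

    import hashlib

    def compute_taskhash(basehash, deps, siggen):
        # Mix in each dependency's *unique* hash rather than its
        # taskhash, so two taskhashes the server deems equivalent
        # leave this task's hash unchanged
        h = hashlib.sha256(basehash.encode('utf-8'))
        for dep in sorted(deps):
            h.update(siggen.get_unihash(dep).encode('utf-8'))
        return h.hexdigest()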

In the absence of any interaction by the user, the unique hash for a
task is just that task's taskhash, which effectively maintains the
current behavior. However, if the user enables the "OEEquivHash"
signature generator, they can direct it to look at a hash equivalency
server (of which a reference implementation is provided). The sstate
code will provide the server with an output hash that it calculates, and
the server will record all tasks with the same output hash as
"equivalent" and report the same unique hash for them when requested.
When initializing tasks, bitbake can ask the server about the unique
hash for new tasks it has never seen before and potentially skip
rebuilding, or restore the task from an equivalent sstate file. To
facilitate restoring tasks from sstate, sstate objects are now named
based on the task's unique hash instead of the taskhash (which, again, has
no effect if the server is in use).
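
Concretely, enabling this is expected to look something like the
following in local.conf (example values; the server URL takes no
trailing slash):

    BB_SIGNATURE_HANDLER = "OEEquivHash"
    SSTATE_HASHEQUIV_SERVER = "http://localhost:5000"
    SSTATE_HASHEQUIV_METHOD = "OEOuthashBasic"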

This patchset doesn't make any attempt to dynamically update task unique
hashes after bitbake initializes the tasks, and as such there are some
cases where this isn't accelerating the build as much as it possibly
could. I think it will be possible to add support for this, but this
preliminary support needs to come first.

You can also see these patches (and my first attempts at dynamic task
re-hashing) on the "jpew/hash-equivalence" branch in poky-contrib.

As always, thanks for your feedback and time

VERSION 2:

At the core, this patch does the same thing as V1 with some very minor
tweaks. The main things that have changed are:
 1) Per request, the Hash Equivalence Server reference implementation is
    now based entirely on built-in Python modules and requires no
    external libraries. It also has a wrapper script to launch it
    (bitbake-hashserv) and unittests.
 2) There is a major rework of persist_data in bitbake. I
    think these patches could be submitted independently, but I doubt
    anyone is clamoring for them. The general gist of them is that there
    were a lot of strange edge cases that I found when using
    persist_data as an IPC mechanism between the main bitbake process
    and the bitbake-worker processes. I went ahead and added extensive
    unit tests for this as well.

VERSION 3:

Minor tweak to version 2 that should fix timeout errors seen on the
autobuilder

VERSION 4:

Based on discussion, the term "dependency ID" was dropped in favor of
"unique hash" (unihash).

The hash validation checks were updated to properly fall back to the old
function signatures (that don't pass the unihashes) for compatibility
with older implementations.

VERSION 5:

Removed os.fork() handlers for persist_data. They can be added back if
actually necessary.

Reworked hash validation slightly based on feedback.

Joshua Watt (8):
  bitbake: tests/persist_data: Add tests
  bitbake: siggen: Split out task unique hash
  bitbake: runqueue: Track task unique hash
  bitbake: runqueue: Pass unique hash to task
  bitbake: runqueue: Pass unique hash to hash validate
  classes/sstate: Handle unihash in hash check
  bitbake: hashserv: Add hash equivalence reference server
  sstate: Implement hash equivalence sstate

 bitbake/bin/bitbake-hashserv         |  67 +++++++++++
 bitbake/bin/bitbake-selftest         |   3 +
 bitbake/bin/bitbake-worker           |   7 +-
 bitbake/lib/bb/runqueue.py           |  76 ++++++++----
 bitbake/lib/bb/siggen.py             |   7 +-
 bitbake/lib/bb/tests/persist_data.py | 142 +++++++++++++++++++++++
 bitbake/lib/hashserv/__init__.py     | 152 ++++++++++++++++++++++++
 bitbake/lib/hashserv/tests.py        | 141 ++++++++++++++++++++++
 meta/classes/sstate.bbclass          | 102 ++++++++++++++--
 meta/conf/bitbake.conf               |   4 +-
 meta/lib/oe/sstatesig.py             | 167 +++++++++++++++++++++++++++
 11 files changed, 830 insertions(+), 38 deletions(-)
 create mode 100755 bitbake/bin/bitbake-hashserv
 create mode 100644 bitbake/lib/bb/tests/persist_data.py
 create mode 100644 bitbake/lib/hashserv/__init__.py
 create mode 100644 bitbake/lib/hashserv/tests.py

-- 
2.19.2



^ permalink raw reply	[flat|nested] 158+ messages in thread


* [OE-core][PATCH v5 1/8] bitbake: tests/persist_data: Add tests
  2018-12-19  3:10         ` [PATCH " Joshua Watt
@ 2018-12-19  3:10           ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-19  3:10 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Adds a test suite for testing the persistent data cache

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-selftest         |   1 +
 bitbake/lib/bb/tests/persist_data.py | 142 +++++++++++++++++++++++++++
 2 files changed, 143 insertions(+)
 create mode 100644 bitbake/lib/bb/tests/persist_data.py

diff --git a/bitbake/bin/bitbake-selftest b/bitbake/bin/bitbake-selftest
index cfa7ac5391b..c970dcae90c 100755
--- a/bitbake/bin/bitbake-selftest
+++ b/bitbake/bin/bitbake-selftest
@@ -33,6 +33,7 @@ tests = ["bb.tests.codeparser",
          "bb.tests.event",
          "bb.tests.fetch",
          "bb.tests.parse",
+         "bb.tests.persist_data",
          "bb.tests.utils",
          "layerindexlib.tests.layerindexobj",
          "layerindexlib.tests.restapi",
diff --git a/bitbake/lib/bb/tests/persist_data.py b/bitbake/lib/bb/tests/persist_data.py
new file mode 100644
index 00000000000..812bcbd7b8b
--- /dev/null
+++ b/bitbake/lib/bb/tests/persist_data.py
@@ -0,0 +1,142 @@
+# ex:ts=4:sw=4:sts=4:et
+# -*- tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*-
+#
+# BitBake Test for lib/bb/persist_data/
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+#
+
+import unittest
+import bb.data
+import bb.persist_data
+import tempfile
+import threading
+
+class PersistDataTest(unittest.TestCase):
+    def _create_data(self):
+        return bb.persist_data.persist('TEST_PERSIST_DATA', self.d)
+
+    def setUp(self):
+        self.d = bb.data.init()
+        self.tempdir = tempfile.TemporaryDirectory()
+        self.d['PERSISTENT_DIR'] = self.tempdir.name
+        self.data = self._create_data()
+        self.items = {
+                'A1': '1',
+                'B1': '2',
+                'C2': '3'
+                }
+        self.stress_count = 10000
+        self.thread_count = 5
+
+        for k,v in self.items.items():
+            self.data[k] = v
+
+    def tearDown(self):
+        self.tempdir.cleanup()
+
+    def _iter_helper(self, seen, iterator):
+        with iter(iterator):
+            for v in iterator:
+                self.assertTrue(v in seen)
+                seen.remove(v)
+        self.assertEqual(len(seen), 0, '%s not seen' % seen)
+
+    def test_get(self):
+        for k, v in self.items.items():
+            self.assertEqual(self.data[k], v)
+
+        self.assertIsNone(self.data.get('D'))
+        with self.assertRaises(KeyError):
+            self.data['D']
+
+    def test_set(self):
+        for k, v in self.items.items():
+            self.data[k] += '-foo'
+
+        for k, v in self.items.items():
+            self.assertEqual(self.data[k], v + '-foo')
+
+    def test_delete(self):
+        self.data['D'] = '4'
+        self.assertEqual(self.data['D'], '4')
+        del self.data['D']
+        self.assertIsNone(self.data.get('D'))
+        with self.assertRaises(KeyError):
+            self.data['D']
+
+    def test_contains(self):
+        for k in self.items:
+            self.assertTrue(k in self.data)
+            self.assertTrue(self.data.has_key(k))
+        self.assertFalse('NotFound' in self.data)
+        self.assertFalse(self.data.has_key('NotFound'))
+
+    def test_len(self):
+        self.assertEqual(len(self.data), len(self.items))
+
+    def test_iter(self):
+        self._iter_helper(set(self.items.keys()), self.data)
+
+    def test_itervalues(self):
+        self._iter_helper(set(self.items.values()), self.data.itervalues())
+
+    def test_iteritems(self):
+        self._iter_helper(set(self.items.items()), self.data.iteritems())
+
+    def test_get_by_pattern(self):
+        self._iter_helper({'1', '2'}, self.data.get_by_pattern('_1'))
+
+    def _stress_read(self, data):
+        for i in range(self.stress_count):
+            for k in self.items:
+                data[k]
+
+    def _stress_write(self, data):
+        for i in range(self.stress_count):
+            for k, v in self.items.items():
+                data[k] = v + str(i)
+
+    def _validate_stress(self):
+        for k, v in self.items.items():
+            self.assertEqual(self.data[k], v + str(self.stress_count - 1))
+
+    def test_stress(self):
+        self._stress_read(self.data)
+        self._stress_write(self.data)
+        self._validate_stress()
+
+    def test_stress_threads(self):
+        def read_thread():
+            data = self._create_data()
+            self._stress_read(data)
+
+        def write_thread():
+            data = self._create_data()
+            self._stress_write(data)
+
+        threads = []
+        for i in range(self.thread_count):
+            threads.append(threading.Thread(target=read_thread))
+            threads.append(threading.Thread(target=write_thread))
+
+        for t in threads:
+            t.start()
+        self._stress_read(self.data)
+        for t in threads:
+            t.join()
+        self._validate_stress()
+
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread
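
The API these tests pin down is deliberately dict-like. A short usage
sketch (the domain name and scratch directory are examples):

    import tempfile
    import bb.data
    import bb.persist_data

    d = bb.data.init()
    tempdir = tempfile.TemporaryDirectory()
    d['PERSISTENT_DIR'] = tempdir.name  # where the sqlite file lives

    # persist() returns a table backed by sqlite, so entries survive
    # across separate bitbake invocations sharing PERSISTENT_DIR
    cache = bb.persist_data.persist('EXAMPLE_DOMAIN', d)
    cache['key'] = 'value'
    assert 'key' in cache
    assert cache.get('missing') is None  # like dict.get()
    del cache['key']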

* [PATCH v5 1/8] bitbake: tests/persist_data: Add tests
@ 2018-12-19  3:10           ` Joshua Watt
  0 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-19  3:10 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Adds a test suite for testing the persistent data cache

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-selftest         |   1 +
 bitbake/lib/bb/tests/persist_data.py | 142 +++++++++++++++++++++++++++
 2 files changed, 143 insertions(+)
 create mode 100644 bitbake/lib/bb/tests/persist_data.py

diff --git a/bitbake/bin/bitbake-selftest b/bitbake/bin/bitbake-selftest
index cfa7ac5391b..c970dcae90c 100755
--- a/bitbake/bin/bitbake-selftest
+++ b/bitbake/bin/bitbake-selftest
@@ -33,6 +33,7 @@ tests = ["bb.tests.codeparser",
          "bb.tests.event",
          "bb.tests.fetch",
          "bb.tests.parse",
+         "bb.tests.persist_data",
          "bb.tests.utils",
          "layerindexlib.tests.layerindexobj",
          "layerindexlib.tests.restapi",
diff --git a/bitbake/lib/bb/tests/persist_data.py b/bitbake/lib/bb/tests/persist_data.py
new file mode 100644
index 00000000000..812bcbd7b8b
--- /dev/null
+++ b/bitbake/lib/bb/tests/persist_data.py
@@ -0,0 +1,142 @@
+# ex:ts=4:sw=4:sts=4:et
+# -*- tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*-
+#
+# BitBake Test for lib/bb/persist_data/
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+#
+
+import unittest
+import bb.data
+import bb.persist_data
+import tempfile
+import threading
+
+class PersistDataTest(unittest.TestCase):
+    def _create_data(self):
+        return bb.persist_data.persist('TEST_PERSIST_DATA', self.d)
+
+    def setUp(self):
+        self.d = bb.data.init()
+        self.tempdir = tempfile.TemporaryDirectory()
+        self.d['PERSISTENT_DIR'] = self.tempdir.name
+        self.data = self._create_data()
+        self.items = {
+                'A1': '1',
+                'B1': '2',
+                'C2': '3'
+                }
+        self.stress_count = 10000
+        self.thread_count = 5
+
+        for k,v in self.items.items():
+            self.data[k] = v
+
+    def tearDown(self):
+        self.tempdir.cleanup()
+
+    def _iter_helper(self, seen, iterator):
+        with iter(iterator):
+            for v in iterator:
+                self.assertTrue(v in seen)
+                seen.remove(v)
+        self.assertEqual(len(seen), 0, '%s not seen' % seen)
+
+    def test_get(self):
+        for k, v in self.items.items():
+            self.assertEqual(self.data[k], v)
+
+        self.assertIsNone(self.data.get('D'))
+        with self.assertRaises(KeyError):
+            self.data['D']
+
+    def test_set(self):
+        for k, v in self.items.items():
+            self.data[k] += '-foo'
+
+        for k, v in self.items.items():
+            self.assertEqual(self.data[k], v + '-foo')
+
+    def test_delete(self):
+        self.data['D'] = '4'
+        self.assertEqual(self.data['D'], '4')
+        del self.data['D']
+        self.assertIsNone(self.data.get('D'))
+        with self.assertRaises(KeyError):
+            self.data['D']
+
+    def test_contains(self):
+        for k in self.items:
+            self.assertTrue(k in self.data)
+            self.assertTrue(self.data.has_key(k))
+        self.assertFalse('NotFound' in self.data)
+        self.assertFalse(self.data.has_key('NotFound'))
+
+    def test_len(self):
+        self.assertEqual(len(self.data), len(self.items))
+
+    def test_iter(self):
+        self._iter_helper(set(self.items.keys()), self.data)
+
+    def test_itervalues(self):
+        self._iter_helper(set(self.items.values()), self.data.itervalues())
+
+    def test_iteritems(self):
+        self._iter_helper(set(self.items.items()), self.data.iteritems())
+
+    def test_get_by_pattern(self):
+        self._iter_helper({'1', '2'}, self.data.get_by_pattern('_1'))
+
+    def _stress_read(self, data):
+        for i in range(self.stress_count):
+            for k in self.items:
+                data[k]
+
+    def _stress_write(self, data):
+        for i in range(self.stress_count):
+            for k, v in self.items.items():
+                data[k] = v + str(i)
+
+    def _validate_stress(self):
+        for k, v in self.items.items():
+            self.assertEqual(self.data[k], v + str(self.stress_count - 1))
+
+    def test_stress(self):
+        self._stress_read(self.data)
+        self._stress_write(self.data)
+        self._validate_stress()
+
+    def test_stress_threads(self):
+        def read_thread():
+            data = self._create_data()
+            self._stress_read(data)
+
+        def write_thread():
+            data = self._create_data()
+            self._stress_write(data)
+
+        threads = []
+        for i in range(self.thread_count):
+            threads.append(threading.Thread(target=read_thread))
+            threads.append(threading.Thread(target=write_thread))
+
+        for t in threads:
+            t.start()
+        self._stress_read(self.data)
+        for t in threads:
+            t.join()
+        self._validate_stress()
+
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread
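
A minimal sketch of the dict-like API these tests exercise, mirroring
setUp() above; the domain name and scratch directory are illustrative:

    import tempfile
    import bb.data
    import bb.persist_data

    d = bb.data.init()
    tmpdir = tempfile.TemporaryDirectory()
    d['PERSISTENT_DIR'] = tmpdir.name         # backing store lives here

    cache = bb.persist_data.persist('EXAMPLE_DOMAIN', d)
    cache['srcrev'] = 'abc123'                # persists across invocations
    assert 'srcrev' in cache                  # __contains__, as in test_contains
    assert cache.get('missing') is None       # get() returns None when absent
    tmpdir.cleanup()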

* [OE-core][PATCH v5 2/8] bitbake: siggen: Split out task unique hash
  2018-12-19  3:10         ` [PATCH " Joshua Watt
@ 2018-12-19  3:10           ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-19  3:10 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Abstracts the function to get the unique hash for a task. This hash is
used in place of the taskhash when determining how other tasks depend on
this one. Unless overridden, the unique hash is the same as the
taskhash, preserving the original behavior.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/siggen.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/bitbake/lib/bb/siggen.py b/bitbake/lib/bb/siggen.py
index ab6df7603c8..5508523f2da 100644
--- a/bitbake/lib/bb/siggen.py
+++ b/bitbake/lib/bb/siggen.py
@@ -41,6 +41,9 @@ class SignatureGenerator(object):
     def finalise(self, fn, d, varient):
         return
 
+    def get_unihash(self, task):
+        return self.taskhash[task]
+
     def get_taskhash(self, fn, task, deps, dataCache):
         return "0"
 
@@ -186,7 +189,7 @@ class SignatureGeneratorBasic(SignatureGenerator):
                 continue
             if dep not in self.taskhash:
                 bb.fatal("%s is not in taskhash, caller isn't calling in dependency order?" % dep)
-            data = data + self.taskhash[dep]
+            data = data + self.get_unihash(dep)
             self.runtaskdeps[k].append(dep)
 
         if task in dataCache.file_checksums[fn]:
@@ -261,7 +264,7 @@ class SignatureGeneratorBasic(SignatureGenerator):
             data['file_checksum_values'] = [(os.path.basename(f), cs) for f,cs in self.file_checksum_values[k]]
             data['runtaskhashes'] = {}
             for dep in data['runtaskdeps']:
-                data['runtaskhashes'][dep] = self.taskhash[dep]
+                data['runtaskhashes'][dep] = self.get_unihash(dep)
             data['taskhash'] = self.taskhash[k]
 
         taint = self.read_taint(fn, task, referencestamp)
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread
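
A hypothetical subclass illustrating why this indirection matters: a
signature generator can now map several taskhashes onto one unique hash
while the default behaviour stays identical. The class and attribute
names below are illustrative, not part of the patch:

    import bb.siggen

    class ExampleEquivSigGen(bb.siggen.SignatureGeneratorBasic):
        name = 'ExampleEquiv'

        def __init__(self, data):
            super().__init__(data)
            self.unihash_map = {}    # taskhash -> externally assigned hash

        def get_unihash(self, task):
            taskhash = self.taskhash[task]
            # Fall back to the taskhash (the pre-patch behaviour) when no
            # equivalence is known
            return self.unihash_map.get(taskhash, taskhash)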


* [OE-core][PATCH v5 3/8] bitbake: runqueue: Track task unique hash
  2018-12-19  3:10         ` [PATCH " Joshua Watt
@ 2018-12-19  3:10           ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-19  3:10 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Requests the task's unique hash from siggen and tracks it

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/runqueue.py | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index f2b95a9829b..27b188256dd 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -346,6 +346,7 @@ class RunTaskEntry(object):
         self.depends = set()
         self.revdeps = set()
         self.hash = None
+        self.unihash = None
         self.task = None
         self.weight = 1
 
@@ -385,6 +386,9 @@ class RunQueueData:
     def get_task_hash(self, tid):
         return self.runtaskentries[tid].hash
 
+    def get_task_unihash(self, tid):
+        return self.runtaskentries[tid].unihash
+
     def get_user_idstring(self, tid, task_name_suffix = ""):
         return tid + task_name_suffix
 
@@ -1150,18 +1154,21 @@ class RunQueueData:
                 if len(self.runtaskentries[tid].depends - dealtwith) == 0:
                     dealtwith.add(tid)
                     todeal.remove(tid)
-                    procdep = []
-                    for dep in self.runtaskentries[tid].depends:
-                        procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
-                    (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
-                    self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
-                    task = self.runtaskentries[tid].task
+                    self.prepare_task_hash(tid)
 
         bb.parse.siggen.writeout_file_checksum_cache()
 
         #self.dump_data()
         return len(self.runtaskentries)
 
+    def prepare_task_hash(self, tid):
+        procdep = []
+        for dep in self.runtaskentries[tid].depends:
+            procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
+        (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
+        self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
+        self.runtaskentries[tid].unihash = bb.parse.siggen.get_unihash(fn + "." + taskname)
+
     def dump_data(self):
         """
         Dump some debug information on the internal data structures
@@ -2081,7 +2088,8 @@ class RunQueueExecuteTasks(RunQueueExecute):
                 deps = self.rqdata.runtaskentries[revdep].depends
                 provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
                 taskhash = self.rqdata.runtaskentries[revdep].hash
-                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
+                unihash = self.rqdata.runtaskentries[revdep].unihash
+                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, unihash]
                 for revdep2 in deps:
                     if revdep2 not in taskdepdata:
                         additional.append(revdep2)
@@ -2524,7 +2532,8 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
                 deps = getsetscenedeps(revdep)
                 provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
                 taskhash = self.rqdata.runtaskentries[revdep].hash
-                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
+                unihash = self.rqdata.runtaskentries[revdep].unihash
+                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, unihash]
                 for revdep2 in deps:
                     if revdep2 not in taskdepdata:
                         additional.append(revdep2)
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread
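
For consumers of taskdepdata, each entry grows from six to seven fields;
a small hypothetical helper showing the new layout:

    def describe_task(taskdepdata, tid):
        # Field 6 (unihash) is new in this patch; fields 0-5 are unchanged
        pn, taskname, fn, deps, provides, taskhash, unihash = taskdepdata[tid]
        return '%s:%s taskhash=%s unihash=%s' % (pn, taskname, taskhash, unihash)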


* [OE-core][PATCH v5 4/8] bitbake: runqueue: Pass unique hash to task
  2018-12-19  3:10         ` [PATCH " Joshua Watt
@ 2018-12-19  3:10           ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-19  3:10 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

The unique hash is now passed to the task in the BB_UNIHASH variable

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-worker |  7 ++++---
 bitbake/lib/bb/runqueue.py | 10 ++++++----
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/bitbake/bin/bitbake-worker b/bitbake/bin/bitbake-worker
index cd687e6e433..a9e997e1f63 100755
--- a/bitbake/bin/bitbake-worker
+++ b/bitbake/bin/bitbake-worker
@@ -136,7 +136,7 @@ def sigterm_handler(signum, frame):
     os.killpg(0, signal.SIGTERM)
     sys.exit()
 
-def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskhash, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
+def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskhash, unihash, appends, taskdepdata, extraconfigdata, quieterrors=False, dry_run_exec=False):
     # We need to setup the environment BEFORE the fork, since
     # a fork() or exec*() activates PSEUDO...
 
@@ -235,6 +235,7 @@ def fork_off_task(cfg, data, databuilder, workerdata, fn, task, taskname, taskha
 
                 the_data = bb_cache.loadDataFull(fn, appends)
                 the_data.setVar('BB_TASKHASH', taskhash)
+                the_data.setVar('BB_UNIHASH', unihash)
 
                 bb.utils.set_process_name("%s:%s" % (the_data.getVar("PN"), taskname.replace("do_", "")))
 
@@ -425,10 +426,10 @@ class BitbakeWorker(object):
         sys.exit(0)
 
     def handle_runtask(self, data):
-        fn, task, taskname, taskhash, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
+        fn, task, taskname, taskhash, unihash, quieterrors, appends, taskdepdata, dry_run_exec = pickle.loads(data)
         workerlog_write("Handling runtask %s %s %s\n" % (task, fn, taskname))
 
-        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, taskhash, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
+        pid, pipein, pipeout = fork_off_task(self.cookercfg, self.data, self.databuilder, self.workerdata, fn, task, taskname, taskhash, unihash, appends, taskdepdata, self.extraconfigdata, quieterrors, dry_run_exec)
 
         self.build_pids[pid] = task
         self.build_pipes[pid] = runQueueWorkerPipe(pipein, pipeout)
diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index 27b188256dd..de57dcb37b8 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -2034,6 +2034,7 @@ class RunQueueExecuteTasks(RunQueueExecute):
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
             taskhash = self.rqdata.get_task_hash(task)
+            unihash = self.rqdata.get_task_unihash(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not (self.cooker.configuration.dry_run or self.rqdata.setscene_enforce):
                 if not mc in self.rq.fakeworker:
                     try:
@@ -2043,10 +2044,10 @@ class RunQueueExecuteTasks(RunQueueExecute):
                         self.rq.state = runQueueFailed
                         self.stats.taskFailed()
                         return True
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, unihash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, unihash, False, self.cooker.collection.get_file_appends(taskfn), taskdepdata, self.rqdata.setscene_enforce)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
@@ -2462,13 +2463,14 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
 
             taskdep = self.rqdata.dataCaches[mc].task_deps[taskfn]
             taskhash = self.rqdata.get_task_hash(task)
+            unihash = self.rqdata.get_task_unihash(task)
             if 'fakeroot' in taskdep and taskname in taskdep['fakeroot'] and not self.cooker.configuration.dry_run:
                 if not mc in self.rq.fakeworker:
                     self.rq.start_fakeworker(self, mc)
-                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.fakeworker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, unihash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.fakeworker[mc].process.stdin.flush()
             else:
-                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
+                self.rq.worker[mc].process.stdin.write(b"<runtask>" + pickle.dumps((taskfn, task, taskname, taskhash, unihash, True, self.cooker.collection.get_file_appends(taskfn), taskdepdata, False)) + b"</runtask>")
                 self.rq.worker[mc].process.stdin.flush()
 
             self.build_stamps[task] = bb.build.stampfile(taskname, self.rqdata.dataCaches[mc], taskfn, noextra=True)
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread
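
Task code can now distinguish the two hashes. A sketch of a check that
might run inside a Python task body, where d is the task datastore:

    taskhash = d.getVar('BB_TASKHASH')
    unihash = d.getVar('BB_UNIHASH')
    if taskhash != unihash:
        bb.note('reusing unihash %s from an equivalent taskhash' % unihash)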


* [OE-core][PATCH v5 5/8] bitbake: runqueue: Pass unique hash to hash validate
  2018-12-19  3:10         ` [PATCH " Joshua Watt
@ 2018-12-19  3:10           ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-19  3:10 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

If the unique hash is being used to track task dependencies, the hash
validation function needs to know about it in order to properly validate
the hash.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/lib/bb/runqueue.py | 41 ++++++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
index de57dcb37b8..f44eff46759 100644
--- a/bitbake/lib/bb/runqueue.py
+++ b/bitbake/lib/bb/runqueue.py
@@ -1378,6 +1378,26 @@ class RunQueue:
             cache[tid] = iscurrent
         return iscurrent
 
+    def validate_hash(self, *, sq_fn, sq_task, sq_hash, sq_hashfn, siginfo, sq_unihash, d):
+        locs = {"sq_fn" : sq_fn, "sq_task" : sq_task, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn,
+                "sq_unihash" : sq_unihash, "siginfo" : siginfo, "d" : d}
+
+        hashvalidate_args = ("(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=siginfo, sq_unihash=sq_unihash)",
+                             "(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=siginfo)",
+                             "(sq_fn, sq_task, sq_hash, sq_hashfn, d)")
+
+        for args in hashvalidate_args[:-1]:
+            try:
+                call = self.hashvalidate + args
+                return bb.utils.better_eval(call, locs)
+            except TypeError:
+                continue
+
+        # Call the last entry without a try...catch to propagate any thrown
+        # TypeError
+        call = self.hashvalidate + hashvalidate_args[-1]
+        return bb.utils.better_eval(call, locs)
+
     def _execute_runqueue(self):
         """
         Run the tasks in a queue prepared by rqdata.prepare()
@@ -1549,6 +1569,7 @@ class RunQueue:
         valid = []
         sq_hash = []
         sq_hashfn = []
+        sq_unihash = []
         sq_fn = []
         sq_taskname = []
         sq_task = []
@@ -1567,16 +1588,13 @@ class RunQueue:
             sq_fn.append(fn)
             sq_hashfn.append(self.rqdata.dataCaches[mc].hashfn[taskfn])
             sq_hash.append(self.rqdata.runtaskentries[tid].hash)
+            sq_unihash.append(self.rqdata.runtaskentries[tid].unihash)
             sq_taskname.append(taskname)
             sq_task.append(tid)
-        locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "d" : self.cooker.data }
-        try:
-            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=True)"
-            valid = bb.utils.better_eval(call, locs)
-        # Handle version with no siginfo parameter
-        except TypeError:
-            call = self.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d)"
-            valid = bb.utils.better_eval(call, locs)
+
+        valid = self.validate_hash(sq_fn=sq_fn, sq_task=sq_taskname, sq_hash=sq_hash, sq_hashfn=sq_hashfn,
+                siginfo=True, sq_unihash=sq_unihash, d=self.cooker.data)
+
         for v in valid:
             valid_new.add(sq_task[v])
 
@@ -2293,6 +2311,7 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
         if self.rq.hashvalidate:
             sq_hash = []
             sq_hashfn = []
+            sq_unihash = []
             sq_fn = []
             sq_taskname = []
             sq_task = []
@@ -2324,14 +2343,14 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
                 sq_fn.append(fn)
                 sq_hashfn.append(self.rqdata.dataCaches[mc].hashfn[taskfn])
                 sq_hash.append(self.rqdata.runtaskentries[tid].hash)
+                sq_unihash.append(self.rqdata.runtaskentries[tid].unihash)
                 sq_taskname.append(taskname)
                 sq_task.append(tid)
 
             self.cooker.data.setVar("BB_SETSCENE_STAMPCURRENT_COUNT", len(stamppresent))
 
-            call = self.rq.hashvalidate + "(sq_fn, sq_task, sq_hash, sq_hashfn, d)"
-            locs = { "sq_fn" : sq_fn, "sq_task" : sq_taskname, "sq_hash" : sq_hash, "sq_hashfn" : sq_hashfn, "d" : self.cooker.data }
-            valid = bb.utils.better_eval(call, locs)
+            valid = self.rq.validate_hash(sq_fn=sq_fn, sq_task=sq_taskname, sq_hash=sq_hash, sq_hashfn=sq_hashfn,
+                    siginfo=False, sq_unihash=sq_unihash, d=self.cooker.data)
 
             self.cooker.data.delVar("BB_SETSCENE_STAMPCURRENT_COUNT")
 
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread
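
A plain-Python analogue of the fallback validate_hash() performs: try
the richest call form first and treat a TypeError as "try an older
signature". The function and parameter names are illustrative:

    def call_hashvalidate(fn, sq_fn, sq_task, sq_hash, sq_hashfn,
                          siginfo, sq_unihash, d):
        attempts = (
            lambda: fn(sq_fn, sq_task, sq_hash, sq_hashfn, d,
                       siginfo=siginfo, sq_unihash=sq_unihash),
            lambda: fn(sq_fn, sq_task, sq_hash, sq_hashfn, d,
                       siginfo=siginfo),
        )
        for attempt in attempts:
            try:
                return attempt()
            except TypeError:
                continue
        # Oldest form last, letting a genuine TypeError propagate
        return fn(sq_fn, sq_task, sq_hash, sq_hashfn, d)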


* [OE-core][PATCH v5 6/8] classes/sstate: Handle unihash in hash check
  2018-12-19  3:10         ` [PATCH " Joshua Watt
@ 2018-12-19  3:10           ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-19  3:10 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Handles the argument that passes the task's unique hash to the hash
check function, as it is now required by bitbake.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 8b48ab465fd..41a2f9b7b77 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -774,7 +774,7 @@ sstate_unpack_package () {
 
 BB_HASHCHECK_FUNCTION = "sstate_checkhashes"
 
-def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False):
+def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *, sq_unihash):
 
     ret = []
     missed = []
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread
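
A sketch of a hash-check function matching the new convention: because
sq_unihash is keyword-only with no default, only bitbake's newest call
form succeeds, and older call forms raise the TypeError that drives the
fallback in validate_hash(). The lookup logic is a placeholder:

    def example_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d,
                            siginfo=False, *, sq_unihash):
        valid = []
        for i, unihash in enumerate(sq_unihash):
            # Probe for an sstate object named by the unihash rather than
            # the raw taskhash (placeholder: accept everything)
            valid.append(i)
        return valid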


* [OE-core][PATCH v5 7/8] bitbake: hashserv: Add hash equivalence reference server
  2018-12-19  3:10         ` [PATCH " Joshua Watt
@ 2018-12-19  3:10           ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-19  3:10 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Provides a reference implementation of the hash equivalence server. The
server has minimal dependencies (none outside the standard Python
library) and implements the minimum required of a conforming hash
equivalence server.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-hashserv     |  67 ++++++++++++++
 bitbake/bin/bitbake-selftest     |   2 +
 bitbake/lib/hashserv/__init__.py | 152 +++++++++++++++++++++++++++++++
 bitbake/lib/hashserv/tests.py    | 141 ++++++++++++++++++++++++++++
 4 files changed, 362 insertions(+)
 create mode 100755 bitbake/bin/bitbake-hashserv
 create mode 100644 bitbake/lib/hashserv/__init__.py
 create mode 100644 bitbake/lib/hashserv/tests.py

diff --git a/bitbake/bin/bitbake-hashserv b/bitbake/bin/bitbake-hashserv
new file mode 100755
index 00000000000..c49397b73a5
--- /dev/null
+++ b/bitbake/bin/bitbake-hashserv
@@ -0,0 +1,67 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+import os
+import sys
+import logging
+import argparse
+import sqlite3
+
+sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)),'lib'))
+
+import hashserv
+
+VERSION = "1.0.0"
+
+DEFAULT_HOST = ''
+DEFAULT_PORT = 8686
+
+def main():
+    parser = argparse.ArgumentParser(description='HTTP Equivalence Reference Server. Version=%s' % VERSION)
+    parser.add_argument('--address', default=DEFAULT_HOST, help='Bind address (default "%(default)s")')
+    parser.add_argument('--port', type=int, default=DEFAULT_PORT, help='Bind port (default %(default)d)')
+    parser.add_argument('--prefix', default='', help='HTTP path prefix (default "%(default)s")')
+    parser.add_argument('--database', default='./hashserv.db', help='Database file (default "%(default)s")')
+    parser.add_argument('--log', default='WARNING', help='Set logging level')
+
+    args = parser.parse_args()
+
+    logger = logging.getLogger('hashserv')
+
+    level = getattr(logging, args.log.upper(), None)
+    if not isinstance(level, int):
+        raise ValueError('Invalid log level: %s' % args.log)
+
+    logger.setLevel(level)
+    console = logging.StreamHandler()
+    console.setLevel(level)
+    logger.addHandler(console)
+
+    db = sqlite3.connect(args.database)
+
+    server = hashserv.create_server((args.address, args.port), db, args.prefix)
+    server.serve_forever()
+    return 0
+
+if __name__ == '__main__':
+    try:
+        ret = main()
+    except Exception:
+        ret = 1
+        import traceback
+        traceback.print_exc()
+    sys.exit(ret)
+
diff --git a/bitbake/bin/bitbake-selftest b/bitbake/bin/bitbake-selftest
index c970dcae90c..99f1af910f4 100755
--- a/bitbake/bin/bitbake-selftest
+++ b/bitbake/bin/bitbake-selftest
@@ -22,6 +22,7 @@ sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)), 'lib
 import unittest
 try:
     import bb
+    import hashserv
     import layerindexlib
 except RuntimeError as exc:
     sys.exit(str(exc))
@@ -35,6 +36,7 @@ tests = ["bb.tests.codeparser",
          "bb.tests.parse",
          "bb.tests.persist_data",
          "bb.tests.utils",
+         "hashserv.tests",
          "layerindexlib.tests.layerindexobj",
          "layerindexlib.tests.restapi",
          "layerindexlib.tests.cooker"]
diff --git a/bitbake/lib/hashserv/__init__.py b/bitbake/lib/hashserv/__init__.py
new file mode 100644
index 00000000000..46bca7cab32
--- /dev/null
+++ b/bitbake/lib/hashserv/__init__.py
@@ -0,0 +1,152 @@
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+from http.server import BaseHTTPRequestHandler, HTTPServer
+import contextlib
+import urllib.parse
+import sqlite3
+import json
+import traceback
+import logging
+from datetime import datetime
+
+logger = logging.getLogger('hashserv')
+
+class HashEquivalenceServer(BaseHTTPRequestHandler):
+    def log_message(self, f, *args):
+        logger.debug(f, *args)
+
+    def do_GET(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            query = urllib.parse.parse_qs(p.query, strict_parsing=True)
+            method = query['method'][0]
+            taskhash = query['taskhash'][0]
+
+            d = None
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('SELECT taskhash, method, unihash FROM tasks_v1 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1',
+                        {'method': method, 'taskhash': taskhash})
+
+                row = cursor.fetchone()
+
+                if row is not None:
+                    logger.debug('Found equivalent task %s', row['taskhash'])
+                    d = {k: row[k] for k in ('taskhash', 'method', 'unihash')}
+
+            self.send_response(200)
+            self.send_header('Content-Type', 'application/json; charset=utf-8')
+            self.end_headers()
+            self.wfile.write(json.dumps(d).encode('utf-8'))
+        except:
+            logger.exception('Error in GET')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+    def do_POST(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            length = int(self.headers['content-length'])
+            data = json.loads(self.rfile.read(length).decode('utf-8'))
+
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('''
+                    SELECT taskhash, method, unihash FROM tasks_v1 WHERE method=:method AND outhash=:outhash
+                    ORDER BY CASE WHEN taskhash=:taskhash THEN 1 ELSE 2 END,
+                        created ASC
+                    LIMIT 1
+                    ''', {k: data[k] for k in ('method', 'outhash', 'taskhash')})
+
+                row = cursor.fetchone()
+
+                if row is None or row['taskhash'] != data['taskhash']:
+                    unihash = data['unihash']
+                    if row is not None:
+                        unihash = row['unihash']
+
+                    insert_data = {
+                            'method': data['method'],
+                            'outhash': data['outhash'],
+                            'taskhash': data['taskhash'],
+                            'unihash': unihash,
+                            'created': datetime.now()
+                            }
+
+                    for k in ('owner', 'PN', 'PV', 'PR', 'task', 'outhash_siginfo'):
+                        if k in data:
+                            insert_data[k] = data[k]
+
+                    cursor.execute('''INSERT INTO tasks_v1 (%s) VALUES (%s)''' % (
+                            ', '.join(sorted(insert_data.keys())),
+                            ', '.join(':' + k for k in sorted(insert_data.keys()))),
+                        insert_data)
+
+                    logger.info('Adding taskhash %s with unihash %s', data['taskhash'], unihash)
+                    cursor.execute('SELECT taskhash, method, unihash FROM tasks_v1 WHERE id=:id', {'id': cursor.lastrowid})
+                    row = cursor.fetchone()
+
+                    self.db.commit()
+
+                d = {k: row[k] for k in ('taskhash', 'method', 'unihash')}
+
+                self.send_response(200)
+                self.send_header('Content-Type', 'application/json; charset=utf-8')
+                self.end_headers()
+                self.wfile.write(json.dumps(d).encode('utf-8'))
+        except:
+            logger.exception('Error in POST')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+def create_server(addr, db, prefix=''):
+    class Handler(HashEquivalenceServer):
+        pass
+
+    Handler.prefix = prefix
+    Handler.db = db
+    db.row_factory = sqlite3.Row
+
+    with contextlib.closing(db.cursor()) as cursor:
+        cursor.execute('''
+            CREATE TABLE IF NOT EXISTS tasks_v1 (
+                id INTEGER PRIMARY KEY AUTOINCREMENT,
+                method TEXT NOT NULL,
+                outhash TEXT NOT NULL,
+                taskhash TEXT NOT NULL,
+                unihash TEXT NOT NULL,
+                created DATETIME,
+
+                -- Optional fields
+                owner TEXT,
+                PN TEXT,
+                PV TEXT,
+                PR TEXT,
+                task TEXT,
+                outhash_siginfo TEXT
+                )
+            ''')
+
+    logger.info('Starting server on %s', addr)
+    return HTTPServer(addr, Handler)
diff --git a/bitbake/lib/hashserv/tests.py b/bitbake/lib/hashserv/tests.py
new file mode 100644
index 00000000000..806b54c5ebd
--- /dev/null
+++ b/bitbake/lib/hashserv/tests.py
@@ -0,0 +1,141 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+import unittest
+import threading
+import sqlite3
+import hashlib
+import urllib.request
+import json
+from . import create_server
+
+class TestHashEquivalenceServer(unittest.TestCase):
+    def setUp(self):
+        # Start an in memory hash equivalence server in the background bound to
+        # an ephemeral port
+        db = sqlite3.connect(':memory:', check_same_thread=False)
+        self.server = create_server(('localhost', 0), db)
+        self.server_addr = 'http://localhost:%d' % self.server.socket.getsockname()[1]
+        self.server_thread = threading.Thread(target=self.server.serve_forever)
+        self.server_thread.start()
+
+    def tearDown(self):
+        # Shutdown server
+        s = getattr(self, 'server', None)
+        if s is not None:
+            self.server.shutdown()
+            self.server_thread.join()
+            self.server.server_close()
+
+    def send_get(self, path):
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def send_post(self, path, data):
+        headers = {'content-type': 'application/json'}
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url, json.dumps(data).encode('utf-8'), headers)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def test_create_hash(self):
+        # Simple test that hashes can be created
+        taskhash = '35788efcb8dfb0a02659d81cf2bfd695fb30faf9'
+        outhash = '2765d4a5884be49b28601445c2760c5f21e7e5c0ee2b7e3fce98fd7e5970796f'
+        unihash = 'f46d3fbb439bd9b921095da657a4de906510d2cd'
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertIsNone(d, msg='Found unexpected task, %r' % d)
+
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+    def test_create_equivalent(self):
+        # Tests that a second reported task with the same outhash will be
+        # assigned the same unihash
+        taskhash = '53b8dce672cb6d0c73170be43f540460bfc347b4'
+        outhash = '5a9cb1649625f0bf41fc7791b635cd9c2d7118c7f021ba87dcd03f72b67ce7a8'
+        unihash = 'f37918cc02eb5a520b1aff86faacbc0a38124646'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+        # Report a different task with the same outhash. The returned unihash
+        # should match the first task
+        taskhash2 = '3bf6f1e89d26205aec90da04854fbdbf73afe6b4'
+        unihash2 = 'af36b199320e611fbb16f1f277d3ee1d619ca58b'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash2,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash2,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+    def test_duplicate_taskhash(self):
+        # Tests that duplicate reports of the same taskhash with different
+        # outhash & unihash always return the unihash from the first reported
+        # taskhash
+        taskhash = '8aa96fcffb5831b3c2c0cb75f0431e3f8b20554a'
+        outhash = 'afe240a439959ce86f5e322f8c208e1fedefea9e813f2140c81af866cc9edf7e'
+        unihash = '218e57509998197d570e2c98512d0105985dffc9'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+        outhash2 = '0904a7fe3dc712d9fd8a74a616ddca2a825a8ee97adf0bd3fc86082c7639914d'
+        unihash2 = 'ae9a7d252735f0dafcdb10e2e02561ca3a47314c'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash2,
+            'unihash': unihash2
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+        outhash3 = '77623a549b5b1a31e3732dfa8fe61d7ce5d44b3370f253c5360e136b852967b4'
+        unihash3 = '9217a7d6398518e5dc002ed58f2cbbbc78696603'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash3,
+            'unihash': unihash3
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread
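
A minimal client sketch against the server's v1 API, mirroring the unit
tests above. The host and port are assumptions (8686 is the
bitbake-hashserv default), and the server would be started with, e.g.,
bitbake-hashserv --database ./hashserv.db:

    import json
    import urllib.request

    BASE = 'http://localhost:8686'

    def lookup(method, taskhash):
        # GET returns the matching record, or JSON null when unknown
        url = '%s/v1/equivalent?method=%s&taskhash=%s' % (BASE, method, taskhash)
        with urllib.request.urlopen(url) as rsp:
            return json.loads(rsp.read().decode('utf-8'))

    def report(method, taskhash, outhash, unihash):
        # POST records the outhash; the server replies with the unihash
        # that all equivalent tasks should share
        body = json.dumps({'method': method, 'taskhash': taskhash,
                           'outhash': outhash, 'unihash': unihash}).encode('utf-8')
        req = urllib.request.Request('%s/v1/equivalent' % BASE, body,
                                     {'content-type': 'application/json'})
        with urllib.request.urlopen(req) as rsp:
            return json.loads(rsp.read().decode('utf-8'))['unihash']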

* [PATCH v5 7/8] bitbake: hashserv: Add hash equivalence reference server
@ 2018-12-19  3:10           ` Joshua Watt
  0 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-19  3:10 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Implements a reference implementation of the hash equivalence server.
This server has minimal dependencies (and no dependencies outside of the
standard Python library), and implements the minimum required to be a
conforming hash equivalence server.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-hashserv     |  67 ++++++++++++++
 bitbake/bin/bitbake-selftest     |   2 +
 bitbake/lib/hashserv/__init__.py | 152 +++++++++++++++++++++++++++++++
 bitbake/lib/hashserv/tests.py    | 141 ++++++++++++++++++++++++++++
 4 files changed, 362 insertions(+)
 create mode 100755 bitbake/bin/bitbake-hashserv
 create mode 100644 bitbake/lib/hashserv/__init__.py
 create mode 100644 bitbake/lib/hashserv/tests.py

diff --git a/bitbake/bin/bitbake-hashserv b/bitbake/bin/bitbake-hashserv
new file mode 100755
index 00000000000..c49397b73a5
--- /dev/null
+++ b/bitbake/bin/bitbake-hashserv
@@ -0,0 +1,67 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+import os
+import sys
+import logging
+import argparse
+import sqlite3
+
+sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)),'lib'))
+
+import hashserv
+
+VERSION = "1.0.0"
+
+DEFAULT_HOST = ''
+DEFAULT_PORT = 8686
+
+def main():
+    parser = argparse.ArgumentParser(description='Hash Equivalence Reference Server. Version=%s' % VERSION)
+    parser.add_argument('--address', default=DEFAULT_HOST, help='Bind address (default "%(default)s")')
+    parser.add_argument('--port', type=int, default=DEFAULT_PORT, help='Bind port (default %(default)d)')
+    parser.add_argument('--prefix', default='', help='HTTP path prefix (default "%(default)s")')
+    parser.add_argument('--database', default='./hashserv.db', help='Database file (default "%(default)s")')
+    parser.add_argument('--log', default='WARNING', help='Set logging level')
+
+    args = parser.parse_args()
+
+    logger = logging.getLogger('hashserv')
+
+    level = getattr(logging, args.log.upper(), None)
+    if not isinstance(level, int):
+        raise ValueError('Invalid log level: %s' % args.log)
+
+    logger.setLevel(level)
+    console = logging.StreamHandler()
+    console.setLevel(level)
+    logger.addHandler(console)
+
+    db = sqlite3.connect(args.database)
+
+    server = hashserv.create_server((args.address, args.port), db, args.prefix)
+    server.serve_forever()
+    return 0
+
+if __name__ == '__main__':
+    try:
+        ret = main()
+    except Exception:
+        ret = 1
+        import traceback
+        traceback.print_exc()
+    sys.exit(ret)
+
diff --git a/bitbake/bin/bitbake-selftest b/bitbake/bin/bitbake-selftest
index c970dcae90c..99f1af910f4 100755
--- a/bitbake/bin/bitbake-selftest
+++ b/bitbake/bin/bitbake-selftest
@@ -22,6 +22,7 @@ sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)), 'lib
 import unittest
 try:
     import bb
+    import hashserv
     import layerindexlib
 except RuntimeError as exc:
     sys.exit(str(exc))
@@ -35,6 +36,7 @@ tests = ["bb.tests.codeparser",
          "bb.tests.parse",
          "bb.tests.persist_data",
          "bb.tests.utils",
+         "hashserv.tests",
          "layerindexlib.tests.layerindexobj",
          "layerindexlib.tests.restapi",
          "layerindexlib.tests.cooker"]
diff --git a/bitbake/lib/hashserv/__init__.py b/bitbake/lib/hashserv/__init__.py
new file mode 100644
index 00000000000..46bca7cab32
--- /dev/null
+++ b/bitbake/lib/hashserv/__init__.py
@@ -0,0 +1,152 @@
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+from http.server import BaseHTTPRequestHandler, HTTPServer
+import contextlib
+import urllib.parse
+import sqlite3
+import json
+import traceback
+import logging
+from datetime import datetime
+
+logger = logging.getLogger('hashserv')
+
+class HashEquivalenceServer(BaseHTTPRequestHandler):
+    def log_message(self, f, *args):
+        logger.debug(f, *args)
+
+    def do_GET(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            query = urllib.parse.parse_qs(p.query, strict_parsing=True)
+            method = query['method'][0]
+            taskhash = query['taskhash'][0]
+
+            d = None
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('SELECT taskhash, method, unihash FROM tasks_v1 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1',
+                        {'method': method, 'taskhash': taskhash})
+
+                row = cursor.fetchone()
+
+                if row is not None:
+                    logger.debug('Found equivalent task %s', row['taskhash'])
+                    d = {k: row[k] for k in ('taskhash', 'method', 'unihash')}
+
+            self.send_response(200)
+            self.send_header('Content-Type', 'application/json; charset=utf-8')
+            self.end_headers()
+            self.wfile.write(json.dumps(d).encode('utf-8'))
+        except Exception:
+            logger.exception('Error in GET')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+    def do_POST(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            length = int(self.headers['content-length'])
+            data = json.loads(self.rfile.read(length).decode('utf-8'))
+
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('''
+                    SELECT taskhash, method, unihash FROM tasks_v1 WHERE method=:method AND outhash=:outhash
+                    ORDER BY CASE WHEN taskhash=:taskhash THEN 1 ELSE 2 END,
+                        created ASC
+                    LIMIT 1
+                    ''', {k: data[k] for k in ('method', 'outhash', 'taskhash')})
+
+                row = cursor.fetchone()
+
+                if row is None or row['taskhash'] != data['taskhash']:
+                    unihash = data['unihash']
+                    if row is not None:
+                        unihash = row['unihash']
+
+                    insert_data = {
+                            'method': data['method'],
+                            'outhash': data['outhash'],
+                            'taskhash': data['taskhash'],
+                            'unihash': unihash,
+                            'created': datetime.now()
+                            }
+
+                    for k in ('owner', 'PN', 'PV', 'PR', 'task', 'outhash_siginfo'):
+                        if k in data:
+                            insert_data[k] = data[k]
+
+                    cursor.execute('''INSERT INTO tasks_v1 (%s) VALUES (%s)''' % (
+                            ', '.join(sorted(insert_data.keys())),
+                            ', '.join(':' + k for k in sorted(insert_data.keys()))),
+                        insert_data)
+
+                    logger.info('Adding taskhash %s with unihash %s', data['taskhash'], unihash)
+                    cursor.execute('SELECT taskhash, method, unihash FROM tasks_v1 WHERE id=:id', {'id': cursor.lastrowid})
+                    row = cursor.fetchone()
+
+                    self.db.commit()
+
+                d = {k: row[k] for k in ('taskhash', 'method', 'unihash')}
+
+                self.send_response(200)
+                self.send_header('Content-Type', 'application/json; charset=utf-8')
+                self.end_headers()
+                self.wfile.write(json.dumps(d).encode('utf-8'))
+        except Exception:
+            logger.exception('Error in POST')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+def create_server(addr, db, prefix=''):
+    class Handler(HashEquivalenceServer):
+        pass
+
+    Handler.prefix = prefix
+    Handler.db = db
+    db.row_factory = sqlite3.Row
+
+    with contextlib.closing(db.cursor()) as cursor:
+        cursor.execute('''
+            CREATE TABLE IF NOT EXISTS tasks_v1 (
+                id INTEGER PRIMARY KEY AUTOINCREMENT,
+                method TEXT NOT NULL,
+                outhash TEXT NOT NULL,
+                taskhash TEXT NOT NULL,
+                unihash TEXT NOT NULL,
+                created DATETIME,
+
+                -- Optional fields
+                owner TEXT,
+                PN TEXT,
+                PV TEXT,
+                PR TEXT,
+                task TEXT,
+                outhash_siginfo TEXT
+                )
+            ''')
+
+    logger.info('Starting server on %s', addr)
+    return HTTPServer(addr, Handler)
diff --git a/bitbake/lib/hashserv/tests.py b/bitbake/lib/hashserv/tests.py
new file mode 100644
index 00000000000..806b54c5ebd
--- /dev/null
+++ b/bitbake/lib/hashserv/tests.py
@@ -0,0 +1,141 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+import unittest
+import threading
+import sqlite3
+import hashlib
+import urllib.request
+import json
+from . import create_server
+
+class TestHashEquivalenceServer(unittest.TestCase):
+    def setUp(self):
+        # Start an in memory hash equivalence server in the background bound to
+        # an ephemeral port
+        db = sqlite3.connect(':memory:', check_same_thread=False)
+        self.server = create_server(('localhost', 0), db)
+        self.server_addr = 'http://localhost:%d' % self.server.socket.getsockname()[1]
+        self.server_thread = threading.Thread(target=self.server.serve_forever)
+        self.server_thread.start()
+
+    def tearDown(self):
+        # Shutdown server
+        s = getattr(self, 'server', None)
+        if s is not None:
+            self.server.shutdown()
+            self.server_thread.join()
+            self.server.server_close()
+
+    def send_get(self, path):
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def send_post(self, path, data):
+        headers = {'content-type': 'application/json'}
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url, json.dumps(data).encode('utf-8'), headers)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def test_create_hash(self):
+        # Simple test that hashes can be created
+        taskhash = '35788efcb8dfb0a02659d81cf2bfd695fb30faf9'
+        outhash = '2765d4a5884be49b28601445c2760c5f21e7e5c0ee2b7e3fce98fd7e5970796f'
+        unihash = 'f46d3fbb439bd9b921095da657a4de906510d2cd'
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertIsNone(d, msg='Found unexpected task, %r' % d)
+
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+    def test_create_equivalent(self):
+        # Tests that a second reported task with the same outhash will be
+        # assigned the same unihash
+        taskhash = '53b8dce672cb6d0c73170be43f540460bfc347b4'
+        outhash = '5a9cb1649625f0bf41fc7791b635cd9c2d7118c7f021ba87dcd03f72b67ce7a8'
+        unihash = 'f37918cc02eb5a520b1aff86faacbc0a38124646'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+        # Report a different task with the same outhash. The returned unihash
+        # should match the first task
+        taskhash2 = '3bf6f1e89d26205aec90da04854fbdbf73afe6b4'
+        unihash2 = 'af36b199320e611fbb16f1f277d3ee1d619ca58b'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash2,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash2,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+    def test_duplicate_taskhash(self):
+        # Tests that duplicate reports of the same taskhash with different
+        # outhash & unihash always return the unihash from the first reported
+        # taskhash
+        taskhash = '8aa96fcffb5831b3c2c0cb75f0431e3f8b20554a'
+        outhash = 'afe240a439959ce86f5e322f8c208e1fedefea9e813f2140c81af866cc9edf7e'
+        unihash = '218e57509998197d570e2c98512d0105985dffc9'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+        outhash2 = '0904a7fe3dc712d9fd8a74a616ddca2a825a8ee97adf0bd3fc86082c7639914d'
+        unihash2 = 'ae9a7d252735f0dafcdb10e2e02561ca3a47314c'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash2,
+            'unihash': unihash2
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+        outhash3 = '77623a549b5b1a31e3732dfa8fe61d7ce5d44b3370f253c5360e136b852967b4'
+        unihash3 = '9217a7d6398518e5dc002ed58f2cbbbc78696603'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash3,
+            'unihash': unihash3
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v5 8/8] sstate: Implement hash equivalence sstate
  2018-12-19  3:10         ` [PATCH " Joshua Watt
@ 2018-12-19  3:10           ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-19  3:10 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Converts sstate so that it can use a hash equivalence server to
determine if a task really needs to be rebuilt, or if it can be restored
from a different (equivalent) sstate object.

The unique hashes are cached persistently using persist_data. This has
a number of advantages:
 1) Unique hashes can be cached between invocations of bitbake to
    prevent needing to contact the server every time (which is slow)
 2) The value of each task's unique hash can easily be synchronized
    between different threads, which will be useful if bitbake is
    updated to do on-the-fly task re-hashing.
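
For context, persist_data provides a dict-like view onto an SQLite table.
A minimal sketch of how the unihash cache behaves (the cache name matches
the one constructed in sstatesig.py, 'd' is the usual datastore, and the
key/value strings are shortened for illustration):

    import bb.persist_data

    cache = bb.persist_data.persist('SSTATESIG_UNIHASH_CACHE_v1_OEOuthashBasic', d)
    cache['fn.do_install:35788efc'] = 'f46d3fbb'   # survives between bitbake runs
    assert cache.get('fn.do_install:35788efc') == 'f46d3fbb'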

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 100 +++++++++++++++++++--
 meta/conf/bitbake.conf      |   4 +-
 meta/lib/oe/sstatesig.py    | 167 ++++++++++++++++++++++++++++++++++++
 3 files changed, 262 insertions(+), 9 deletions(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 41a2f9b7b77..8ffefd68344 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -11,7 +11,7 @@ def generate_sstatefn(spec, hash, d):
 SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
 SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
 SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
-SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_TASKHASH'), d)}"
+SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d)}"
 SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
 SSTATE_EXTRAPATH   = ""
 SSTATE_EXTRAPATHWILDCARD = ""
@@ -82,6 +82,23 @@ SSTATE_SIG_PASSPHRASE ?= ""
 # Whether to verify the GnUPG signatures when extracting sstate archives
 SSTATE_VERIFY_SIG ?= "0"
 
+SSTATE_HASHEQUIV_METHOD ?= "OEOuthashBasic"
+SSTATE_HASHEQUIV_METHOD[doc] = "The function used to calculate the output hash \
+    for a task, which in turn is used to determine equivalency. \
+    "
+
+SSTATE_HASHEQUIV_SERVER ?= ""
+SSTATE_HASHEQUIV_SERVER[doc] = "The hash equivalence server. For example, \
+    'http://192.168.0.1:5000'. Do not include a trailing slash \
+    "
+
+SSTATE_HASHEQUIV_REPORT_TASKDATA ?= "0"
+SSTATE_HASHEQUIV_REPORT_TASKDATA[doc] = "Report additional useful data to the \
+    hash equivalency server, such as PN, PV, taskname, etc. This information \
+    is very useful for developers looking at task data, but may leak sensitive \
+    data if the equivalence server is public. \
+    "
+
 python () {
     if bb.data.inherits_class('native', d):
         d.setVar('SSTATE_PKGARCH', d.getVar('BUILD_ARCH', False))
@@ -634,7 +651,7 @@ def sstate_package(ss, d):
         return
 
     for f in (d.getVar('SSTATECREATEFUNCS') or '').split() + \
-             ['sstate_create_package', 'sstate_sign_package'] + \
+             ['sstate_report_unihash', 'sstate_create_package', 'sstate_sign_package'] + \
              (d.getVar('SSTATEPOSTCREATEFUNCS') or '').split():
         # All hooks should run in SSTATE_BUILDDIR.
         bb.build.exec_func(f, d, (sstatebuild,))
@@ -758,6 +775,73 @@ python sstate_sign_package () {
                            d.getVar('SSTATE_SIG_PASSPHRASE'), armor=False)
 }
 
+def OEOuthashBasic(path, sigfile, task, d):
+    import hashlib
+    import stat
+
+    def update_hash(s):
+        s = s.encode('utf-8')
+        h.update(s)
+        if sigfile:
+            sigfile.write(s)
+
+    h = hashlib.sha256()
+    prev_dir = os.getcwd()
+
+    try:
+        os.chdir(path)
+
+        update_hash("OEOuthashBasic\n")
+
+        # It is only currently useful to get equivalent hashes for things that
+        # can be restored from sstate. Since the sstate object is named using
+        # SSTATE_PKGSPEC and the task name, those should be included in the
+        # output hash calculation.
+        update_hash("SSTATE_PKGSPEC=%s\n" % d.getVar('SSTATE_PKGSPEC'))
+        update_hash("task=%s\n" % task)
+
+        for root, dirs, files in os.walk('.', topdown=True):
+            # Sort directories and files to ensure consistent ordering
+            dirs.sort()
+            files.sort()
+
+            for f in files:
+                path = os.path.join(root, f)
+                s = os.lstat(path)
+
+                # Hash file path
+                update_hash(path + '\n')
+
+                # Hash file mode
+                update_hash("\tmode=0x%x\n" % stat.S_IMODE(s.st_mode))
+                update_hash("\ttype=0x%x\n" % stat.S_IFMT(s.st_mode))
+
+                if stat.S_ISBLK(s.st_mode) or stat.S_ISCHR(s.st_mode):
+                    # Hash device major and minor
+                    update_hash("\tdev=%d,%d\n" % (os.major(s.st_rdev), os.minor(s.st_rdev)))
+                elif stat.S_ISLNK(s.st_mode):
+                    # Hash symbolic link
+                    update_hash("\tsymlink=%s\n" % os.readlink(path))
+                else:
+                    fh = hashlib.sha256()
+                    # Hash file contents
+                    with open(path, 'rb') as d:
+                        for chunk in iter(lambda: d.read(4096), b""):
+                            fh.update(chunk)
+                    update_hash("\tdigest=%s\n" % fh.hexdigest())
+    finally:
+        os.chdir(prev_dir)
+
+    return h.hexdigest()
+
+python sstate_report_unihash() {
+    report_unihash = getattr(bb.parse.siggen, 'report_unihash', None)
+
+    if report_unihash:
+        ss = sstate_state_fromvars(d)
+        report_unihash(os.getcwd(), ss['task'], d)
+}
+
 #
 # Shell function to decompress and prepare a package for installation
 # Will be run from within SSTATE_INSTDIR.
@@ -804,7 +888,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
 
         spec, extrapath, tname = getpathcomponents(task, d)
 
-        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + extension)
 
         if os.path.exists(sstatefile):
             bb.debug(2, "SState: Found valid sstate file %s" % sstatefile)
@@ -866,7 +950,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
             if task in ret:
                 continue
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + extension)
             tasklist.append((task, sstatefile))
 
         if tasklist:
@@ -892,12 +976,12 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
         evdata = {'missed': [], 'found': []};
         for task in missed:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + ".tgz")
+            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_unihash[task], sstatefile ) )
         for task in ret:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['found'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + ".tgz")
+            evdata['found'].append( (sq_fn[task], sq_task[task], sq_unihash[task], sstatefile ) )
         bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)
 
     # Print some summary statistics about the current task completion and how much sstate
diff --git a/meta/conf/bitbake.conf b/meta/conf/bitbake.conf
index 64800623545..e64ce6a6dab 100644
--- a/meta/conf/bitbake.conf
+++ b/meta/conf/bitbake.conf
@@ -867,7 +867,9 @@ BB_HASHBASE_WHITELIST ?= "TMPDIR FILE PATH PWD BB_TASKHASH BBPATH BBSERVER DL_DI
     STAMPS_DIR PRSERV_DUMPDIR PRSERV_DUMPFILE PRSERV_LOCKDOWN PARALLEL_MAKE \
     CCACHE_DIR EXTERNAL_TOOLCHAIN CCACHE CCACHE_NOHASHDIR LICENSE_PATH SDKPKGSUFFIX \
     WARN_QA ERROR_QA WORKDIR STAMPCLEAN PKGDATA_DIR BUILD_ARCH SSTATE_PKGARCH \
-    BB_WORKERCONTEXT BB_LIMITEDDEPS extend_recipe_sysroot DEPLOY_DIR"
+    BB_WORKERCONTEXT BB_LIMITEDDEPS BB_UNIHASH extend_recipe_sysroot DEPLOY_DIR \
+    SSTATE_HASHEQUIV_METHOD SSTATE_HASHEQUIV_SERVER SSTATE_HASHEQUIV_REPORT_TASKDATA \
+    SSTATE_HASHEQUIV_OWNER"
 BB_HASHCONFIG_WHITELIST ?= "${BB_HASHBASE_WHITELIST} DATE TIME SSH_AGENT_PID \
     SSH_AUTH_SOCK PSEUDO_BUILD BB_ENV_EXTRAWHITE DISABLE_SANITY_CHECKS \
     PARALLEL_MAKE BB_NUMBER_THREADS BB_ORIGENV BB_INVALIDCONF BBINCLUDED \
diff --git a/meta/lib/oe/sstatesig.py b/meta/lib/oe/sstatesig.py
index 18c5a353a2a..503f2452807 100644
--- a/meta/lib/oe/sstatesig.py
+++ b/meta/lib/oe/sstatesig.py
@@ -263,10 +263,177 @@ class SignatureGeneratorOEBasicHash(bb.siggen.SignatureGeneratorBasicHash):
         if error_msgs:
             bb.fatal("\n".join(error_msgs))
 
+class SignatureGeneratorOEEquivHash(SignatureGeneratorOEBasicHash):
+    name = "OEEquivHash"
+
+    def init_rundepcheck(self, data):
+        super().init_rundepcheck(data)
+        self.server = data.getVar('SSTATE_HASHEQUIV_SERVER')
+        self.method = data.getVar('SSTATE_HASHEQUIV_METHOD')
+        self.unihashes = bb.persist_data.persist('SSTATESIG_UNIHASH_CACHE_v1_' + self.method, data)
+
+    def get_taskdata(self):
+        return (self.server, self.method) + super().get_taskdata()
+
+    def set_taskdata(self, data):
+        self.server, self.method = data[:2]
+        super().set_taskdata(data[2:])
+
+    def __get_task_unihash_key(self, task):
+        # TODO: The key only *needs* to be the taskhash, the task is just
+        # convenient
+        return '%s:%s' % (task, self.taskhash[task])
+
+    def get_stampfile_hash(self, task):
+        if task in self.taskhash:
+            # If a unique hash is reported, use it as the stampfile hash. This
+            # ensures that a task won't be re-run if the taskhash changes but
+            # it would still result in the same output hash
+            unihash = self.unihashes.get(self.__get_task_unihash_key(task))
+            if unihash is not None:
+                return unihash
+
+        return super().get_stampfile_hash(task)
+
+    def get_unihash(self, task):
+        import urllib
+        import json
+
+        taskhash = self.taskhash[task]
+
+        key = self.__get_task_unihash_key(task)
+
+        # TODO: This cache can grow unbounded. It probably only needs to keep
+        # the most recent unihash for each task
+        unihash = self.unihashes.get(key)
+        if unihash is not None:
+            return unihash
+
+        # In the absence of being able to discover a unique hash from the
+        # server, make it be equivalent to the taskhash. The unique "hash" only
+        # really needs to be a unique string (not even necessarily a hash), but
+        # making it match the taskhash has a few advantages:
+        #
+        # 1) All of the sstate code that assumes hashes can be the same
+        # 2) It provides maximal compatibility with builders that don't use
+        #    an equivalency server
+        # 3) The value is easy for multiple independent builders to derive the
+        #    same unique hash from the same input. This means that if the
+        #    independent builders find the same taskhash, but it isn't reported
+        #    to the server, there is a better chance that they will agree on
+        #    the unique hash.
+        unihash = taskhash
+
+        try:
+            url = '%s/v1/equivalent?%s' % (self.server,
+                    urllib.parse.urlencode({'method': self.method, 'taskhash': self.taskhash[task]}))
+
+            request = urllib.request.Request(url)
+            response = urllib.request.urlopen(request)
+            data = response.read().decode('utf-8')
+
+            json_data = json.loads(data)
+
+            if json_data:
+                unihash = json_data['unihash']
+                # A unique hash equal to the taskhash is not very interesting,
+                # so it is reported at debug level 2. If they differ, that
+                # is much more interesting, so it is reported at debug level 1
+                bb.debug((1, 2)[unihash == taskhash], 'Found unihash %s in place of %s for %s from %s' % (unihash, taskhash, task, self.server))
+            else:
+                bb.debug(2, 'No reported unihash for %s:%s from %s' % (task, taskhash, self.server))
+        except urllib.error.URLError as e:
+            bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+        except (KeyError, json.JSONDecodeError) as e:
+            bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+
+        self.unihashes[key] = unihash
+        return unihash
+
+    def report_unihash(self, path, task, d):
+        import urllib
+        import json
+        import tempfile
+        import base64
+
+        taskhash = d.getVar('BB_TASKHASH')
+        unihash = d.getVar('BB_UNIHASH')
+        report_taskdata = d.getVar('SSTATE_HASHEQUIV_REPORT_TASKDATA') == '1'
+        tempdir = d.getVar('T')
+        fn = d.getVar('BB_FILENAME')
+        key = fn + '.do_' + task + ':' + taskhash
+
+        # Sanity checks
+        cache_unihash = self.unihashes.get(key)
+        if cache_unihash is None:
+            bb.fatal('%s not in unihash cache. Please report this error' % key)
+
+        if cache_unihash != unihash:
+            bb.fatal("Cache unihash %s doesn't match BB_UNIHASH %s" % (cache_unihash, unihash))
+
+        sigfile = None
+        sigfile_name = "depsig.do_%s.%d" % (task, os.getpid())
+        sigfile_link = "depsig.do_%s" % task
+
+        try:
+            call = self.method + '(path, sigfile, task, d)'
+            sigfile = open(os.path.join(tempdir, sigfile_name), 'w+b')
+            locs = {'path': path, 'sigfile': sigfile, 'task': task, 'd': d}
+
+            outhash = bb.utils.better_eval(call, locs)
+
+            try:
+                url = '%s/v1/equivalent' % self.server
+                task_data = {
+                    'taskhash': taskhash,
+                    'method': self.method,
+                    'outhash': outhash,
+                    'unihash': unihash,
+                    'owner': d.getVar('SSTATE_HASHEQUIV_OWNER')
+                    }
+
+                if report_taskdata:
+                    sigfile.seek(0)
+
+                    task_data['PN'] = d.getVar('PN')
+                    task_data['PV'] = d.getVar('PV')
+                    task_data['PR'] = d.getVar('PR')
+                    task_data['task'] = task
+                    task_data['outhash_siginfo'] = sigfile.read().decode('utf-8')
+
+                headers = {'content-type': 'application/json'}
+
+                request = urllib.request.Request(url, json.dumps(task_data).encode('utf-8'), headers)
+                response = urllib.request.urlopen(request)
+                data = response.read().decode('utf-8')
+
+                json_data = json.loads(data)
+                new_unihash = json_data['unihash']
+
+                if new_unihash != unihash:
+                    bb.debug(1, 'Task %s unihash changed %s -> %s by server %s' % (taskhash, unihash, new_unihash, self.server))
+                else:
+                    bb.debug(1, 'Reported task %s as unihash %s to %s' % (taskhash, unihash, self.server))
+            except urllib.error.URLError as e:
+                bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+            except (KeyError, json.JSONDecodeError) as e:
+                bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+        finally:
+            if sigfile:
+                sigfile.close()
+
+                sigfile_link_path = os.path.join(tempdir, sigfile_link)
+                bb.utils.remove(sigfile_link_path)
+
+                try:
+                    os.symlink(sigfile_name, sigfile_link_path)
+                except OSError:
+                    pass
 
 # Insert these classes into siggen's namespace so it can see and select them
 bb.siggen.SignatureGeneratorOEBasic = SignatureGeneratorOEBasic
 bb.siggen.SignatureGeneratorOEBasicHash = SignatureGeneratorOEBasicHash
+bb.siggen.SignatureGeneratorOEEquivHash = SignatureGeneratorOEEquivHash
 
 
 def find_siginfo(pn, taskname, taskhashlist, d):
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [PATCH v5 8/8] sstate: Implement hash equivalence sstate
@ 2018-12-19  3:10           ` Joshua Watt
  0 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2018-12-19  3:10 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Converts sstate so that it can use a hash equivalence server to
determine if a task really needs to be rebuilt, or if it can be restored
from a different (equivalent) sstate object.

The unique hashes are cached persistently using persist_data. This has
a number of advantages:
 1) Unique hashes can be cached between invocations of bitbake to
    prevent needing to contact the server every time (which is slow)
 2) The value of each task's unique hash can easily be synchronized
    between different threads, which will be useful if bitbake is
    updated to do on-the-fly task re-hashing.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 100 +++++++++++++++++++--
 meta/conf/bitbake.conf      |   4 +-
 meta/lib/oe/sstatesig.py    | 167 ++++++++++++++++++++++++++++++++++++
 3 files changed, 262 insertions(+), 9 deletions(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 41a2f9b7b77..8ffefd68344 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -11,7 +11,7 @@ def generate_sstatefn(spec, hash, d):
 SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
 SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
 SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
-SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_TASKHASH'), d)}"
+SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d)}"
 SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
 SSTATE_EXTRAPATH   = ""
 SSTATE_EXTRAPATHWILDCARD = ""
@@ -82,6 +82,23 @@ SSTATE_SIG_PASSPHRASE ?= ""
 # Whether to verify the GnUPG signatures when extracting sstate archives
 SSTATE_VERIFY_SIG ?= "0"
 
+SSTATE_HASHEQUIV_METHOD ?= "OEOuthashBasic"
+SSTATE_HASHEQUIV_METHOD[doc] = "The function used to calculate the output hash \
+    for a task, which in turn is used to determine equivalency. \
+    "
+
+SSTATE_HASHEQUIV_SERVER ?= ""
+SSTATE_HASHEQUIV_SERVER[doc] = "The hash equivalence server. For example, \
+    'http://192.168.0.1:5000'. Do not include a trailing slash \
+    "
+
+SSTATE_HASHEQUIV_REPORT_TASKDATA ?= "0"
+SSTATE_HASHEQUIV_REPORT_TASKDATA[doc] = "Report additional useful data to the \
+    hash equivalency server, such as PN, PV, taskname, etc. This information \
+    is very useful for developers looking at task data, but may leak sensitive \
+    data if the equivalence server is public. \
+    "
+
 python () {
     if bb.data.inherits_class('native', d):
         d.setVar('SSTATE_PKGARCH', d.getVar('BUILD_ARCH', False))
@@ -634,7 +651,7 @@ def sstate_package(ss, d):
         return
 
     for f in (d.getVar('SSTATECREATEFUNCS') or '').split() + \
-             ['sstate_create_package', 'sstate_sign_package'] + \
+             ['sstate_report_unihash', 'sstate_create_package', 'sstate_sign_package'] + \
              (d.getVar('SSTATEPOSTCREATEFUNCS') or '').split():
         # All hooks should run in SSTATE_BUILDDIR.
         bb.build.exec_func(f, d, (sstatebuild,))
@@ -758,6 +775,73 @@ python sstate_sign_package () {
                            d.getVar('SSTATE_SIG_PASSPHRASE'), armor=False)
 }
 
+def OEOuthashBasic(path, sigfile, task, d):
+    import hashlib
+    import stat
+
+    def update_hash(s):
+        s = s.encode('utf-8')
+        h.update(s)
+        if sigfile:
+            sigfile.write(s)
+
+    h = hashlib.sha256()
+    prev_dir = os.getcwd()
+
+    try:
+        os.chdir(path)
+
+        update_hash("OEOuthashBasic\n")
+
+        # It is only currently useful to get equivalent hashes for things that
+        # can be restored from sstate. Since the sstate object is named using
+        # SSTATE_PKGSPEC and the task name, those should be included in the
+        # output hash calculation.
+        update_hash("SSTATE_PKGSPEC=%s\n" % d.getVar('SSTATE_PKGSPEC'))
+        update_hash("task=%s\n" % task)
+
+        for root, dirs, files in os.walk('.', topdown=True):
+            # Sort directories and files to ensure consistent ordering
+            dirs.sort()
+            files.sort()
+
+            for f in files:
+                path = os.path.join(root, f)
+                s = os.lstat(path)
+
+                # Hash file path
+                update_hash(path + '\n')
+
+                # Hash file mode
+                update_hash("\tmode=0x%x\n" % stat.S_IMODE(s.st_mode))
+                update_hash("\ttype=0x%x\n" % stat.S_IFMT(s.st_mode))
+
+                if stat.S_ISBLK(s.st_mode) or stat.S_ISCHR(s.st_mode):
+                    # Hash device major and minor
+                    update_hash("\tdev=%d,%d\n" % (os.major(s.st_rdev), os.minor(s.st_rdev)))
+                elif stat.S_ISLNK(s.st_mode):
+                    # Hash symbolic link
+                    update_hash("\tsymlink=%s\n" % os.readlink(path))
+                else:
+                    fh = hashlib.sha256()
+                    # Hash file contents
+                    with open(path, 'rb') as d:
+                        for chunk in iter(lambda: d.read(4096), b""):
+                            fh.update(chunk)
+                    update_hash("\tdigest=%s\n" % fh.hexdigest())
+    finally:
+        os.chdir(prev_dir)
+
+    return h.hexdigest()
+
+python sstate_report_unihash() {
+    report_unihash = getattr(bb.parse.siggen, 'report_unihash', None)
+
+    if report_unihash:
+        ss = sstate_state_fromvars(d)
+        report_unihash(os.getcwd(), ss['task'], d)
+}
+
 #
 # Shell function to decompress and prepare a package for installation
 # Will be run from within SSTATE_INSTDIR.
@@ -804,7 +888,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
 
         spec, extrapath, tname = getpathcomponents(task, d)
 
-        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + extension)
 
         if os.path.exists(sstatefile):
             bb.debug(2, "SState: Found valid sstate file %s" % sstatefile)
@@ -866,7 +950,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
             if task in ret:
                 continue
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + extension)
             tasklist.append((task, sstatefile))
 
         if tasklist:
@@ -892,12 +976,12 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
         evdata = {'missed': [], 'found': []};
         for task in missed:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + ".tgz")
+            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_unihash[task], sstatefile ) )
         for task in ret:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['found'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + ".tgz")
+            evdata['found'].append( (sq_fn[task], sq_task[task], sq_unihash[task], sstatefile ) )
         bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)
 
     # Print some summary statistics about the current task completion and how much sstate
diff --git a/meta/conf/bitbake.conf b/meta/conf/bitbake.conf
index 64800623545..e64ce6a6dab 100644
--- a/meta/conf/bitbake.conf
+++ b/meta/conf/bitbake.conf
@@ -867,7 +867,9 @@ BB_HASHBASE_WHITELIST ?= "TMPDIR FILE PATH PWD BB_TASKHASH BBPATH BBSERVER DL_DI
     STAMPS_DIR PRSERV_DUMPDIR PRSERV_DUMPFILE PRSERV_LOCKDOWN PARALLEL_MAKE \
     CCACHE_DIR EXTERNAL_TOOLCHAIN CCACHE CCACHE_NOHASHDIR LICENSE_PATH SDKPKGSUFFIX \
     WARN_QA ERROR_QA WORKDIR STAMPCLEAN PKGDATA_DIR BUILD_ARCH SSTATE_PKGARCH \
-    BB_WORKERCONTEXT BB_LIMITEDDEPS extend_recipe_sysroot DEPLOY_DIR"
+    BB_WORKERCONTEXT BB_LIMITEDDEPS BB_UNIHASH extend_recipe_sysroot DEPLOY_DIR \
+    SSTATE_HASHEQUIV_METHOD SSTATE_HASHEQUIV_SERVER SSTATE_HASHEQUIV_REPORT_TASKDATA \
+    SSTATE_HASHEQUIV_OWNER"
 BB_HASHCONFIG_WHITELIST ?= "${BB_HASHBASE_WHITELIST} DATE TIME SSH_AGENT_PID \
     SSH_AUTH_SOCK PSEUDO_BUILD BB_ENV_EXTRAWHITE DISABLE_SANITY_CHECKS \
     PARALLEL_MAKE BB_NUMBER_THREADS BB_ORIGENV BB_INVALIDCONF BBINCLUDED \
diff --git a/meta/lib/oe/sstatesig.py b/meta/lib/oe/sstatesig.py
index 18c5a353a2a..503f2452807 100644
--- a/meta/lib/oe/sstatesig.py
+++ b/meta/lib/oe/sstatesig.py
@@ -263,10 +263,177 @@ class SignatureGeneratorOEBasicHash(bb.siggen.SignatureGeneratorBasicHash):
         if error_msgs:
             bb.fatal("\n".join(error_msgs))
 
+class SignatureGeneratorOEEquivHash(SignatureGeneratorOEBasicHash):
+    name = "OEEquivHash"
+
+    def init_rundepcheck(self, data):
+        super().init_rundepcheck(data)
+        self.server = data.getVar('SSTATE_HASHEQUIV_SERVER')
+        self.method = data.getVar('SSTATE_HASHEQUIV_METHOD')
+        self.unihashes = bb.persist_data.persist('SSTATESIG_UNIHASH_CACHE_v1_' + self.method, data)
+
+    def get_taskdata(self):
+        return (self.server, self.method) + super().get_taskdata()
+
+    def set_taskdata(self, data):
+        self.server, self.method = data[:2]
+        super().set_taskdata(data[2:])
+
+    def __get_task_unihash_key(self, task):
+        # TODO: The key only *needs* to be the taskhash, the task is just
+        # convenient
+        return '%s:%s' % (task, self.taskhash[task])
+
+    def get_stampfile_hash(self, task):
+        if task in self.taskhash:
+            # If a unique hash is reported, use it as the stampfile hash. This
+            # ensures that a task won't be re-run if the taskhash changes but
+            # it would still result in the same output hash
+            unihash = self.unihashes.get(self.__get_task_unihash_key(task))
+            if unihash is not None:
+                return unihash
+
+        return super().get_stampfile_hash(task)
+
+    def get_unihash(self, task):
+        import urllib
+        import json
+
+        taskhash = self.taskhash[task]
+
+        key = self.__get_task_unihash_key(task)
+
+        # TODO: This cache can grow unbounded. It probably only needs to keep
+        # the most recent unihash for each task
+        unihash = self.unihashes.get(key)
+        if unihash is not None:
+            return unihash
+
+        # In the absence of being able to discover a unique hash from the
+        # server, make it be equivalent to the taskhash. The unique "hash" only
+        # really needs to be a unique string (not even necessarily a hash), but
+        # making it match the taskhash has a few advantages:
+        #
+        # 1) All of the sstate code that assumes hashes can be the same
+        # 2) It provides maximal compatibility with builders that don't use
+        #    an equivalency server
+        # 3) The value is easy for multiple independent builders to derive the
+        #    same unique hash from the same input. This means that if the
+        #    independent builders find the same taskhash, but it isn't reported
+        #    to the server, there is a better chance that they will agree on
+        #    the unique hash.
+        unihash = taskhash
+
+        try:
+            url = '%s/v1/equivalent?%s' % (self.server,
+                    urllib.parse.urlencode({'method': self.method, 'taskhash': self.taskhash[task]}))
+
+            request = urllib.request.Request(url)
+            response = urllib.request.urlopen(request)
+            data = response.read().decode('utf-8')
+
+            json_data = json.loads(data)
+
+            if json_data:
+                unihash = json_data['unihash']
+                # A unique hash equal to the taskhash is not very interesting,
+                # so it is reported at debug level 2. If they differ, that
+                # is much more interesting, so it is reported at debug level 1
+                bb.debug((1, 2)[unihash == taskhash], 'Found unihash %s in place of %s for %s from %s' % (unihash, taskhash, task, self.server))
+            else:
+                bb.debug(2, 'No reported unihash for %s:%s from %s' % (task, taskhash, self.server))
+        except urllib.error.URLError as e:
+            bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+        except (KeyError, json.JSONDecodeError) as e:
+            bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+
+        self.unihashes[key] = unihash
+        return unihash
+
+    def report_unihash(self, path, task, d):
+        import urllib
+        import json
+        import tempfile
+        import base64
+
+        taskhash = d.getVar('BB_TASKHASH')
+        unihash = d.getVar('BB_UNIHASH')
+        report_taskdata = d.getVar('SSTATE_HASHEQUIV_REPORT_TASKDATA') == '1'
+        tempdir = d.getVar('T')
+        fn = d.getVar('BB_FILENAME')
+        key = fn + '.do_' + task + ':' + taskhash
+
+        # Sanity checks
+        cache_unihash = self.unihashes.get(key)
+        if cache_unihash is None:
+            bb.fatal('%s not in unihash cache. Please report this error' % key)
+
+        if cache_unihash != unihash:
+            bb.fatal("Cache unihash %s doesn't match BB_UNIHASH %s" % (cache_unihash, unihash))
+
+        sigfile = None
+        sigfile_name = "depsig.do_%s.%d" % (task, os.getpid())
+        sigfile_link = "depsig.do_%s" % task
+
+        try:
+            call = self.method + '(path, sigfile, task, d)'
+            sigfile = open(os.path.join(tempdir, sigfile_name), 'w+b')
+            locs = {'path': path, 'sigfile': sigfile, 'task': task, 'd': d}
+
+            outhash = bb.utils.better_eval(call, locs)
+
+            try:
+                url = '%s/v1/equivalent' % self.server
+                task_data = {
+                    'taskhash': taskhash,
+                    'method': self.method,
+                    'outhash': outhash,
+                    'unihash': unihash,
+                    'owner': d.getVar('SSTATE_HASHEQUIV_OWNER')
+                    }
+
+                if report_taskdata:
+                    sigfile.seek(0)
+
+                    task_data['PN'] = d.getVar('PN')
+                    task_data['PV'] = d.getVar('PV')
+                    task_data['PR'] = d.getVar('PR')
+                    task_data['task'] = task
+                    task_data['outhash_siginfo'] = sigfile.read().decode('utf-8')
+
+                headers = {'content-type': 'application/json'}
+
+                request = urllib.request.Request(url, json.dumps(task_data).encode('utf-8'), headers)
+                response = urllib.request.urlopen(request)
+                data = response.read().decode('utf-8')
+
+                json_data = json.loads(data)
+                new_unihash = json_data['unihash']
+
+                if new_unihash != unihash:
+                    bb.debug(1, 'Task %s unihash changed %s -> %s by server %s' % (taskhash, unihash, new_unihash, self.server))
+                else:
+                    bb.debug(1, 'Reported task %s as unihash %s to %s' % (taskhash, unihash, self.server))
+            except urllib.error.URLError as e:
+                bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+            except (KeyError, json.JSONDecodeError) as e:
+                bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+        finally:
+            if sigfile:
+                sigfile.close()
+
+                sigfile_link_path = os.path.join(tempdir, sigfile_link)
+                bb.utils.remove(sigfile_link_path)
+
+                try:
+                    os.symlink(sigfile_name, sigfile_link_path)
+                except OSError:
+                    pass
 
 # Insert these classes into siggen's namespace so it can see and select them
 bb.siggen.SignatureGeneratorOEBasic = SignatureGeneratorOEBasic
 bb.siggen.SignatureGeneratorOEBasicHash = SignatureGeneratorOEBasicHash
+bb.siggen.SignatureGeneratorOEEquivHash = SignatureGeneratorOEEquivHash
 
 
 def find_siginfo(pn, taskname, taskhashlist, d):
-- 
2.19.2



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* ✗ patchtest: failure for Hash Equivalency Server (rev3)
  2018-12-18 15:30       ` [PATCH " Joshua Watt
                         ` (11 preceding siblings ...)
  (?)
@ 2018-12-19  3:33       ` Patchwork
  -1 siblings, 0 replies; 158+ messages in thread
From: Patchwork @ 2018-12-19  3:33 UTC (permalink / raw)
  To: Joshua Watt; +Cc: openembedded-core

== Series Details ==

Series: Hash Equivalency Server (rev3)
Revision: 3
URL   : https://patchwork.openembedded.org/series/15190/
State : failure

== Summary ==


Thank you for submitting this patch series to OpenEmbedded Core. This is
an automated response. Several tests have been executed on the proposed
series by patchtest resulting in the following failures:



* Issue             Series sent to the wrong mailing list or some patches from the series correspond to different mailing lists [test_target_mailing_list] 
  Suggested fix    Send the series again to the correct mailing list (ML)
  Suggested ML     bitbake-devel@lists.openembedded.org [http://git.openembedded.org/bitbake/]
  Patch's path:    bitbake/bin/bitbake-selftest

* Issue             Series does not apply on top of target branch [test_series_merge_on_head] 
  Suggested fix    Rebase your series on top of targeted branch
  Targeted branch  master (currently at 14c291e1fb)



If you believe any of these test results are incorrect, please reply to the
mailing list (openembedded-core@lists.openembedded.org) raising your concerns.
Otherwise we would appreciate you correcting the issues and submitting a new
version of the patchset if applicable. Please ensure you add/increment the
version number when sending the new version (i.e. [PATCH] -> [PATCH v2] ->
[PATCH v3] -> ...).

---
Guidelines:     https://www.openembedded.org/wiki/Commit_Patch_Message_Guidelines
Test framework: http://git.yoctoproject.org/cgit/cgit.cgi/patchtest
Test suite:     http://git.yoctoproject.org/cgit/cgit.cgi/patchtest-oe



^ permalink raw reply	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v6 0/3] Hash Equivalency Server
  2018-12-18 15:30       ` [PATCH " Joshua Watt
@ 2019-01-04  2:42         ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2019-01-04  2:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Apologies for cross-posting this to both the bitbake-devel and
openembedded-devel; this work necessarily intertwines both places, and
it is really necessary to look at both parts to get an idea of what is
going on. For convenience, the bitbake patches are listed first,
followed by the oe-core patches.

The basic premise is that any given task no longer hashes a dependent
task's taskhash to determine its own taskhash, but instead hashes the
dependent task's "unique hash" (which doesn't strictly need to be a
hash, but is for consistency). This allows multiple taskhashes to map to
the same unique hash, meaning that trivial changes to a recipe that
would change the taskhash don't necessarily need to change the unique
hash, and thus don't need to cause downstream tasks to be rebuilt (with
caveats, see below).
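
As a toy illustration of that mapping (made-up names, not real hashes, and
deliberately ignoring the server's preference for an existing row with the
same taskhash): the server effectively keys unique hashes by output hash,
so the first report of an output hash "wins" and later taskhashes with the
same output collapse onto its unique hash:

    # Simplified model only; the real server stores this in SQLite
    equiv = {}  # outhash -> unihash

    def report(taskhash, outhash, unihash):
        # The first report of an outhash wins; later reports reuse its unihash
        return equiv.setdefault(outhash, unihash)

    a = report('taskhash-A', 'outhash-X', 'unihash-A')
    b = report('taskhash-B', 'outhash-X', 'unihash-B')
    assert a == b == 'unihash-A'  # downstream tasks hash the same unihash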

In the absence of any interaction by the user, the unique hash for a
task is just that task's taskhash, which effectively maintains the
current behavior. However, if the user enables the "OEEquivHash"
signature generator, they can direct it to look at a hash equivalency
server (of which a reference implementation is provided). The sstate
code will provide the server with an output hash that it calculates, and
the server will record all tasks with the same output hash as
"equivalent" and report the same unique hash for them when requested.
When initializing tasks, bitbake can ask the server about the unique
hash for new tasks it has never seen before and potentially skip
rebuilding, or restore the task from an equivalent sstate file. To
facilitate restoring tasks from sstate, sstate objects are now named
based on the task's unique hash instead of the taskhash (which, again,
has no effect if the server is not in use).
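
For example (a sketch only; the server URL is an assumption), enabling the
feature might look something like this in local.conf:

    BB_SIGNATURE_HANDLER = "OEEquivHash"
    SSTATE_HASHEQUIV_SERVER = "http://localhost:8686"
    SSTATE_HASHEQUIV_REPORT_TASKDATA = "1"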

This patchset doesn't make any attempt to dynamically update task unique
hashes after bitbake initializes the tasks, and as such there are some
cases where this isn't accelerating the build as much as it possibly
could. I think it will be possible to add support for this, but this
preliminary support needs to come first.

You can also see these patches (and my first attempts at dynamic task
re-hashing) on the "jpew/hash-equivalence" branch in poky-contrib.

As always, thanks for your feedback and time

VERSION 2:

At the core, this patch does the same thing as V1 with some very minor
tweaks. The main things that have changed are:
 1) Per request, the Hash Equivalence Server reference implementation is
    now based entirely on built in Python modules and requires no
    external libraries. It also has a wrapper script to launch it
    (bitbake-hashserv) and unittests.
 2) There is a major rework of persist_data in bitbake. I
    think these patches could be submitted independently, but I doubt
    anyone is clamoring for them. The general gist of them is that there
    were a lot of strange edge cases that I found when using
    persist_data as an IPC mechanism between the main bitbake process
    and the bitbake-worker processes. I went ahead and added extensive
    unit tests for this as well.

VERSION 3:

Minor tweak to version 2 that should fix timeout errors seen on the
autobuilder

VERSION 4:

Based on discussion, the term "dependency ID" was dropped in favor of
"unique hash" (unihash).

The hash validation checks were updated to properly fall back to the old
function signatures (that don't pass the unihashes) for compatibility
with older implementations.

VERSION 5:

Removed os.fork() handlers for persist_data. They can be added back if
actually necessary.

Reworked hash validation slightly based on feedback.

VERSION 6:

Fixed a bug that was introduced with the rename to unihash that prevented
unihashes from being recorded in persist_data.

Joshua Watt (3):
  classes/sstate: Handle unihash in hash check
  bitbake: hashserv: Add hash equivalence reference server
  sstate: Implement hash equivalence sstate

 bitbake/bin/bitbake-hashserv     |  67 +++++++++++++
 bitbake/bin/bitbake-selftest     |   2 +
 bitbake/lib/hashserv/__init__.py | 152 ++++++++++++++++++++++++++++
 bitbake/lib/hashserv/tests.py    | 141 ++++++++++++++++++++++++++
 meta/classes/sstate.bbclass      | 102 +++++++++++++++++--
 meta/conf/bitbake.conf           |   4 +-
 meta/lib/oe/sstatesig.py         | 167 +++++++++++++++++++++++++++++++
 7 files changed, 625 insertions(+), 10 deletions(-)
 create mode 100755 bitbake/bin/bitbake-hashserv
 create mode 100644 bitbake/lib/hashserv/__init__.py
 create mode 100644 bitbake/lib/hashserv/tests.py

-- 
2.20.1



^ permalink raw reply	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v6 1/3] classes/sstate: Handle unihash in hash check
  2019-01-04  2:42         ` [PATCH " Joshua Watt
@ 2019-01-04  2:42           ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2019-01-04  2:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Handle the new argument that passes the task's unique hash to the hash
check function, as it is now required by bitbake.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 0abebce6996..8f3cd083e85 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -780,7 +780,7 @@ sstate_unpack_package () {
 
 BB_HASHCHECK_FUNCTION = "sstate_checkhashes"
 
-def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False):
+def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *, sq_unihash):
 
     ret = []
     missed = []
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v6 2/3] bitbake: hashserv: Add hash equivalence reference server
  2019-01-04  2:42         ` [PATCH " Joshua Watt
@ 2019-01-04  2:42           ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2019-01-04  2:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Provides a reference implementation of the hash equivalence server.
The server has minimal dependencies (none outside of the standard
Python library) and implements the minimum required to be a conforming
hash equivalence server.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-hashserv     |  67 ++++++++++++++
 bitbake/bin/bitbake-selftest     |   2 +
 bitbake/lib/hashserv/__init__.py | 152 +++++++++++++++++++++++++++++++
 bitbake/lib/hashserv/tests.py    | 141 ++++++++++++++++++++++++++++
 4 files changed, 362 insertions(+)
 create mode 100755 bitbake/bin/bitbake-hashserv
 create mode 100644 bitbake/lib/hashserv/__init__.py
 create mode 100644 bitbake/lib/hashserv/tests.py

diff --git a/bitbake/bin/bitbake-hashserv b/bitbake/bin/bitbake-hashserv
new file mode 100755
index 00000000000..c49397b73a5
--- /dev/null
+++ b/bitbake/bin/bitbake-hashserv
@@ -0,0 +1,67 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+import os
+import sys
+import logging
+import argparse
+import sqlite3
+
+sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)),'lib'))
+
+import hashserv
+
+VERSION = "1.0.0"
+
+DEFAULT_HOST = ''
+DEFAULT_PORT = 8686
+
+def main():
+    parser = argparse.ArgumentParser(description='Hash Equivalence Reference Server. Version=%s' % VERSION)
+    parser.add_argument('--address', default=DEFAULT_HOST, help='Bind address (default "%(default)s")')
+    parser.add_argument('--port', type=int, default=DEFAULT_PORT, help='Bind port (default %(default)d)')
+    parser.add_argument('--prefix', default='', help='HTTP path prefix (default "%(default)s")')
+    parser.add_argument('--database', default='./hashserv.db', help='Database file (default "%(default)s")')
+    parser.add_argument('--log', default='WARNING', help='Set logging level')
+
+    args = parser.parse_args()
+
+    logger = logging.getLogger('hashserv')
+
+    level = getattr(logging, args.log.upper(), None)
+    if not isinstance(level, int):
+        raise ValueError('Invalid log level: %s' % args.log)
+
+    logger.setLevel(level)
+    console = logging.StreamHandler()
+    console.setLevel(level)
+    logger.addHandler(console)
+
+    db = sqlite3.connect(args.database)
+
+    server = hashserv.create_server((args.address, args.port), db, args.prefix)
+    server.serve_forever()
+    return 0
+
+if __name__ == '__main__':
+    try:
+        ret = main()
+    except Exception:
+        ret = 1
+        import traceback
+        traceback.print_exc()
+    sys.exit(ret)
+
diff --git a/bitbake/bin/bitbake-selftest b/bitbake/bin/bitbake-selftest
index c970dcae90c..99f1af910f4 100755
--- a/bitbake/bin/bitbake-selftest
+++ b/bitbake/bin/bitbake-selftest
@@ -22,6 +22,7 @@ sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)), 'lib
 import unittest
 try:
     import bb
+    import hashserv
     import layerindexlib
 except RuntimeError as exc:
     sys.exit(str(exc))
@@ -35,6 +36,7 @@ tests = ["bb.tests.codeparser",
          "bb.tests.parse",
          "bb.tests.persist_data",
          "bb.tests.utils",
+         "hashserv.tests",
          "layerindexlib.tests.layerindexobj",
          "layerindexlib.tests.restapi",
          "layerindexlib.tests.cooker"]
diff --git a/bitbake/lib/hashserv/__init__.py b/bitbake/lib/hashserv/__init__.py
new file mode 100644
index 00000000000..46bca7cab32
--- /dev/null
+++ b/bitbake/lib/hashserv/__init__.py
@@ -0,0 +1,152 @@
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+from http.server import BaseHTTPRequestHandler, HTTPServer
+import contextlib
+import urllib.parse
+import sqlite3
+import json
+import traceback
+import logging
+from datetime import datetime
+
+logger = logging.getLogger('hashserv')
+
+class HashEquivalenceServer(BaseHTTPRequestHandler):
+    def log_message(self, f, *args):
+        logger.debug(f, *args)
+
+    def do_GET(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            query = urllib.parse.parse_qs(p.query, strict_parsing=True)
+            method = query['method'][0]
+            taskhash = query['taskhash'][0]
+
+            d = None
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('SELECT taskhash, method, unihash FROM tasks_v1 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1',
+                        {'method': method, 'taskhash': taskhash})
+
+                row = cursor.fetchone()
+
+                if row is not None:
+                    logger.debug('Found equivalent task %s', row['taskhash'])
+                    d = {k: row[k] for k in ('taskhash', 'method', 'unihash')}
+
+            self.send_response(200)
+            self.send_header('Content-Type', 'application/json; charset=utf-8')
+            self.end_headers()
+            self.wfile.write(json.dumps(d).encode('utf-8'))
+        except:
+            logger.exception('Error in GET')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+    def do_POST(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            length = int(self.headers['content-length'])
+            data = json.loads(self.rfile.read(length).decode('utf-8'))
+
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('''
+                    SELECT taskhash, method, unihash FROM tasks_v1 WHERE method=:method AND outhash=:outhash
+                    ORDER BY CASE WHEN taskhash=:taskhash THEN 1 ELSE 2 END,
+                        created ASC
+                    LIMIT 1
+                    ''', {k: data[k] for k in ('method', 'outhash', 'taskhash')})
+
+                row = cursor.fetchone()
+
+                if row is None or row['taskhash'] != data['taskhash']:
+                    unihash = data['unihash']
+                    if row is not None:
+                        unihash = row['unihash']
+
+                    insert_data = {
+                            'method': data['method'],
+                            'outhash': data['outhash'],
+                            'taskhash': data['taskhash'],
+                            'unihash': unihash,
+                            'created': datetime.now()
+                            }
+
+                    for k in ('owner', 'PN', 'PV', 'PR', 'task', 'outhash_siginfo'):
+                        if k in data:
+                            insert_data[k] = data[k]
+
+                    cursor.execute('''INSERT INTO tasks_v1 (%s) VALUES (%s)''' % (
+                            ', '.join(sorted(insert_data.keys())),
+                            ', '.join(':' + k for k in sorted(insert_data.keys()))),
+                        insert_data)
+
+                    logger.info('Adding taskhash %s with unihash %s', data['taskhash'], unihash)
+                    cursor.execute('SELECT taskhash, method, unihash FROM tasks_v1 WHERE id=:id', {'id': cursor.lastrowid})
+                    row = cursor.fetchone()
+
+                    self.db.commit()
+
+                d = {k: row[k] for k in ('taskhash', 'method', 'unihash')}
+
+                self.send_response(200)
+                self.send_header('Content-Type', 'application/json; charset=utf-8')
+                self.end_headers()
+                self.wfile.write(json.dumps(d).encode('utf-8'))
+        except:
+            logger.exception('Error in POST')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+def create_server(addr, db, prefix=''):
+    class Handler(HashEquivalenceServer):
+        pass
+
+    Handler.prefix = prefix
+    Handler.db = db
+    db.row_factory = sqlite3.Row
+
+    with contextlib.closing(db.cursor()) as cursor:
+        cursor.execute('''
+            CREATE TABLE IF NOT EXISTS tasks_v1 (
+                id INTEGER PRIMARY KEY AUTOINCREMENT,
+                method TEXT NOT NULL,
+                outhash TEXT NOT NULL,
+                taskhash TEXT NOT NULL,
+                unihash TEXT NOT NULL,
+                created DATETIME,
+
+                -- Optional fields
+                owner TEXT,
+                PN TEXT,
+                PV TEXT,
+                PR TEXT,
+                task TEXT,
+                outhash_siginfo TEXT
+                )
+            ''')
+
+    logger.info('Starting server on %s', addr)
+    return HTTPServer(addr, Handler)
diff --git a/bitbake/lib/hashserv/tests.py b/bitbake/lib/hashserv/tests.py
new file mode 100644
index 00000000000..806b54c5ebd
--- /dev/null
+++ b/bitbake/lib/hashserv/tests.py
@@ -0,0 +1,141 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+import unittest
+import threading
+import sqlite3
+import hashlib
+import urllib.request
+import json
+from . import create_server
+
+class TestHashEquivalenceServer(unittest.TestCase):
+    def setUp(self):
+        # Start an in memory hash equivalence server in the background bound to
+        # an ephemeral port
+        db = sqlite3.connect(':memory:', check_same_thread=False)
+        self.server = create_server(('localhost', 0), db)
+        self.server_addr = 'http://localhost:%d' % self.server.socket.getsockname()[1]
+        self.server_thread = threading.Thread(target=self.server.serve_forever)
+        self.server_thread.start()
+
+    def tearDown(self):
+        # Shutdown server
+        s = getattr(self, 'server', None)
+        if s is not None:
+            self.server.shutdown()
+            self.server_thread.join()
+            self.server.server_close()
+
+    def send_get(self, path):
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def send_post(self, path, data):
+        headers = {'content-type': 'application/json'}
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url, json.dumps(data).encode('utf-8'), headers)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def test_create_hash(self):
+        # Simple test that hashes can be created
+        taskhash = '35788efcb8dfb0a02659d81cf2bfd695fb30faf9'
+        outhash = '2765d4a5884be49b28601445c2760c5f21e7e5c0ee2b7e3fce98fd7e5970796f'
+        unihash = 'f46d3fbb439bd9b921095da657a4de906510d2cd'
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertIsNone(d, msg='Found unexpected task, %r' % d)
+
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+    def test_create_equivalent(self):
+        # Tests that a second reported task with the same outhash will be
+        # assigned the same unihash
+        taskhash = '53b8dce672cb6d0c73170be43f540460bfc347b4'
+        outhash = '5a9cb1649625f0bf41fc7791b635cd9c2d7118c7f021ba87dcd03f72b67ce7a8'
+        unihash = 'f37918cc02eb5a520b1aff86faacbc0a38124646'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+        # Report a different task with the same outhash. The returned unihash
+        # should match the first task
+        taskhash2 = '3bf6f1e89d26205aec90da04854fbdbf73afe6b4'
+        unihash2 = 'af36b199320e611fbb16f1f277d3ee1d619ca58b'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash2,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash2,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+    def test_duplicate_taskhash(self):
+        # Tests that duplicate reports of the same taskhash with different
+        # outhash & unihash always return the unihash from the first reported
+        # taskhash
+        taskhash = '8aa96fcffb5831b3c2c0cb75f0431e3f8b20554a'
+        outhash = 'afe240a439959ce86f5e322f8c208e1fedefea9e813f2140c81af866cc9edf7e'
+        unihash = '218e57509998197d570e2c98512d0105985dffc9'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+        outhash2 = '0904a7fe3dc712d9fd8a74a616ddca2a825a8ee97adf0bd3fc86082c7639914d'
+        unihash2 = 'ae9a7d252735f0dafcdb10e2e02561ca3a47314c'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash2,
+            'unihash': unihash2
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+        outhash3 = '77623a549b5b1a31e3732dfa8fe61d7ce5d44b3370f253c5360e136b852967b4'
+        unihash3 = '9217a7d6398518e5dc002ed58f2cbbbc78696603'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash3,
+            'unihash': unihash3
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v6 3/3] sstate: Implement hash equivalence sstate
  2019-01-04  2:42         ` [PATCH " Joshua Watt
@ 2019-01-04  2:42           ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2019-01-04  2:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Converts sstate so that it can use a hash equivalence server to
determine if a task really needs to be rebuilt, or if it can be restored
from a different (equivalent) sstate object.

The unique hashes are cached persistently using persist_data. This has
a number of advantages:
 1) Unique hashes can be cached between invocations of bitbake to
    avoid contacting the server every time (which is slow)
 2) The value of each task's unique hash can easily be synchronized
    between different threads, which will be useful if bitbake is
    updated to do on-the-fly task re-hashing.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 100 +++++++++++++++++++--
 meta/conf/bitbake.conf      |   4 +-
 meta/lib/oe/sstatesig.py    | 167 ++++++++++++++++++++++++++++++++++++
 3 files changed, 262 insertions(+), 9 deletions(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 8f3cd083e85..46dbe7cb96d 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -11,7 +11,7 @@ def generate_sstatefn(spec, hash, d):
 SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
 SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
 SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
-SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_TASKHASH'), d)}"
+SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d)}"
 SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
 SSTATE_EXTRAPATH   = ""
 SSTATE_EXTRAPATHWILDCARD = ""
@@ -82,6 +82,23 @@ SSTATE_SIG_PASSPHRASE ?= ""
 # Whether to verify the GnUPG signatures when extracting sstate archives
 SSTATE_VERIFY_SIG ?= "0"
 
+SSTATE_HASHEQUIV_METHOD ?= "OEOuthashBasic"
+SSTATE_HASHEQUIV_METHOD[doc] = "The function used to calculate the output hash \
+    for a task, which in turn is used to determine equivalency. \
+    "
+
+SSTATE_HASHEQUIV_SERVER ?= ""
+SSTATE_HASHEQUIV_SERVER[doc] = "The hash equivalence server. For example, \
+    'http://192.168.0.1:5000'. Do not include a trailing slash \
+    "
+
+SSTATE_HASHEQUIV_REPORT_TASKDATA ?= "0"
+SSTATE_HASHEQUIV_REPORT_TASKDATA[doc] = "Report additional useful data to the \
+    hash equivalency server, such as PN, PV, taskname, etc. This information \
+    is very useful for developers looking at task data, but may leak sensitive \
+    data if the equivalence server is public. \
+    "
+
 python () {
     if bb.data.inherits_class('native', d):
         d.setVar('SSTATE_PKGARCH', d.getVar('BUILD_ARCH', False))
@@ -640,7 +657,7 @@ def sstate_package(ss, d):
         return
 
     for f in (d.getVar('SSTATECREATEFUNCS') or '').split() + \
-             ['sstate_create_package', 'sstate_sign_package'] + \
+             ['sstate_report_unihash', 'sstate_create_package', 'sstate_sign_package'] + \
              (d.getVar('SSTATEPOSTCREATEFUNCS') or '').split():
         # All hooks should run in SSTATE_BUILDDIR.
         bb.build.exec_func(f, d, (sstatebuild,))
@@ -764,6 +781,73 @@ python sstate_sign_package () {
                            d.getVar('SSTATE_SIG_PASSPHRASE'), armor=False)
 }
 
+def OEOuthashBasic(path, sigfile, task, d):
+    import hashlib
+    import stat
+
+    def update_hash(s):
+        s = s.encode('utf-8')
+        h.update(s)
+        if sigfile:
+            sigfile.write(s)
+
+    h = hashlib.sha256()
+    prev_dir = os.getcwd()
+
+    try:
+        os.chdir(path)
+
+        update_hash("OEOuthashBasic\n")
+
+        # It is only currently useful to get equivalent hashes for things that
+        # can be restored from sstate. Since the sstate object is named using
+        # SSTATE_PKGSPEC and the task name, those should be included in the
+        # output hash calculation.
+        update_hash("SSTATE_PKGSPEC=%s\n" % d.getVar('SSTATE_PKGSPEC'))
+        update_hash("task=%s\n" % task)
+
+        for root, dirs, files in os.walk('.', topdown=True):
+            # Sort directories and files to ensure consistent ordering
+            dirs.sort()
+            files.sort()
+
+            for f in files:
+                path = os.path.join(root, f)
+                s = os.lstat(path)
+
+                # Hash file path
+                update_hash(path + '\n')
+
+                # Hash file mode
+                update_hash("\tmode=0x%x\n" % stat.S_IMODE(s.st_mode))
+                update_hash("\ttype=0x%x\n" % stat.S_IFMT(s.st_mode))
+
+                if stat.S_ISBLK(s.st_mode) or stat.S_ISCHR(s.st_mode):
+                    # Hash device major and minor
+                    update_hash("\tdev=%d,%d\n" % (os.major(s.st_rdev), os.minor(s.st_rdev)))
+                elif stat.S_ISLNK(s.st_mode):
+                    # Hash symbolic link
+                    update_hash("\tsymlink=%s\n" % os.readlink(path))
+                else:
+                    fh = hashlib.sha256()
+                    # Hash file contents
+                    with open(path, 'rb') as fobj:
+                        for chunk in iter(lambda: fobj.read(4096), b""):
+                            fh.update(chunk)
+                    update_hash("\tdigest=%s\n" % fh.hexdigest())
+    finally:
+        os.chdir(prev_dir)
+
+    return h.hexdigest()
+
+python sstate_report_unihash() {
+    report_unihash = getattr(bb.parse.siggen, 'report_unihash', None)
+
+    if report_unihash:
+        ss = sstate_state_fromvars(d)
+        report_unihash(os.getcwd(), ss['task'], d)
+}
+
 #
 # Shell function to decompress and prepare a package for installation
 # Will be run from within SSTATE_INSTDIR.
@@ -810,7 +894,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
 
         spec, extrapath, tname = getpathcomponents(task, d)
 
-        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + extension)
 
         if os.path.exists(sstatefile):
             bb.debug(2, "SState: Found valid sstate file %s" % sstatefile)
@@ -872,7 +956,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
             if task in ret:
                 continue
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + extension)
             tasklist.append((task, sstatefile))
 
         if tasklist:
@@ -898,12 +982,12 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
         evdata = {'missed': [], 'found': []};
         for task in missed:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + ".tgz")
+            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_unihash[task], sstatefile ) )
         for task in ret:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['found'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + ".tgz")
+            evdata['found'].append( (sq_fn[task], sq_task[task], sq_unihash[task], sstatefile ) )
         bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)
 
     # Print some summary statistics about the current task completion and how much sstate
diff --git a/meta/conf/bitbake.conf b/meta/conf/bitbake.conf
index 64800623545..e64ce6a6dab 100644
--- a/meta/conf/bitbake.conf
+++ b/meta/conf/bitbake.conf
@@ -867,7 +867,9 @@ BB_HASHBASE_WHITELIST ?= "TMPDIR FILE PATH PWD BB_TASKHASH BBPATH BBSERVER DL_DI
     STAMPS_DIR PRSERV_DUMPDIR PRSERV_DUMPFILE PRSERV_LOCKDOWN PARALLEL_MAKE \
     CCACHE_DIR EXTERNAL_TOOLCHAIN CCACHE CCACHE_NOHASHDIR LICENSE_PATH SDKPKGSUFFIX \
     WARN_QA ERROR_QA WORKDIR STAMPCLEAN PKGDATA_DIR BUILD_ARCH SSTATE_PKGARCH \
-    BB_WORKERCONTEXT BB_LIMITEDDEPS extend_recipe_sysroot DEPLOY_DIR"
+    BB_WORKERCONTEXT BB_LIMITEDDEPS BB_UNIHASH extend_recipe_sysroot DEPLOY_DIR \
+    SSTATE_HASHEQUIV_METHOD SSTATE_HASHEQUIV_SERVER SSTATE_HASHEQUIV_REPORT_TASKDATA \
+    SSTATE_HASHEQUIV_OWNER"
 BB_HASHCONFIG_WHITELIST ?= "${BB_HASHBASE_WHITELIST} DATE TIME SSH_AGENT_PID \
     SSH_AUTH_SOCK PSEUDO_BUILD BB_ENV_EXTRAWHITE DISABLE_SANITY_CHECKS \
     PARALLEL_MAKE BB_NUMBER_THREADS BB_ORIGENV BB_INVALIDCONF BBINCLUDED \
diff --git a/meta/lib/oe/sstatesig.py b/meta/lib/oe/sstatesig.py
index 18c5a353a2a..059e165c7ab 100644
--- a/meta/lib/oe/sstatesig.py
+++ b/meta/lib/oe/sstatesig.py
@@ -263,10 +263,177 @@ class SignatureGeneratorOEBasicHash(bb.siggen.SignatureGeneratorBasicHash):
         if error_msgs:
             bb.fatal("\n".join(error_msgs))
 
+class SignatureGeneratorOEEquivHash(SignatureGeneratorOEBasicHash):
+    name = "OEEquivHash"
+
+    def init_rundepcheck(self, data):
+        super().init_rundepcheck(data)
+        self.server = data.getVar('SSTATE_HASHEQUIV_SERVER')
+        self.method = data.getVar('SSTATE_HASHEQUIV_METHOD')
+        self.unihashes = bb.persist_data.persist('SSTATESIG_UNIHASH_CACHE_v1_' + self.method, data)
+
+    def get_taskdata(self):
+        return (self.server, self.method) + super().get_taskdata()
+
+    def set_taskdata(self, data):
+        self.server, self.method = data[:2]
+        super().set_taskdata(data[2:])
+
+    def __get_task_unihash_key(self, task):
+        # TODO: The key only *needs* to be the taskhash, the task is just
+        # convenient
+        return '%s:%s' % (task, self.taskhash[task])
+
+    def get_stampfile_hash(self, task):
+        if task in self.taskhash:
+            # If a unique hash is reported, use it as the stampfile hash. This
+            # ensures that a task won't be re-run if its taskhash changes but
+            # it would result in the same output hash
+            unihash = self.unihashes.get(self.__get_task_unihash_key(task))
+            if unihash is not None:
+                return unihash
+
+        return super().get_stampfile_hash(task)
+
+    def get_unihash(self, task):
+        import urllib
+        import json
+
+        taskhash = self.taskhash[task]
+
+        key = self.__get_task_unihash_key(task)
+
+        # TODO: This cache can grow unbounded. It probably only needs to keep
+        # the unihash of the most recent taskhash for each task
+        unihash = self.unihashes.get(key)
+        if unihash is not None:
+            return unihash
+
+        # In the absence of being able to discover a unique hash from the
+        # server, make it be equivalent to the taskhash. The unique "hash" only
+        # really needs to be a unique string (not even necessarily a hash), but
+        # making it match the taskhash has a few advantages:
+        #
+        # 1) All of the sstate code that assumes hashes can be the same keeps working
+        # 2) It provides maximal compatibility with builders that don't use
+        #    an equivalency server
+        # 3) It is easy for multiple independent builders to derive the same
+        #    unique hash from the same input. This means that if independent
+        #    builders find the same taskhash, but it isn't reported to the
+        #    server, there is a better chance that they will agree on the
+        #    unique hash.
+        unihash = taskhash
+
+        try:
+            url = '%s/v1/equivalent?%s' % (self.server,
+                    urllib.parse.urlencode({'method': self.method, 'taskhash': self.taskhash[task]}))
+
+            request = urllib.request.Request(url)
+            response = urllib.request.urlopen(request)
+            data = response.read().decode('utf-8')
+
+            json_data = json.loads(data)
+
+            if json_data:
+                unihash = json_data['unihash']
+                # A unique hash equal to the taskhash is not very interesting,
+                # so it is reported at debug level 2. If they differ, that
+                # is much more interesting, so it is reported at debug level 1
+                bb.debug((1, 2)[unihash == taskhash], 'Found unihash %s in place of %s for %s from %s' % (unihash, taskhash, task, self.server))
+            else:
+                bb.debug(2, 'No reported unihash for %s:%s from %s' % (task, taskhash, self.server))
+        except urllib.error.URLError as e:
+            bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+        except (KeyError, json.JSONDecodeError) as e:
+            bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+
+        self.unihashes[key] = unihash
+        return unihash
+
+    def report_unihash(self, path, task, d):
+        import urllib
+        import json
+        import tempfile
+        import base64
+
+        taskhash = d.getVar('BB_TASKHASH')
+        unihash = d.getVar('BB_UNIHASH')
+        report_taskdata = d.getVar('SSTATE_HASHEQUIV_REPORT_TASKDATA') == '1'
+        tempdir = d.getVar('T')
+        fn = d.getVar('BB_FILENAME')
+        key = fn + '.do_' + task + ':' + taskhash
+
+        # Sanity checks
+        cache_unihash = self.unihashes.get(key)
+        if cache_unihash is None:
+            bb.fatal('%s not in unihash cache. Please report this error' % key)
+
+        if cache_unihash != unihash:
+            bb.fatal("Cache unihash %s doesn't match BB_UNIHASH %s" % (cache_unihash, unihash))
+
+        sigfile = None
+        sigfile_name = "depsig.do_%s.%d" % (task, os.getpid())
+        sigfile_link = "depsig.do_%s" % task
+
+        try:
+            call = self.method + '(path, sigfile, task, d)'
+            sigfile = open(os.path.join(tempdir, sigfile_name), 'w+b')
+            locs = {'path': path, 'sigfile': sigfile, 'task': task, 'd': d}
+
+            outhash = bb.utils.better_eval(call, locs)
+
+            try:
+                url = '%s/v1/equivalent' % self.server
+                task_data = {
+                    'taskhash': taskhash,
+                    'method': self.method,
+                    'outhash': outhash,
+                    'unihash': unihash,
+                    'owner': d.getVar('SSTATE_HASHEQUIV_OWNER')
+                    }
+
+                if report_taskdata:
+                    sigfile.seek(0)
+
+                    task_data['PN'] = d.getVar('PN')
+                    task_data['PV'] = d.getVar('PV')
+                    task_data['PR'] = d.getVar('PR')
+                    task_data['task'] = task
+                    task_data['outhash_siginfo'] = sigfile.read().decode('utf-8')
+
+                headers = {'content-type': 'application/json'}
+
+                request = urllib.request.Request(url, json.dumps(task_data).encode('utf-8'), headers)
+                response = urllib.request.urlopen(request)
+                data = response.read().decode('utf-8')
+
+                json_data = json.loads(data)
+                new_unihash = json_data['unihash']
+
+                if new_unihash != unihash:
+                    bb.debug(1, 'Task %s unihash changed %s -> %s by server %s' % (taskhash, unihash, new_unihash, self.server))
+                else:
+                    bb.debug(1, 'Reported task %s as unihash %s to %s' % (taskhash, unihash, self.server))
+            except urllib.error.URLError as e:
+                bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+            except (KeyError, json.JSONDecodeError) as e:
+                bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+        finally:
+            if sigfile:
+                sigfile.close()
+
+                sigfile_link_path = os.path.join(tempdir, sigfile_link)
+                bb.utils.remove(sigfile_link_path)
+
+                try:
+                    os.symlink(sigfile_name, sigfile_link_path)
+                except OSError:
+                    pass
 
 # Insert these classes into siggen's namespace so it can see and select them
 bb.siggen.SignatureGeneratorOEBasic = SignatureGeneratorOEBasic
 bb.siggen.SignatureGeneratorOEBasicHash = SignatureGeneratorOEBasicHash
+bb.siggen.SignatureGeneratorOEEquivHash = SignatureGeneratorOEEquivHash
 
 
 def find_siginfo(pn, taskname, taskhashlist, d):
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [PATCH v6 3/3] sstate: Implement hash equivalence sstate
@ 2019-01-04  2:42           ` Joshua Watt
  0 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2019-01-04  2:42 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Converts sstate so that it can use a hash equivalence server to
determine if a task really needs to be rebuilt, or if it can be restored
from a different (equivalent) sstate object.

The unique hashes are cached persistently using persist_data. This has
a number of advantages:
 1) Unique hashes can be cached between invocations of bitbake to
    avoid needing to contact the server every time (which is slow)
 2) The value of each task's unique hash can easily be synchronized
    between different threads, which will be useful if bitbake is
    updated to do on-the-fly task re-hashing (a sketch of the cache
    usage follows below).
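
As a sketch of how the cache behaves (the variable names and the method
suffix below mirror the patch; this is illustrative, not part of it):

    import bb.persist_data

    # A dict-like store that survives across bitbake invocations
    unihashes = bb.persist_data.persist(
            'SSTATESIG_UNIHASH_CACHE_v1_OEOuthashBasic', d)

    unihash = unihashes.get(key)   # returns None on a miss
    if unihash is None:
        unihash = taskhash         # fall back to the plain taskhash
        unihashes[key] = unihash   # persisted and visible to other workers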

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 100 +++++++++++++++++++--
 meta/conf/bitbake.conf      |   4 +-
 meta/lib/oe/sstatesig.py    | 167 ++++++++++++++++++++++++++++++++++++
 3 files changed, 262 insertions(+), 9 deletions(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 8f3cd083e85..46dbe7cb96d 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -11,7 +11,7 @@ def generate_sstatefn(spec, hash, d):
 SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
 SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
 SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
-SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_TASKHASH'), d)}"
+SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d)}"
 SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
 SSTATE_EXTRAPATH   = ""
 SSTATE_EXTRAPATHWILDCARD = ""
@@ -82,6 +82,23 @@ SSTATE_SIG_PASSPHRASE ?= ""
 # Whether to verify the GnuPG signatures when extracting sstate archives
 SSTATE_VERIFY_SIG ?= "0"
 
+SSTATE_HASHEQUIV_METHOD ?= "OEOuthashBasic"
+SSTATE_HASHEQUIV_METHOD[doc] = "The function used to calculate the output hash \
+    for a task, which in turn is used to determine equivalency. \
+    "
+
+SSTATE_HASHEQUIV_SERVER ?= ""
+SSTATE_HASHEQUIV_SERVER[doc] = "The hash equivalence server. For example, \
+    'http://192.168.0.1:5000'. Do not include a trailing slash \
+    "
+
+SSTATE_HASHEQUIV_REPORT_TASKDATA ?= "0"
+SSTATE_HASHEQUIV_REPORT_TASKDATA[doc] = "Report additional useful data to the \
+    hash equivalency server, such as PN, PV, taskname, etc. This information \
+    is very useful for developers looking at task data, but may leak sensitive \
+    data if the equivalence server is public. \
+    "
+
 python () {
     if bb.data.inherits_class('native', d):
         d.setVar('SSTATE_PKGARCH', d.getVar('BUILD_ARCH', False))
@@ -640,7 +657,7 @@ def sstate_package(ss, d):
         return
 
     for f in (d.getVar('SSTATECREATEFUNCS') or '').split() + \
-             ['sstate_create_package', 'sstate_sign_package'] + \
+             ['sstate_report_unihash', 'sstate_create_package', 'sstate_sign_package'] + \
              (d.getVar('SSTATEPOSTCREATEFUNCS') or '').split():
         # All hooks should run in SSTATE_BUILDDIR.
         bb.build.exec_func(f, d, (sstatebuild,))
@@ -764,6 +781,73 @@ python sstate_sign_package () {
                            d.getVar('SSTATE_SIG_PASSPHRASE'), armor=False)
 }
 
+def OEOuthashBasic(path, sigfile, task, d):
+    import hashlib
+    import stat
+
+    def update_hash(s):
+        s = s.encode('utf-8')
+        h.update(s)
+        if sigfile:
+            sigfile.write(s)
+
+    h = hashlib.sha256()
+    prev_dir = os.getcwd()
+
+    try:
+        os.chdir(path)
+
+        update_hash("OEOuthashBasic\n")
+
+        # It is only currently useful to get equivalent hashes for things that
+        # can be restored from sstate. Since the sstate object is named using
+        # SSTATE_PKGSPEC and the task name, those should be included in the
+        # output hash calculation.
+        update_hash("SSTATE_PKGSPEC=%s\n" % d.getVar('SSTATE_PKGSPEC'))
+        update_hash("task=%s\n" % task)
+
+        for root, dirs, files in os.walk('.', topdown=True):
+            # Sort directories and files to ensure consistent ordering
+            dirs.sort()
+            files.sort()
+
+            for f in files:
+                path = os.path.join(root, f)
+                s = os.lstat(path)
+
+                # Hash file path
+                update_hash(path + '\n')
+
+                # Hash file mode
+                update_hash("\tmode=0x%x\n" % stat.S_IMODE(s.st_mode))
+                update_hash("\ttype=0x%x\n" % stat.S_IFMT(s.st_mode))
+
+                if stat.S_ISBLK(s.st_mode) or stat.S_ISCHR(s.st_mode):
+                    # Hash device major and minor
+                    update_hash("\tdev=%d,%d\n" % (os.major(s.st_rdev), os.minor(s.st_rdev)))
+                elif stat.S_ISLNK(s.st_mode):
+                    # Hash symbolic link
+                    update_hash("\tsymlink=%s\n" % os.readlink(path))
+                else:
+                    fh = hashlib.sha256()
+                    # Hash file contents
+                    with open(path, 'rb') as fobj:
+                        for chunk in iter(lambda: fobj.read(4096), b""):
+                            fh.update(chunk)
+                    update_hash("\tdigest=%s\n" % fh.hexdigest())
+    finally:
+        os.chdir(prev_dir)
+
+    return h.hexdigest()
+
+python sstate_report_unihash() {
+    report_unihash = getattr(bb.parse.siggen, 'report_unihash', None)
+
+    if report_unihash:
+        ss = sstate_state_fromvars(d)
+        report_unihash(os.getcwd(), ss['task'], d)
+}
+
 #
 # Shell function to decompress and prepare a package for installation
 # Will be run from within SSTATE_INSTDIR.
@@ -810,7 +894,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
 
         spec, extrapath, tname = getpathcomponents(task, d)
 
-        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + extension)
 
         if os.path.exists(sstatefile):
             bb.debug(2, "SState: Found valid sstate file %s" % sstatefile)
@@ -872,7 +956,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
             if task in ret:
                 continue
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + extension)
             tasklist.append((task, sstatefile))
 
         if tasklist:
@@ -898,12 +982,12 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
         evdata = {'missed': [], 'found': []};
         for task in missed:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + ".tgz")
+            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_unihash[task], sstatefile ) )
         for task in ret:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['found'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_unihash[task], d) + "_" + tname + ".tgz")
+            evdata['found'].append( (sq_fn[task], sq_task[task], sq_unihash[task], sstatefile ) )
         bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)
 
     # Print some summary statistics about the current task completion and how much sstate
diff --git a/meta/conf/bitbake.conf b/meta/conf/bitbake.conf
index 64800623545..e64ce6a6dab 100644
--- a/meta/conf/bitbake.conf
+++ b/meta/conf/bitbake.conf
@@ -867,7 +867,9 @@ BB_HASHBASE_WHITELIST ?= "TMPDIR FILE PATH PWD BB_TASKHASH BBPATH BBSERVER DL_DI
     STAMPS_DIR PRSERV_DUMPDIR PRSERV_DUMPFILE PRSERV_LOCKDOWN PARALLEL_MAKE \
     CCACHE_DIR EXTERNAL_TOOLCHAIN CCACHE CCACHE_NOHASHDIR LICENSE_PATH SDKPKGSUFFIX \
     WARN_QA ERROR_QA WORKDIR STAMPCLEAN PKGDATA_DIR BUILD_ARCH SSTATE_PKGARCH \
-    BB_WORKERCONTEXT BB_LIMITEDDEPS extend_recipe_sysroot DEPLOY_DIR"
+    BB_WORKERCONTEXT BB_LIMITEDDEPS BB_UNIHASH extend_recipe_sysroot DEPLOY_DIR \
+    SSTATE_HASHEQUIV_METHOD SSTATE_HASHEQUIV_SERVER SSTATE_HASHEQUIV_REPORT_TASKDATA \
+    SSTATE_HASHEQUIV_OWNER"
 BB_HASHCONFIG_WHITELIST ?= "${BB_HASHBASE_WHITELIST} DATE TIME SSH_AGENT_PID \
     SSH_AUTH_SOCK PSEUDO_BUILD BB_ENV_EXTRAWHITE DISABLE_SANITY_CHECKS \
     PARALLEL_MAKE BB_NUMBER_THREADS BB_ORIGENV BB_INVALIDCONF BBINCLUDED \
diff --git a/meta/lib/oe/sstatesig.py b/meta/lib/oe/sstatesig.py
index 18c5a353a2a..059e165c7ab 100644
--- a/meta/lib/oe/sstatesig.py
+++ b/meta/lib/oe/sstatesig.py
@@ -263,10 +263,177 @@ class SignatureGeneratorOEBasicHash(bb.siggen.SignatureGeneratorBasicHash):
         if error_msgs:
             bb.fatal("\n".join(error_msgs))
 
+class SignatureGeneratorOEEquivHash(SignatureGeneratorOEBasicHash):
+    name = "OEEquivHash"
+
+    def init_rundepcheck(self, data):
+        super().init_rundepcheck(data)
+        self.server = data.getVar('SSTATE_HASHEQUIV_SERVER')
+        self.method = data.getVar('SSTATE_HASHEQUIV_METHOD')
+        self.unihashes = bb.persist_data.persist('SSTATESIG_UNIHASH_CACHE_v1_' + self.method, data)
+
+    def get_taskdata(self):
+        return (self.server, self.method) + super().get_taskdata()
+
+    def set_taskdata(self, data):
+        self.server, self.method = data[:2]
+        super().set_taskdata(data[2:])
+
+    def __get_task_unihash_key(self, task):
+        # TODO: The key only *needs* to be the taskhash; the task name is
+        # just included for convenience
+        return '%s:%s' % (task, self.taskhash[task])
+
+    def get_stampfile_hash(self, task):
+        if task in self.taskhash:
+            # If a unique hash is reported, use it as the stampfile hash. This
+            # ensures that a task won't be re-run if its taskhash changes but
+            # it would still result in the same output hash
+            unihash = self.unihashes.get(self.__get_task_unihash_key(task))
+            if unihash is not None:
+                return unihash
+
+        return super().get_stampfile_hash(task)
+
+    def get_unihash(self, task):
+        import urllib
+        import json
+
+        taskhash = self.taskhash[task]
+
+        key = self.__get_task_unihash_key(task)
+
+        # TODO: This cache can grow unbounded. It probably only needs to keep
+        # the entry for the current taskhash of each task
+        unihash = self.unihashes.get(key)
+        if unihash is not None:
+            return unihash
+
+        # In the absence of being able to discover a unique hash from the
+        # server, make it be equivalent to the taskhash. The unique "hash" only
+        # really needs to be a unique string (not even necessarily a hash), but
+        # making it match the taskhash has a few advantages:
+        #
+        # 1) All of the sstate code that assumes the hashes can be the same
+        #    continues to work
+        # 2) It provides maximal compatibility with builders that don't use
+        #    an equivalency server
+        # 3) It is easy for multiple independent builders to derive the same
+        #    unique hash from the same input. This means that if the
+        #    independent builders find the same taskhash, but it isn't reported
+        #    to the server, there is a better chance that they will agree on
+        #    the unique hash.
+        unihash = taskhash
+
+        try:
+            url = '%s/v1/equivalent?%s' % (self.server,
+                    urllib.parse.urlencode({'method': self.method, 'taskhash': self.taskhash[task]}))
+
+            request = urllib.request.Request(url)
+            response = urllib.request.urlopen(request)
+            data = response.read().decode('utf-8')
+
+            json_data = json.loads(data)
+
+            if json_data:
+                unihash = json_data['unihash']
+                # A unique hash equal to the taskhash is not very interesting,
+                # so it is reported at debug level 2. If they differ, that
+                # is much more interesting, so it is reported at debug level 1
+                bb.debug((1, 2)[unihash == taskhash], 'Found unihash %s in place of %s for %s from %s' % (unihash, taskhash, task, self.server))
+            else:
+                bb.debug(2, 'No reported unihash for %s:%s from %s' % (task, taskhash, self.server))
+        except urllib.error.URLError as e:
+            bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+        except (KeyError, json.JSONDecodeError) as e:
+            bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+
+        self.unihashes[key] = unihash
+        return unihash
+
+    def report_unihash(self, path, task, d):
+        import urllib
+        import json
+        import tempfile
+        import base64
+
+        taskhash = d.getVar('BB_TASKHASH')
+        unihash = d.getVar('BB_UNIHASH')
+        report_taskdata = d.getVar('SSTATE_HASHEQUIV_REPORT_TASKDATA') == '1'
+        tempdir = d.getVar('T')
+        fn = d.getVar('BB_FILENAME')
+        key = fn + '.do_' + task + ':' + taskhash
+
+        # Sanity checks
+        cache_unihash = self.unihashes.get(key)
+        if cache_unihash is None:
+            bb.fatal('%s not in unihash cache. Please report this error' % key)
+
+        if cache_unihash != unihash:
+            bb.fatal("Cache unihash %s doesn't match BB_UNIHASH %s" % (cache_unihash, unihash))
+
+        sigfile = None
+        sigfile_name = "depsig.do_%s.%d" % (task, os.getpid())
+        sigfile_link = "depsig.do_%s" % task
+
+        try:
+            call = self.method + '(path, sigfile, task, d)'
+            sigfile = open(os.path.join(tempdir, sigfile_name), 'w+b')
+            locs = {'path': path, 'sigfile': sigfile, 'task': task, 'd': d}
+
+            outhash = bb.utils.better_eval(call, locs)
+
+            try:
+                url = '%s/v1/equivalent' % self.server
+                task_data = {
+                    'taskhash': taskhash,
+                    'method': self.method,
+                    'outhash': outhash,
+                    'unihash': unihash,
+                    'owner': d.getVar('SSTATE_HASHEQUIV_OWNER')
+                    }
+
+                if report_taskdata:
+                    sigfile.seek(0)
+
+                    task_data['PN'] = d.getVar('PN')
+                    task_data['PV'] = d.getVar('PV')
+                    task_data['PR'] = d.getVar('PR')
+                    task_data['task'] = task
+                    task_data['outhash_siginfo'] = sigfile.read().decode('utf-8')
+
+                headers = {'content-type': 'application/json'}
+
+                request = urllib.request.Request(url, json.dumps(task_data).encode('utf-8'), headers)
+                response = urllib.request.urlopen(request)
+                data = response.read().decode('utf-8')
+
+                json_data = json.loads(data)
+                new_unihash = json_data['unihash']
+
+                if new_unihash != unihash:
+                    bb.debug(1, 'Task %s unihash changed %s -> %s by server %s' % (taskhash, unihash, new_unihash, self.server))
+                else:
+                    bb.debug(1, 'Reported task %s as unihash %s to %s' % (taskhash, unihash, self.server))
+            except urllib.error.URLError as e:
+                bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+            except (KeyError, json.JSONDecodeError) as e:
+                bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+        finally:
+            if sigfile:
+                sigfile.close()
+
+                sigfile_link_path = os.path.join(tempdir, sigfile_link)
+                bb.utils.remove(sigfile_link_path)
+
+                try:
+                    os.symlink(sigfile_name, sigfile_link_path)
+                except OSError:
+                    pass
 
 # Insert these classes into siggen's namespace so it can see and select them
 bb.siggen.SignatureGeneratorOEBasic = SignatureGeneratorOEBasic
 bb.siggen.SignatureGeneratorOEBasicHash = SignatureGeneratorOEBasicHash
+bb.siggen.SignatureGeneratorOEEquivHash = SignatureGeneratorOEEquivHash
 
 
 def find_siginfo(pn, taskname, taskhashlist, d):
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* ✗ patchtest: failure for Hash Equivalency Server (rev4)
  2018-12-18 15:30       ` [PATCH " Joshua Watt
                         ` (13 preceding siblings ...)
  (?)
@ 2019-01-04  3:03       ` Patchwork
  -1 siblings, 0 replies; 158+ messages in thread
From: Patchwork @ 2019-01-04  3:03 UTC (permalink / raw)
  To: Joshua Watt; +Cc: openembedded-core

== Series Details ==

Series: Hash Equivalency Server (rev4)
Revision: 4
URL   : https://patchwork.openembedded.org/series/15190/
State : failure

== Summary ==


Thank you for submitting this patch series to OpenEmbedded Core. This is
an automated response. Several tests have been executed on the proposed
series by patchtest resulting in the following failures:



* Issue             Series sent to the wrong mailing list or some patches from the series correspond to different mailing lists [test_target_mailing_list] 
  Suggested fix    Send the series again to the correct mailing list (ML)
  Suggested ML     bitbake-devel@lists.openembedded.org [http://git.openembedded.org/bitbake/]
  Patch's path:    bitbake/bin/bitbake-hashserv

* Issue             Series does not apply on top of target branch [test_series_merge_on_head] 
  Suggested fix    Rebase your series on top of targeted branch
  Targeted branch  master (currently at 65c419b8c4)



If you believe any of these test results are incorrect, please reply to the
mailing list (openembedded-core@lists.openembedded.org) raising your concerns.
Otherwise we would appreciate you correcting the issues and submitting a new
version of the patchset if applicable. Please ensure you add/increment the
version number when sending the new version (i.e. [PATCH] -> [PATCH v2] ->
[PATCH v3] -> ...).

---
Guidelines:     https://www.openembedded.org/wiki/Commit_Patch_Message_Guidelines
Test framework: http://git.yoctoproject.org/cgit/cgit.cgi/patchtest
Test suite:     http://git.yoctoproject.org/cgit/cgit.cgi/patchtest-oe



^ permalink raw reply	[flat|nested] 158+ messages in thread

* Re: [OE-core][PATCH v6 1/3] classes/sstate: Handle unihash in hash check
  2019-01-04  2:42           ` [PATCH " Joshua Watt
@ 2019-01-04  7:01             ` Richard Purdie
  -1 siblings, 0 replies; 158+ messages in thread
From: Richard Purdie @ 2019-01-04  7:01 UTC (permalink / raw)
  To: Joshua Watt, openembedded-core, bitbake-devel

On Thu, 2019-01-03 at 20:42 -0600, Joshua Watt wrote:
> Handles the argument that passes task unique hash in the hash check
> function, as it is now required by bitbake
> 
> [YOCTO #13030]
> 
> Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> ---
>  meta/classes/sstate.bbclass | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
> index 0abebce6996..8f3cd083e85 100644
> --- a/meta/classes/sstate.bbclass
> +++ b/meta/classes/sstate.bbclass
> @@ -780,7 +780,7 @@ sstate_unpack_package () {
>  
>  BB_HASHCHECK_FUNCTION = "sstate_checkhashes"
>  
> -def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False):
> +def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *, sq_unihash):

Sorry for not replying sooner, I've just remembered the question I had
here. Does this "degrade" safely where an old bitbake is used with a new
version of OE-Core with this change?

In the past we'd have done sq_unihash=None or similar and handled that
correctly in the code to ensure it did degrade safely...
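
i.e. something roughly like this (a sketch only, not the final code):

    def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *, sq_unihash=None):
        def gethash(task):
            # Older bitbakes don't pass unihashes; fall back to the
            # plain taskhash so the function degrades safely
            if sq_unihash is not None:
                return sq_unihash[task]
            return sq_hash[task]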

Cheers,

Richard





^ permalink raw reply	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v7 0/3] Hash Equivalency Server
  2019-01-04  2:42         ` [PATCH " Joshua Watt
@ 2019-01-04 16:20           ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2019-01-04 16:20 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Apologies for cross-posting this to both the bitbake-devel and
openembedded-devel; this work necessarily intertwines both places, and
it is really necessary to look at both parts to get an idea of what is
going on. For convenience, the bitbake patches are listed first,
followed by the oe-core patches.

The basic premise is that any given task no longer hashes a dependent
task's taskhash to determine its own taskhash, but instead hashes the
dependent task's "unique hash" (which doesn't strictly need to be a
hash, but is for consistency). This allows multiple taskhashes to map to
the same unique hash, meaning that trivial changes to a recipe that
would change the taskhash don't necessarily need to change the unique
hash, and thus don't need to cause downstream tasks to be rebuilt (with
caveats, see below).
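
Conceptually, the change amounts to something like this (a sketch, not
the actual bitbake code; compute_taskhash and get_unihash are
illustrative names):

    import hashlib

    def compute_taskhash(task, deps):
        h = hashlib.sha256()
        h.update(task.encode('utf-8'))
        for dep in sorted(deps):
            # Previously the dependent task's taskhash was mixed in here.
            # Mixing in the unique hash instead lets several taskhashes
            # map to one value, so trivial upstream changes don't force
            # downstream rebuilds.
            h.update(get_unihash(dep).encode('utf-8'))
        return h.hexdigest()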

In the absence of any interaction by the user, the unique hash for a
task is just that task's taskhash, which effectively maintains the
current behavior. However, if the user enables the "OEEquivHash"
signature generator, they can direct it to look at a hash equivalency
server (of which a reference implementation is provided). The sstate
code will provide the server with an output hash that it calculates, and
the server will record all tasks with the same output hash as
"equivalent" and report the same unique hash for them when requested.
When initializing tasks, bitbake can ask the server about the unique
hash for new tasks it has never seen before and potentially skip
rebuilding, or restore the task from an equivalent sstate file. To
facilitate restoring tasks from sstate, sstate objects are now named
based on the task's unique hash instead of the taskhash (which, again,
has no effect if the server is not in use).
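
Enabling it looks roughly like this in local.conf (the URL is an
example; 8686 is the reference server's default port):

    BB_SIGNATURE_HANDLER = "OEEquivHash"
    SSTATE_HASHEQUIV_SERVER = "http://localhost:8686"
    SSTATE_HASHEQUIV_METHOD = "OEOuthashBasic"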

This patchset doesn't make any attempt to dynamically update task unique
hashes after bitbake initializes the tasks, and as such there are some
cases where this isn't accelerating the build as much as it possibly
could. I think it will be possible to add support for this, but this
preliminary support needs to come first.

You can also see these patches (and my first attempts at dynamic task
re-hashing) on the "jpew/hash-equivalence" branch in poky-contrib.

As always, thanks for your feedback and time

VERSION 2:

At the core, this patch does the same thing as V1 with some very minor
tweaks. The main things that have changed are:
 1) Per request, the Hash Equivalence Server reference implementation is
    now based entirely on built in Python modules and requires no
    external libraries. It also has a wrapper script to launch it
    (bitbake-hashserv) and unittests.
 2) There is a major rework of persist_data in bitbake. I
    think these patches could be submitted independently, but I doubt
    anyone is clamoring for them. The general gist of them is that there
    were a lot of strange edge cases that I found when using
    persist_data as an IPC mechanism between the main bitbake process
    and the bitbake-worker processes. I went ahead and added extensive
    unit tests for this as well.

VERSION 3:

Minor tweak to version 2 that should fix timeout errors seen on the
autobuilder

VERSION 4:

Based on discussion, the term "dependency ID" was dropped in favor of
"unique hash" (unihash).

The hash validation checks were updated to properly fall back to the old
function signatures (that don't pass the unihashes) for compatibility
with older implementations.

VERSION 5:

Removed os.fork() handlers for persist_data. They can be added back if
actually necessary.

Reworked hash validation slightly based on feedback.

VERSION 6:

Fixed a bug that was introduced with the rename to unihash that prevented
unihashes from being recorded in persist_data.

VERSION 7:

Updated sstate hash check function so that it is backward compatible
with older versions of bitbake (that don't pass unihashes)

Joshua Watt (3):
  classes/sstate: Handle unihash in hash check
  bitbake: hashserv: Add hash equivalence reference server
  sstate: Implement hash equivalence sstate

 bitbake/bin/bitbake-hashserv     |  67 +++++++++++++
 bitbake/bin/bitbake-selftest     |   2 +
 bitbake/lib/hashserv/__init__.py | 152 ++++++++++++++++++++++++++++
 bitbake/lib/hashserv/tests.py    | 141 ++++++++++++++++++++++++++
 meta/classes/sstate.bbclass      | 107 ++++++++++++++++++--
 meta/conf/bitbake.conf           |   4 +-
 meta/lib/oe/sstatesig.py         | 167 +++++++++++++++++++++++++++++++
 7 files changed, 630 insertions(+), 10 deletions(-)
 create mode 100755 bitbake/bin/bitbake-hashserv
 create mode 100644 bitbake/lib/hashserv/__init__.py
 create mode 100644 bitbake/lib/hashserv/tests.py

-- 
2.20.1



^ permalink raw reply	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v7 1/3] classes/sstate: Handle unihash in hash check
  2019-01-04 16:20           ` [PATCH " Joshua Watt
@ 2019-01-04 16:20             ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2019-01-04 16:20 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Handles the argument that passes the task's unique hash to the hash
check function, as it is now required by bitbake

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 0abebce6996..59ebc3ab5cc 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -780,7 +780,7 @@ sstate_unpack_package () {
 
 BB_HASHCHECK_FUNCTION = "sstate_checkhashes"
 
-def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False):
+def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *, sq_unihash=None):
 
     ret = []
     missed = []
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v7 2/3] bitbake: hashserv: Add hash equivalence reference server
  2019-01-04 16:20           ` [PATCH " Joshua Watt
@ 2019-01-04 16:20             ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2019-01-04 16:20 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Implements a reference hash equivalence server. The server has no
dependencies outside of the standard Python library, and implements
the minimum required to be a conforming hash equivalence server.
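
For illustration, a client exchange with the server looks roughly like
this (the URL and the taskhash/outhash variables are made up; see the
tests below for the real usage):

    import json
    import urllib.request

    # GET: ask whether an equivalent unihash is known for a taskhash
    url = 'http://localhost:8686/v1/equivalent?method=OEOuthashBasic&taskhash=' + taskhash
    d = json.loads(urllib.request.urlopen(url).read().decode('utf-8'))
    # d is None if unknown, else {'taskhash': ..., 'method': ..., 'unihash': ...}

    # POST: report a (taskhash, outhash, unihash) triple; the server
    # replies with the canonical unihash for that outhash
    body = json.dumps({'taskhash': taskhash, 'method': 'OEOuthashBasic',
                       'outhash': outhash, 'unihash': taskhash}).encode('utf-8')
    req = urllib.request.Request('http://localhost:8686/v1/equivalent',
                                 body, {'content-type': 'application/json'})
    d = json.loads(urllib.request.urlopen(req).read().decode('utf-8'))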

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-hashserv     |  67 ++++++++++++++
 bitbake/bin/bitbake-selftest     |   2 +
 bitbake/lib/hashserv/__init__.py | 152 +++++++++++++++++++++++++++++++
 bitbake/lib/hashserv/tests.py    | 141 ++++++++++++++++++++++++++++
 4 files changed, 362 insertions(+)
 create mode 100755 bitbake/bin/bitbake-hashserv
 create mode 100644 bitbake/lib/hashserv/__init__.py
 create mode 100644 bitbake/lib/hashserv/tests.py

diff --git a/bitbake/bin/bitbake-hashserv b/bitbake/bin/bitbake-hashserv
new file mode 100755
index 00000000000..c49397b73a5
--- /dev/null
+++ b/bitbake/bin/bitbake-hashserv
@@ -0,0 +1,67 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+import os
+import sys
+import logging
+import argparse
+import sqlite3
+
+sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)),'lib'))
+
+import hashserv
+
+VERSION = "1.0.0"
+
+DEFAULT_HOST = ''
+DEFAULT_PORT = 8686
+
+def main():
+    parser = argparse.ArgumentParser(description='Hash Equivalence Reference Server. Version=%s' % VERSION)
+    parser.add_argument('--address', default=DEFAULT_HOST, help='Bind address (default "%(default)s")')
+    parser.add_argument('--port', type=int, default=DEFAULT_PORT, help='Bind port (default %(default)d)')
+    parser.add_argument('--prefix', default='', help='HTTP path prefix (default "%(default)s")')
+    parser.add_argument('--database', default='./hashserv.db', help='Database file (default "%(default)s")')
+    parser.add_argument('--log', default='WARNING', help='Set logging level')
+
+    args = parser.parse_args()
+
+    logger = logging.getLogger('hashserv')
+
+    level = getattr(logging, args.log.upper(), None)
+    if not isinstance(level, int):
+        raise ValueError('Invalid log level: %s' % args.log)
+
+    logger.setLevel(level)
+    console = logging.StreamHandler()
+    console.setLevel(level)
+    logger.addHandler(console)
+
+    db = sqlite3.connect(args.database)
+
+    server = hashserv.create_server((args.address, args.port), db, args.prefix)
+    server.serve_forever()
+    return 0
+
+if __name__ == '__main__':
+    try:
+        ret = main()
+    except Exception:
+        ret = 1
+        import traceback
+        traceback.print_exc()
+    sys.exit(ret)
+
diff --git a/bitbake/bin/bitbake-selftest b/bitbake/bin/bitbake-selftest
index c970dcae90c..99f1af910f4 100755
--- a/bitbake/bin/bitbake-selftest
+++ b/bitbake/bin/bitbake-selftest
@@ -22,6 +22,7 @@ sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)), 'lib
 import unittest
 try:
     import bb
+    import hashserv
     import layerindexlib
 except RuntimeError as exc:
     sys.exit(str(exc))
@@ -35,6 +36,7 @@ tests = ["bb.tests.codeparser",
          "bb.tests.parse",
          "bb.tests.persist_data",
          "bb.tests.utils",
+         "hashserv.tests",
          "layerindexlib.tests.layerindexobj",
          "layerindexlib.tests.restapi",
          "layerindexlib.tests.cooker"]
diff --git a/bitbake/lib/hashserv/__init__.py b/bitbake/lib/hashserv/__init__.py
new file mode 100644
index 00000000000..46bca7cab32
--- /dev/null
+++ b/bitbake/lib/hashserv/__init__.py
@@ -0,0 +1,152 @@
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+from http.server import BaseHTTPRequestHandler, HTTPServer
+import contextlib
+import urllib.parse
+import sqlite3
+import json
+import traceback
+import logging
+from datetime import datetime
+
+logger = logging.getLogger('hashserv')
+
+class HashEquivalenceServer(BaseHTTPRequestHandler):
+    def log_message(self, f, *args):
+        logger.debug(f, *args)
+
+    def do_GET(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            query = urllib.parse.parse_qs(p.query, strict_parsing=True)
+            method = query['method'][0]
+            taskhash = query['taskhash'][0]
+
+            d = None
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('SELECT taskhash, method, unihash FROM tasks_v1 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1',
+                        {'method': method, 'taskhash': taskhash})
+
+                row = cursor.fetchone()
+
+                if row is not None:
+                    logger.debug('Found equivalent task %s', row['taskhash'])
+                    d = {k: row[k] for k in ('taskhash', 'method', 'unihash')}
+
+            self.send_response(200)
+            self.send_header('Content-Type', 'application/json; charset=utf-8')
+            self.end_headers()
+            self.wfile.write(json.dumps(d).encode('utf-8'))
+        except:
+            logger.exception('Error in GET')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+    def do_POST(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            length = int(self.headers['content-length'])
+            data = json.loads(self.rfile.read(length).decode('utf-8'))
+
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('''
+                    SELECT taskhash, method, unihash FROM tasks_v1 WHERE method=:method AND outhash=:outhash
+                    ORDER BY CASE WHEN taskhash=:taskhash THEN 1 ELSE 2 END,
+                        created ASC
+                    LIMIT 1
+                    ''', {k: data[k] for k in ('method', 'outhash', 'taskhash')})
+
+                row = cursor.fetchone()
+
+                if row is None or row['taskhash'] != data['taskhash']:
+                    unihash = data['unihash']
+                    if row is not None:
+                        unihash = row['unihash']
+
+                    insert_data = {
+                            'method': data['method'],
+                            'outhash': data['outhash'],
+                            'taskhash': data['taskhash'],
+                            'unihash': unihash,
+                            'created': datetime.now()
+                            }
+
+                    for k in ('owner', 'PN', 'PV', 'PR', 'task', 'outhash_siginfo'):
+                        if k in data:
+                            insert_data[k] = data[k]
+
+                    cursor.execute('''INSERT INTO tasks_v1 (%s) VALUES (%s)''' % (
+                            ', '.join(sorted(insert_data.keys())),
+                            ', '.join(':' + k for k in sorted(insert_data.keys()))),
+                        insert_data)
+
+                    logger.info('Adding taskhash %s with unihash %s', data['taskhash'], unihash)
+                    cursor.execute('SELECT taskhash, method, unihash FROM tasks_v1 WHERE id=:id', {'id': cursor.lastrowid})
+                    row = cursor.fetchone()
+
+                    self.db.commit()
+
+                d = {k: row[k] for k in ('taskhash', 'method', 'unihash')}
+
+                self.send_response(200)
+                self.send_header('Content-Type', 'application/json; charset=utf-8')
+                self.end_headers()
+                self.wfile.write(json.dumps(d).encode('utf-8'))
+        except:
+            logger.exception('Error in POST')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+def create_server(addr, db, prefix=''):
+    class Handler(HashEquivalenceServer):
+        pass
+
+    Handler.prefix = prefix
+    Handler.db = db
+    db.row_factory = sqlite3.Row
+
+    with contextlib.closing(db.cursor()) as cursor:
+        cursor.execute('''
+            CREATE TABLE IF NOT EXISTS tasks_v1 (
+                id INTEGER PRIMARY KEY AUTOINCREMENT,
+                method TEXT NOT NULL,
+                outhash TEXT NOT NULL,
+                taskhash TEXT NOT NULL,
+                unihash TEXT NOT NULL,
+                created DATETIME,
+
+                -- Optional fields
+                owner TEXT,
+                PN TEXT,
+                PV TEXT,
+                PR TEXT,
+                task TEXT,
+                outhash_siginfo TEXT
+                )
+            ''')
+
+    logger.info('Starting server on %s', addr)
+    return HTTPServer(addr, Handler)
diff --git a/bitbake/lib/hashserv/tests.py b/bitbake/lib/hashserv/tests.py
new file mode 100644
index 00000000000..806b54c5ebd
--- /dev/null
+++ b/bitbake/lib/hashserv/tests.py
@@ -0,0 +1,141 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+import unittest
+import threading
+import sqlite3
+import hashlib
+import urllib.request
+import json
+from . import create_server
+
+class TestHashEquivalenceServer(unittest.TestCase):
+    def setUp(self):
+        # Start an in memory hash equivalence server in the background bound to
+        # an ephemeral port
+        db = sqlite3.connect(':memory:', check_same_thread=False)
+        self.server = create_server(('localhost', 0), db)
+        self.server_addr = 'http://localhost:%d' % self.server.socket.getsockname()[1]
+        self.server_thread = threading.Thread(target=self.server.serve_forever)
+        self.server_thread.start()
+
+    def tearDown(self):
+        # Shutdown server
+        s = getattr(self, 'server', None)
+        if s is not None:
+            self.server.shutdown()
+            self.server_thread.join()
+            self.server.server_close()
+
+    def send_get(self, path):
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def send_post(self, path, data):
+        headers = {'content-type': 'application/json'}
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url, json.dumps(data).encode('utf-8'), headers)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def test_create_hash(self):
+        # Simple test that hashes can be created
+        taskhash = '35788efcb8dfb0a02659d81cf2bfd695fb30faf9'
+        outhash = '2765d4a5884be49b28601445c2760c5f21e7e5c0ee2b7e3fce98fd7e5970796f'
+        unihash = 'f46d3fbb439bd9b921095da657a4de906510d2cd'
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertIsNone(d, msg='Found unexpected task, %r' % d)
+
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+    def test_create_equivalent(self):
+        # Tests that a second reported task with the same outhash will be
+        # assigned the same unihash
+        taskhash = '53b8dce672cb6d0c73170be43f540460bfc347b4'
+        outhash = '5a9cb1649625f0bf41fc7791b635cd9c2d7118c7f021ba87dcd03f72b67ce7a8'
+        unihash = 'f37918cc02eb5a520b1aff86faacbc0a38124646'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+        # Report a different task with the same outhash. The returned unihash
+        # should match the first task
+        taskhash2 = '3bf6f1e89d26205aec90da04854fbdbf73afe6b4'
+        unihash2 = 'af36b199320e611fbb16f1f277d3ee1d619ca58b'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash2,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash2,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+    def test_duplicate_taskhash(self):
+        # Tests that duplicate reports of the same taskhash with different
+        # outhash & unihash always return the unihash from the first reported
+        # taskhash
+        taskhash = '8aa96fcffb5831b3c2c0cb75f0431e3f8b20554a'
+        outhash = 'afe240a439959ce86f5e322f8c208e1fedefea9e813f2140c81af866cc9edf7e'
+        unihash = '218e57509998197d570e2c98512d0105985dffc9'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+        outhash2 = '0904a7fe3dc712d9fd8a74a616ddca2a825a8ee97adf0bd3fc86082c7639914d'
+        unihash2 = 'ae9a7d252735f0dafcdb10e2e02561ca3a47314c'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash2,
+            'unihash': unihash2
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+        outhash3 = '77623a549b5b1a31e3732dfa8fe61d7ce5d44b3370f253c5360e136b852967b4'
+        unihash3 = '9217a7d6398518e5dc002ed58f2cbbbc78696603'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash3,
+            'unihash': unihash3
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [PATCH v7 2/3] bitbake: hashserv: Add hash equivalence reference server
@ 2019-01-04 16:20             ` Joshua Watt
  0 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2019-01-04 16:20 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Implements a reference hash equivalence server. The server has no
dependencies outside of the standard Python library, and implements
the minimum required to be a conforming hash equivalence server.

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 bitbake/bin/bitbake-hashserv     |  67 ++++++++++++++
 bitbake/bin/bitbake-selftest     |   2 +
 bitbake/lib/hashserv/__init__.py | 152 +++++++++++++++++++++++++++++++
 bitbake/lib/hashserv/tests.py    | 141 ++++++++++++++++++++++++++++
 4 files changed, 362 insertions(+)
 create mode 100755 bitbake/bin/bitbake-hashserv
 create mode 100644 bitbake/lib/hashserv/__init__.py
 create mode 100644 bitbake/lib/hashserv/tests.py

diff --git a/bitbake/bin/bitbake-hashserv b/bitbake/bin/bitbake-hashserv
new file mode 100755
index 00000000000..c49397b73a5
--- /dev/null
+++ b/bitbake/bin/bitbake-hashserv
@@ -0,0 +1,67 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+import os
+import sys
+import logging
+import argparse
+import sqlite3
+
+sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)),'lib'))
+
+import hashserv
+
+VERSION = "1.0.0"
+
+DEFAULT_HOST = ''
+DEFAULT_PORT = 8686
+
+def main():
+    parser = argparse.ArgumentParser(description='Hash Equivalence Reference Server. Version=%s' % VERSION)
+    parser.add_argument('--address', default=DEFAULT_HOST, help='Bind address (default "%(default)s")')
+    parser.add_argument('--port', type=int, default=DEFAULT_PORT, help='Bind port (default %(default)d)')
+    parser.add_argument('--prefix', default='', help='HTTP path prefix (default "%(default)s")')
+    parser.add_argument('--database', default='./hashserv.db', help='Database file (default "%(default)s")')
+    parser.add_argument('--log', default='WARNING', help='Set logging level')
+
+    args = parser.parse_args()
+
+    logger = logging.getLogger('hashserv')
+
+    level = getattr(logging, args.log.upper(), None)
+    if not isinstance(level, int):
+        raise ValueError('Invalid log level: %s' % args.log)
+
+    logger.setLevel(level)
+    console = logging.StreamHandler()
+    console.setLevel(level)
+    logger.addHandler(console)
+
+    db = sqlite3.connect(args.database)
+
+    server = hashserv.create_server((args.address, args.port), db, args.prefix)
+    server.serve_forever()
+    return 0
+
+if __name__ == '__main__':
+    try:
+        ret = main()
+    except Exception:
+        ret = 1
+        import traceback
+        traceback.print_exc()
+    sys.exit(ret)
+
diff --git a/bitbake/bin/bitbake-selftest b/bitbake/bin/bitbake-selftest
index c970dcae90c..99f1af910f4 100755
--- a/bitbake/bin/bitbake-selftest
+++ b/bitbake/bin/bitbake-selftest
@@ -22,6 +22,7 @@ sys.path.insert(0, os.path.join(os.path.dirname(os.path.dirname(__file__)), 'lib
 import unittest
 try:
     import bb
+    import hashserv
     import layerindexlib
 except RuntimeError as exc:
     sys.exit(str(exc))
@@ -35,6 +36,7 @@ tests = ["bb.tests.codeparser",
          "bb.tests.parse",
          "bb.tests.persist_data",
          "bb.tests.utils",
+         "hashserv.tests",
          "layerindexlib.tests.layerindexobj",
          "layerindexlib.tests.restapi",
          "layerindexlib.tests.cooker"]
diff --git a/bitbake/lib/hashserv/__init__.py b/bitbake/lib/hashserv/__init__.py
new file mode 100644
index 00000000000..46bca7cab32
--- /dev/null
+++ b/bitbake/lib/hashserv/__init__.py
@@ -0,0 +1,152 @@
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+from http.server import BaseHTTPRequestHandler, HTTPServer
+import contextlib
+import urllib.parse
+import sqlite3
+import json
+import traceback
+import logging
+from datetime import datetime
+
+logger = logging.getLogger('hashserv')
+
+class HashEquivalenceServer(BaseHTTPRequestHandler):
+    def log_message(self, f, *args):
+        logger.debug(f, *args)
+
+    def do_GET(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            query = urllib.parse.parse_qs(p.query, strict_parsing=True)
+            method = query['method'][0]
+            taskhash = query['taskhash'][0]
+
+            d = None
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('SELECT taskhash, method, unihash FROM tasks_v1 WHERE method=:method AND taskhash=:taskhash ORDER BY created ASC LIMIT 1',
+                        {'method': method, 'taskhash': taskhash})
+
+                row = cursor.fetchone()
+
+                if row is not None:
+                    logger.debug('Found equivalent task %s', row['taskhash'])
+                    d = {k: row[k] for k in ('taskhash', 'method', 'unihash')}
+
+            self.send_response(200)
+            self.send_header('Content-Type', 'application/json; charset=utf-8')
+            self.end_headers()
+            self.wfile.write(json.dumps(d).encode('utf-8'))
+        except Exception:
+            logger.exception('Error in GET')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+    def do_POST(self):
+        try:
+            p = urllib.parse.urlparse(self.path)
+
+            if p.path != self.prefix + '/v1/equivalent':
+                self.send_error(404)
+                return
+
+            length = int(self.headers['content-length'])
+            data = json.loads(self.rfile.read(length).decode('utf-8'))
+
+            with contextlib.closing(self.db.cursor()) as cursor:
+                cursor.execute('''
+                    SELECT taskhash, method, unihash FROM tasks_v1 WHERE method=:method AND outhash=:outhash
+                    ORDER BY CASE WHEN taskhash=:taskhash THEN 1 ELSE 2 END,
+                        created ASC
+                    LIMIT 1
+                    ''', {k: data[k] for k in ('method', 'outhash', 'taskhash')})
+
+                row = cursor.fetchone()
+
+                if row is None or row['taskhash'] != data['taskhash']:
+                    unihash = data['unihash']
+                    if row is not None:
+                        unihash = row['unihash']
+
+                    insert_data = {
+                            'method': data['method'],
+                            'outhash': data['outhash'],
+                            'taskhash': data['taskhash'],
+                            'unihash': unihash,
+                            'created': datetime.now()
+                            }
+
+                    for k in ('owner', 'PN', 'PV', 'PR', 'task', 'outhash_siginfo'):
+                        if k in data:
+                            insert_data[k] = data[k]
+
+                    cursor.execute('''INSERT INTO tasks_v1 (%s) VALUES (%s)''' % (
+                            ', '.join(sorted(insert_data.keys())),
+                            ', '.join(':' + k for k in sorted(insert_data.keys()))),
+                        insert_data)
+
+                    logger.info('Adding taskhash %s with unihash %s', data['taskhash'], unihash)
+                    cursor.execute('SELECT taskhash, method, unihash FROM tasks_v1 WHERE id=:id', {'id': cursor.lastrowid})
+                    row = cursor.fetchone()
+
+                    self.db.commit()
+
+                d = {k: row[k] for k in ('taskhash', 'method', 'unihash')}
+
+                self.send_response(200)
+                self.send_header('Content-Type', 'application/json; charset=utf-8')
+                self.end_headers()
+                self.wfile.write(json.dumps(d).encode('utf-8'))
+        except Exception:
+            logger.exception('Error in POST')
+            self.send_error(400, explain=traceback.format_exc())
+            return
+
+def create_server(addr, db, prefix=''):
+    class Handler(HashEquivalenceServer):
+        pass
+
+    Handler.prefix = prefix
+    Handler.db = db
+    db.row_factory = sqlite3.Row
+
+    with contextlib.closing(db.cursor()) as cursor:
+        cursor.execute('''
+            CREATE TABLE IF NOT EXISTS tasks_v1 (
+                id INTEGER PRIMARY KEY AUTOINCREMENT,
+                method TEXT NOT NULL,
+                outhash TEXT NOT NULL,
+                taskhash TEXT NOT NULL,
+                unihash TEXT NOT NULL,
+                created DATETIME,
+
+                -- Optional fields
+                owner TEXT,
+                PN TEXT,
+                PV TEXT,
+                PR TEXT,
+                task TEXT,
+                outhash_siginfo TEXT
+                )
+            ''')
+
+    logger.info('Starting server on %s', addr)
+    return HTTPServer(addr, Handler)
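
The subtle part of do_POST() above is the ORDER BY CASE query. A
pure-Python restatement of the selection rule it implements, for
illustration only (the server does this in SQL):

    def resolve_unihash(rows, report):
        # rows: stored tasks with the same method and outhash, oldest
        # first; report: the newly POSTed task data
        for row in rows:
            if row['taskhash'] == report['taskhash']:
                # Duplicate report of a known taskhash: first answer sticks
                return row['unihash']
        if rows:
            # New taskhash with a known outhash: inherit the oldest unihash
            return rows[0]['unihash']
        # First time this outhash has been seen: the client's unihash wins
        return report['unihash']
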
diff --git a/bitbake/lib/hashserv/tests.py b/bitbake/lib/hashserv/tests.py
new file mode 100644
index 00000000000..806b54c5ebd
--- /dev/null
+++ b/bitbake/lib/hashserv/tests.py
@@ -0,0 +1,141 @@
+#! /usr/bin/env python3
+#
+# Copyright (C) 2018 Garmin Ltd.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+import unittest
+import threading
+import sqlite3
+import hashlib
+import urllib.request
+import json
+from . import create_server
+
+class TestHashEquivalenceServer(unittest.TestCase):
+    def setUp(self):
+        # Start an in-memory hash equivalence server in the background, bound
+        # to an ephemeral port
+        db = sqlite3.connect(':memory:', check_same_thread=False)
+        self.server = create_server(('localhost', 0), db)
+        self.server_addr = 'http://localhost:%d' % self.server.socket.getsockname()[1]
+        self.server_thread = threading.Thread(target=self.server.serve_forever)
+        self.server_thread.start()
+
+    def tearDown(self):
+        # Shutdown server
+        s = getattr(self, 'server', None)
+        if s is not None:
+            self.server.shutdown()
+            self.server_thread.join()
+            self.server.server_close()
+
+    def send_get(self, path):
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def send_post(self, path, data):
+        headers = {'content-type': 'application/json'}
+        url = '%s/%s' % (self.server_addr, path)
+        request = urllib.request.Request(url, json.dumps(data).encode('utf-8'), headers)
+        response = urllib.request.urlopen(request)
+        return json.loads(response.read().decode('utf-8'))
+
+    def test_create_hash(self):
+        # Simple test that hashes can be created
+        taskhash = '35788efcb8dfb0a02659d81cf2bfd695fb30faf9'
+        outhash = '2765d4a5884be49b28601445c2760c5f21e7e5c0ee2b7e3fce98fd7e5970796f'
+        unihash = 'f46d3fbb439bd9b921095da657a4de906510d2cd'
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertIsNone(d, msg='Found unexpected task, %r' % d)
+
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+    def test_create_equivalent(self):
+        # Tests that a second reported task with the same outhash will be
+        # assigned the same unihash
+        taskhash = '53b8dce672cb6d0c73170be43f540460bfc347b4'
+        outhash = '5a9cb1649625f0bf41fc7791b635cd9c2d7118c7f021ba87dcd03f72b67ce7a8'
+        unihash = 'f37918cc02eb5a520b1aff86faacbc0a38124646'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+        # Report a different task with the same outhash. The returned unihash
+        # should match the first task
+        taskhash2 = '3bf6f1e89d26205aec90da04854fbdbf73afe6b4'
+        unihash2 = 'af36b199320e611fbb16f1f277d3ee1d619ca58b'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash2,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash2,
+            })
+        self.assertEqual(d['unihash'], unihash, 'Server returned bad unihash')
+
+    def test_duplicate_taskhash(self):
+        # Tests that duplicate reports of the same taskhash with different
+        # outhash & unihash always return the unihash from the first reported
+        # taskhash
+        taskhash = '8aa96fcffb5831b3c2c0cb75f0431e3f8b20554a'
+        outhash = 'afe240a439959ce86f5e322f8c208e1fedefea9e813f2140c81af866cc9edf7e'
+        unihash = '218e57509998197d570e2c98512d0105985dffc9'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash,
+            'unihash': unihash,
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+        outhash2 = '0904a7fe3dc712d9fd8a74a616ddca2a825a8ee97adf0bd3fc86082c7639914d'
+        unihash2 = 'ae9a7d252735f0dafcdb10e2e02561ca3a47314c'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash2,
+            'unihash': unihash2
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+        outhash3 = '77623a549b5b1a31e3732dfa8fe61d7ce5d44b3370f253c5360e136b852967b4'
+        unihash3 = '9217a7d6398518e5dc002ed58f2cbbbc78696603'
+        d = self.send_post('v1/equivalent', {
+            'taskhash': taskhash,
+            'method': 'TestMethod',
+            'outhash': outhash3,
+            'unihash': unihash3
+            })
+
+        d = self.send_get('v1/equivalent?method=TestMethod&taskhash=%s' % taskhash)
+        self.assertEqual(d['unihash'], unihash)
+
+
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* [OE-core][PATCH v7 3/3] sstate: Implement hash equivalence sstate
  2019-01-04 16:20           ` [PATCH " Joshua Watt
@ 2019-01-04 16:20             ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2019-01-04 16:20 UTC (permalink / raw)
  To: openembedded-core, bitbake-devel

Converts sstate so that it can use a hash equivalence server to
determine if a task really needs to be rebuilt, or if it can be restored
from a different (equivalent) sstate object.

The unique hashes are cached persistently using persist_data. This has
a number of advantages:
 1) Unique hashes can be cached between invocations of bitbake to
    prevent needing to contact the server every time (which is slow)
 2) The value of each task's unique hash can easily be synchronized
    between different threads, which will be useful if bitbake is
    updated to do on the fly task re-hashing.
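
In code terms, the lookup path is roughly the following sketch of the
get_unihash() flow in the sstatesig.py change below; query_server() is a
placeholder for the real urllib round trip, not an actual function:

    # Sketch only; query_server() stands in for the HTTP request below
    unihashes = bb.persist_data.persist('SSTATESIG_UNIHASH_CACHE_v1_' + method, d)
    key = '%s:%s' % (task, taskhash)
    unihash = unihashes.get(key)
    if unihash is None:
        # Fall back to the taskhash when the server has nothing better
        unihash = query_server(method, taskhash) or taskhash
        unihashes[key] = unihash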

[YOCTO #13030]

Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
---
 meta/classes/sstate.bbclass | 105 +++++++++++++++++++++--
 meta/conf/bitbake.conf      |   4 +-
 meta/lib/oe/sstatesig.py    | 167 ++++++++++++++++++++++++++++++++++++
 3 files changed, 267 insertions(+), 9 deletions(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 59ebc3ab5cc..da0807d6e99 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -11,7 +11,7 @@ def generate_sstatefn(spec, hash, d):
 SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
 SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
 SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
-SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_TASKHASH'), d)}"
+SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d)}"
 SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
 SSTATE_EXTRAPATH   = ""
 SSTATE_EXTRAPATHWILDCARD = ""
@@ -82,6 +82,23 @@ SSTATE_SIG_PASSPHRASE ?= ""
 # Whether to verify the GnUPG signatures when extracting sstate archives
 SSTATE_VERIFY_SIG ?= "0"
 
+SSTATE_HASHEQUIV_METHOD ?= "OEOuthashBasic"
+SSTATE_HASHEQUIV_METHOD[doc] = "The function used to calculate the output hash \
+    for a task, which in turn is used to determine equivalency. \
+    "
+
+SSTATE_HASHEQUIV_SERVER ?= ""
+SSTATE_HASHEQUIV_SERVER[doc] = "The hash equivalence server. For example, \
+    'http://192.168.0.1:5000'. Do not include a trailing slash \
+    "
+
+SSTATE_HASHEQUIV_REPORT_TASKDATA ?= "0"
+SSTATE_HASHEQUIV_REPORT_TASKDATA[doc] = "Report additional useful data to the \
+    hash equivalency server, such as PN, PV, taskname, etc. This information \
+    is very useful for developers looking at task data, but may leak sensitive \
+    data if the equivalence server is public. \
+    "
+
 python () {
     if bb.data.inherits_class('native', d):
         d.setVar('SSTATE_PKGARCH', d.getVar('BUILD_ARCH', False))
@@ -640,7 +657,7 @@ def sstate_package(ss, d):
         return
 
     for f in (d.getVar('SSTATECREATEFUNCS') or '').split() + \
-             ['sstate_create_package', 'sstate_sign_package'] + \
+             ['sstate_report_unihash', 'sstate_create_package', 'sstate_sign_package'] + \
              (d.getVar('SSTATEPOSTCREATEFUNCS') or '').split():
         # All hooks should run in SSTATE_BUILDDIR.
         bb.build.exec_func(f, d, (sstatebuild,))
@@ -764,6 +781,73 @@ python sstate_sign_package () {
                            d.getVar('SSTATE_SIG_PASSPHRASE'), armor=False)
 }
 
+def OEOuthashBasic(path, sigfile, task, d):
+    import hashlib
+    import stat
+
+    def update_hash(s):
+        s = s.encode('utf-8')
+        h.update(s)
+        if sigfile:
+            sigfile.write(s)
+
+    h = hashlib.sha256()
+    prev_dir = os.getcwd()
+
+    try:
+        os.chdir(path)
+
+        update_hash("OEOuthashBasic\n")
+
+        # It is only currently useful to get equivalent hashes for things that
+        # can be restored from sstate. Since the sstate object is named using
+        # SSTATE_PKGSPEC and the task name, those should be included in the
+        # output hash calculation.
+        update_hash("SSTATE_PKGSPEC=%s\n" % d.getVar('SSTATE_PKGSPEC'))
+        update_hash("task=%s\n" % task)
+
+        for root, dirs, files in os.walk('.', topdown=True):
+            # Sort directories and files to ensure consistent ordering
+            dirs.sort()
+            files.sort()
+
+            for f in files:
+                path = os.path.join(root, f)
+                s = os.lstat(path)
+
+                # Hash file path
+                update_hash(path + '\n')
+
+                # Hash file mode
+                update_hash("\tmode=0x%x\n" % stat.S_IMODE(s.st_mode))
+                update_hash("\ttype=0x%x\n" % stat.S_IFMT(s.st_mode))
+
+                if stat.S_ISBLK(s.st_mode) or stat.S_ISCHR(s.st_mode):
+                    # Hash device major and minor
+                    update_hash("\tdev=%d,%d\n" % (os.major(s.st_rdev), os.minor(s.st_rdev)))
+                elif stat.S_ISLNK(s.st_mode):
+                    # Hash symbolic link
+                    update_hash("\tsymlink=%s\n" % os.readlink(path))
+                else:
+                    fh = hashlib.sha256()
+                    # Hash file contents
+                    with open(path, 'rb') as fobj:
+                        for chunk in iter(lambda: fobj.read(4096), b""):
+                            fh.update(chunk)
+                    update_hash("\tdigest=%s\n" % fh.hexdigest())
+    finally:
+        os.chdir(prev_dir)
+
+    return h.hexdigest()
+
+python sstate_report_unihash() {
+    report_unihash = getattr(bb.parse.siggen, 'report_unihash', None)
+
+    if report_unihash:
+        ss = sstate_state_fromvars(d)
+        report_unihash(os.getcwd(), ss['task'], d)
+}
+
 #
 # Shell function to decompress and prepare a package for installation
 # Will be run from within SSTATE_INSTDIR.
@@ -788,6 +872,11 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
     if siginfo:
         extension = extension + ".siginfo"
 
+    def gethash(task):
+        if sq_unihash is not None:
+            return sq_unihash[task]
+        return sq_hash[task]
+
     def getpathcomponents(task, d):
         # Magic data from BB_HASHFILENAME
         splithashfn = sq_hashfn[task].split(" ")
@@ -810,7 +899,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
 
         spec, extrapath, tname = getpathcomponents(task, d)
 
-        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + extension)
 
         if os.path.exists(sstatefile):
             bb.debug(2, "SState: Found valid sstate file %s" % sstatefile)
@@ -872,7 +961,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
             if task in ret:
                 continue
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + extension)
             tasklist.append((task, sstatefile))
 
         if tasklist:
@@ -898,12 +987,12 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
         evdata = {'missed': [], 'found': []};
         for task in missed:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + ".tgz")
+            evdata['missed'].append( (sq_fn[task], sq_task[task], gethash(task), sstatefile ) )
         for task in ret:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
-            evdata['found'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + ".tgz")
+            evdata['found'].append( (sq_fn[task], sq_task[task], gethash(task), sstatefile ) )
         bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)
 
     # Print some summary statistics about the current task completion and how much sstate
diff --git a/meta/conf/bitbake.conf b/meta/conf/bitbake.conf
index 64800623545..e64ce6a6dab 100644
--- a/meta/conf/bitbake.conf
+++ b/meta/conf/bitbake.conf
@@ -867,7 +867,9 @@ BB_HASHBASE_WHITELIST ?= "TMPDIR FILE PATH PWD BB_TASKHASH BBPATH BBSERVER DL_DI
     STAMPS_DIR PRSERV_DUMPDIR PRSERV_DUMPFILE PRSERV_LOCKDOWN PARALLEL_MAKE \
     CCACHE_DIR EXTERNAL_TOOLCHAIN CCACHE CCACHE_NOHASHDIR LICENSE_PATH SDKPKGSUFFIX \
     WARN_QA ERROR_QA WORKDIR STAMPCLEAN PKGDATA_DIR BUILD_ARCH SSTATE_PKGARCH \
-    BB_WORKERCONTEXT BB_LIMITEDDEPS extend_recipe_sysroot DEPLOY_DIR"
+    BB_WORKERCONTEXT BB_LIMITEDDEPS BB_UNIHASH extend_recipe_sysroot DEPLOY_DIR \
+    SSTATE_HASHEQUIV_METHOD SSTATE_HASHEQUIV_SERVER SSTATE_HASHEQUIV_REPORT_TASKDATA \
+    SSTATE_HASHEQUIV_OWNER"
 BB_HASHCONFIG_WHITELIST ?= "${BB_HASHBASE_WHITELIST} DATE TIME SSH_AGENT_PID \
     SSH_AUTH_SOCK PSEUDO_BUILD BB_ENV_EXTRAWHITE DISABLE_SANITY_CHECKS \
     PARALLEL_MAKE BB_NUMBER_THREADS BB_ORIGENV BB_INVALIDCONF BBINCLUDED \
diff --git a/meta/lib/oe/sstatesig.py b/meta/lib/oe/sstatesig.py
index 18c5a353a2a..059e165c7ab 100644
--- a/meta/lib/oe/sstatesig.py
+++ b/meta/lib/oe/sstatesig.py
@@ -263,10 +263,177 @@ class SignatureGeneratorOEBasicHash(bb.siggen.SignatureGeneratorBasicHash):
         if error_msgs:
             bb.fatal("\n".join(error_msgs))
 
+class SignatureGeneratorOEEquivHash(SignatureGeneratorOEBasicHash):
+    name = "OEEquivHash"
+
+    def init_rundepcheck(self, data):
+        super().init_rundepcheck(data)
+        self.server = data.getVar('SSTATE_HASHEQUIV_SERVER')
+        self.method = data.getVar('SSTATE_HASHEQUIV_METHOD')
+        self.unihashes = bb.persist_data.persist('SSTATESIG_UNIHASH_CACHE_v1_' + self.method, data)
+
+    def get_taskdata(self):
+        return (self.server, self.method) + super().get_taskdata()
+
+    def set_taskdata(self, data):
+        self.server, self.method = data[:2]
+        super().set_taskdata(data[2:])
+
+    def __get_task_unihash_key(self, task):
+        # TODO: The key only *needs* to be the taskhash, the task is just
+        # convenient
+        return '%s:%s' % (task, self.taskhash[task])
+
+    def get_stampfile_hash(self, task):
+        if task in self.taskhash:
+            # If a unique hash is reported, use it as the stampfile hash. This
+            # ensures that a task won't be re-run if the taskhash changes but
+            # the task would still produce the same output hash.
+            unihash = self.unihashes.get(self.__get_task_unihash_key(task))
+            if unihash is not None:
+                return unihash
+
+        return super().get_stampfile_hash(task)
+
+    def get_unihash(self, task):
+        import urllib
+        import json
+
+        taskhash = self.taskhash[task]
+
+        key = self.__get_task_unihash_key(task)
+
+        # TODO: This cache can grow unbounded. It probably only needs to keep
+        # the most recent entry for each task
+        unihash = self.unihashes.get(key)
+        if unihash is not None:
+            return unihash
+
+        # If a unique hash can't be discovered from the server, fall back to
+        # the taskhash. The unique "hash" only really needs to be a unique
+        # string (not even necessarily a hash), but making it match the
+        # taskhash has a few advantages:
+        #
+        # 1) All of the sstate code that assumes the hashes are the same
+        #    continues to work unchanged
+        # 2) It provides maximal compatibility with builders that don't use
+        #    an equivalency server
+        # 3) Multiple independent builders can easily derive the same unique
+        #    hash from the same input. This means that if independent builders
+        #    find the same taskhash but it isn't reported to the server, there
+        #    is a better chance that they will agree on the unique hash.
+        unihash = taskhash
+
+        try:
+            url = '%s/v1/equivalent?%s' % (self.server,
+                    urllib.parse.urlencode({'method': self.method, 'taskhash': self.taskhash[task]}))
+
+            request = urllib.request.Request(url)
+            response = urllib.request.urlopen(request)
+            data = response.read().decode('utf-8')
+
+            json_data = json.loads(data)
+
+            if json_data:
+                unihash = json_data['unihash']
+                # A unique hash equal to the taskhash is not very interesting,
+                # so it is reported at debug level 2. If they differ, that is
+                # much more interesting, so it is reported at debug level 1.
+                bb.debug((1, 2)[unihash == taskhash], 'Found unihash %s in place of %s for %s from %s' % (unihash, taskhash, task, self.server))
+            else:
+                bb.debug(2, 'No reported unihash for %s:%s from %s' % (task, taskhash, self.server))
+        except urllib.error.URLError as e:
+            bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+        except (KeyError, json.JSONDecodeError) as e:
+            bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+
+        self.unihashes[key] = unihash
+        return unihash
+
+    def report_unihash(self, path, task, d):
+        import urllib
+        import json
+        import tempfile
+        import base64
+
+        taskhash = d.getVar('BB_TASKHASH')
+        unihash = d.getVar('BB_UNIHASH')
+        report_taskdata = d.getVar('SSTATE_HASHEQUIV_REPORT_TASKDATA') == '1'
+        tempdir = d.getVar('T')
+        fn = d.getVar('BB_FILENAME')
+        key = fn + '.do_' + task + ':' + taskhash
+
+        # Sanity checks
+        cache_unihash = self.unihashes.get(key)
+        if cache_unihash is None:
+            bb.fatal('%s not in unihash cache. Please report this error' % key)
+
+        if cache_unihash != unihash:
+            bb.fatal("Cache unihash %s doesn't match BB_UNIHASH %s" % (cache_unihash, unihash))
+
+        sigfile = None
+        sigfile_name = "depsig.do_%s.%d" % (task, os.getpid())
+        sigfile_link = "depsig.do_%s" % task
+
+        try:
+            call = self.method + '(path, sigfile, task, d)'
+            sigfile = open(os.path.join(tempdir, sigfile_name), 'w+b')
+            locs = {'path': path, 'sigfile': sigfile, 'task': task, 'd': d}
+
+            outhash = bb.utils.better_eval(call, locs)
+
+            try:
+                url = '%s/v1/equivalent' % self.server
+                task_data = {
+                    'taskhash': taskhash,
+                    'method': self.method,
+                    'outhash': outhash,
+                    'unihash': unihash,
+                    'owner': d.getVar('SSTATE_HASHEQUIV_OWNER')
+                    }
+
+                if report_taskdata:
+                    sigfile.seek(0)
+
+                    task_data['PN'] = d.getVar('PN')
+                    task_data['PV'] = d.getVar('PV')
+                    task_data['PR'] = d.getVar('PR')
+                    task_data['task'] = task
+                    task_data['outhash_siginfo'] = sigfile.read().decode('utf-8')
+
+                headers = {'content-type': 'application/json'}
+
+                request = urllib.request.Request(url, json.dumps(task_data).encode('utf-8'), headers)
+                response = urllib.request.urlopen(request)
+                data = response.read().decode('utf-8')
+
+                json_data = json.loads(data)
+                new_unihash = json_data['unihash']
+
+                if new_unihash != unihash:
+                    bb.debug(1, 'Task %s unihash changed %s -> %s by server %s' % (taskhash, unihash, new_unihash, self.server))
+                else:
+                    bb.debug(1, 'Reported task %s as unihash %s to %s' % (taskhash, unihash, self.server))
+            except urllib.error.URLError as e:
+                bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
+            except (KeyError, json.JSONDecodeError) as e:
+                bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
+        finally:
+            if sigfile:
+                sigfile.close()
+
+                sigfile_link_path = os.path.join(tempdir, sigfile_link)
+                bb.utils.remove(sigfile_link_path)
+
+                try:
+                    os.symlink(sigfile_name, sigfile_link_path)
+                except OSError:
+                    pass
 
 # Insert these classes into siggen's namespace so it can see and select them
 bb.siggen.SignatureGeneratorOEBasic = SignatureGeneratorOEBasic
 bb.siggen.SignatureGeneratorOEBasicHash = SignatureGeneratorOEBasicHash
+bb.siggen.SignatureGeneratorOEEquivHash = SignatureGeneratorOEEquivHash
 
 
 def find_siginfo(pn, taskname, taskhashlist, d):
-- 
2.20.1
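
For completeness, enabling the feature in a build would look roughly like
the following local.conf fragment; the server URL is an example, not a
default:

    # Hypothetical local.conf sketch; adjust the URL to your server.
    BB_SIGNATURE_HANDLER = "OEEquivHash"
    SSTATE_HASHEQUIV_SERVER = "http://localhost:5000"
    SSTATE_HASHEQUIV_METHOD = "OEOuthashBasic"
    SSTATE_HASHEQUIV_REPORT_TASKDATA = "1"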



^ permalink raw reply related	[flat|nested] 158+ messages in thread

* ✗ patchtest: failure for Hash Equivalency Server (rev5)
  2019-01-04  2:42         ` [PATCH " Joshua Watt
                           ` (4 preceding siblings ...)
  (?)
@ 2019-01-04 16:33         ` Patchwork
  -1 siblings, 0 replies; 158+ messages in thread
From: Patchwork @ 2019-01-04 16:33 UTC (permalink / raw)
  To: Joshua Watt; +Cc: openembedded-core

== Series Details ==

Series: Hash Equivalency Server (rev5)
Revision: 5
URL   : https://patchwork.openembedded.org/series/15190/
State : failure

== Summary ==


Thank you for submitting this patch series to OpenEmbedded Core. This is
an automated response. Several tests have been executed on the proposed
series by patchtest resulting in the following failures:



* Issue             Series sent to the wrong mailing list or some patches from the series correspond to different mailing lists [test_target_mailing_list] 
  Suggested fix    Send the series again to the correct mailing list (ML)
  Suggested ML     bitbake-devel@lists.openembedded.org [http://git.openembedded.org/bitbake/]
  Patch's path:    bitbake/bin/bitbake-hashserv

* Issue             Series does not apply on top of target branch [test_series_merge_on_head] 
  Suggested fix    Rebase your series on top of targeted branch
  Targeted branch  master (currently at 65c419b8c4)



If you believe any of these test results are incorrect, please reply to the
mailing list (openembedded-core@lists.openembedded.org) raising your concerns.
Otherwise we would appreciate you correcting the issues and submitting a new
version of the patchset if applicable. Please ensure you add/increment the
version number when sending the new version (i.e. [PATCH] -> [PATCH v2] ->
[PATCH v3] -> ...).

---
Guidelines:     https://www.openembedded.org/wiki/Commit_Patch_Message_Guidelines
Test framework: http://git.yoctoproject.org/cgit/cgit.cgi/patchtest
Test suite:     http://git.yoctoproject.org/cgit/cgit.cgi/patchtest-oe



^ permalink raw reply	[flat|nested] 158+ messages in thread

* Re: [OE-core] [PATCH v5 3/8] bitbake: runqueue: Track task unique hash
  2018-12-19  3:10           ` [PATCH " Joshua Watt
@ 2019-01-05  7:49             ` Alejandro Hernandez
  -1 siblings, 0 replies; 158+ messages in thread
From: Alejandro Hernandez @ 2019-01-05  7:49 UTC (permalink / raw)
  To: Joshua Watt, openembedded-core, bitbake-devel

Hey Joshua,

This is breaking multiconfig builds with the following error (trimmed).
I believe it is not taking into account that the key could contain
"mc:..." if it is a multiconfig build.


ERROR: Running idle function
  File "poky/bitbake/lib/bb/runqueue.py", line 1170, in 
RunQueueData.prepare_task_hash(tid='multiconfig:x86:poky/meta/recipes-support/attr/acl_2.2.52.bb:do_fetch'):
              self.runtaskentries[tid].hash = 
bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
     >        self.runtaskentries[tid].unihash = 
bb.parse.siggen.get_unihash(fn + "." + taskname)

   File "poky/bitbake/lib/bb/siggen.py", line 45, in 
SignatureGeneratorOEBasicHash.get_unihash(task='poky/meta/recipes-support/attr/acl_2.2.52.bb.do_fetch'):
          def get_unihash(self, task):
     >        return self.taskhash[task]

KeyError: 'poky/meta/recipes-support/attr/acl_2.2.52.bb.do_fetch'


Cheers,

Alejandro


On 12/18/2018 7:10 PM, Joshua Watt wrote:
> Requests the task unique hash from siggen and tracks it
>
> [YOCTO #13030]
>
> Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> ---
>   bitbake/lib/bb/runqueue.py | 25 +++++++++++++++++--------
>   1 file changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
> index f2b95a9829b..27b188256dd 100644
> --- a/bitbake/lib/bb/runqueue.py
> +++ b/bitbake/lib/bb/runqueue.py
> @@ -346,6 +346,7 @@ class RunTaskEntry(object):
>           self.depends = set()
>           self.revdeps = set()
>           self.hash = None
> +        self.unihash = None
>           self.task = None
>           self.weight = 1
>   
> @@ -385,6 +386,9 @@ class RunQueueData:
>       def get_task_hash(self, tid):
>           return self.runtaskentries[tid].hash
>   
> +    def get_task_unihash(self, tid):
> +        return self.runtaskentries[tid].unihash
> +
>       def get_user_idstring(self, tid, task_name_suffix = ""):
>           return tid + task_name_suffix
>   
> @@ -1150,18 +1154,21 @@ class RunQueueData:
>                   if len(self.runtaskentries[tid].depends - dealtwith) == 0:
>                       dealtwith.add(tid)
>                       todeal.remove(tid)
> -                    procdep = []
> -                    for dep in self.runtaskentries[tid].depends:
> -                        procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
> -                    (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
> -                    self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
> -                    task = self.runtaskentries[tid].task
> +                    self.prepare_task_hash(tid)
>   
>           bb.parse.siggen.writeout_file_checksum_cache()
>   
>           #self.dump_data()
>           return len(self.runtaskentries)
>   
> +    def prepare_task_hash(self, tid):
> +        procdep = []
> +        for dep in self.runtaskentries[tid].depends:
> +            procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
> +        (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
> +        self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
> +        self.runtaskentries[tid].unihash = bb.parse.siggen.get_unihash(fn + "." + taskname)
> +
>       def dump_data(self):
>           """
>           Dump some debug information on the internal data structures
> @@ -2081,7 +2088,8 @@ class RunQueueExecuteTasks(RunQueueExecute):
>                   deps = self.rqdata.runtaskentries[revdep].depends
>                   provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
>                   taskhash = self.rqdata.runtaskentries[revdep].hash
> -                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
> +                unihash = self.rqdata.runtaskentries[revdep].unihash
> +                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, unihash]
>                   for revdep2 in deps:
>                       if revdep2 not in taskdepdata:
>                           additional.append(revdep2)
> @@ -2524,7 +2532,8 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
>                   deps = getsetscenedeps(revdep)
>                   provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
>                   taskhash = self.rqdata.runtaskentries[revdep].hash
> -                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
> +                unihash = self.rqdata.runtaskentries[revdep].unihash
> +                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, unihash]
>                   for revdep2 in deps:
>                       if revdep2 not in taskdepdata:
>                           additional.append(revdep2)


^ permalink raw reply	[flat|nested] 158+ messages in thread

* Re: [OE-core] [PATCH v5 3/8] bitbake: runqueue: Track task unique hash
  2019-01-05  7:49             ` Alejandro Hernandez
@ 2019-01-06  3:09               ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2019-01-06  3:09 UTC (permalink / raw)
  To: Alejandro Hernandez; +Cc: bitbake-devel, OE-core

On Sat, Jan 5, 2019 at 1:50 AM Alejandro Hernandez
<alejandro.enedino.hernandez-samaniego@xilinx.com> wrote:
>
> Hey Joshua,
>
> This is breaking multiconfig builds with the following error (trimmed);
> I believe it is not taking into account that the key could contain
> "mc:..." if it is a multiconfig build.

Hmm, yes that seems likely. I'll take a look; would you mind opening a
bug in Bugzilla and assigning it to me? I'm not very familiar with
multiconfig, so some instructions to help reproduce would be very
helpful.

Does anyone know if multiconfig is tested on the autobuilders?
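
For illustration, here is a minimal sketch of the suspected mismatch,
assuming split_tid_mcfn() strips the "multiconfig:<name>:" prefix from
fn while keeping it in taskfn, and that get_taskhash() keys its internal
dict by taskfn (a simplified stand-in, not the actual bitbake code):

    tid = ("multiconfig:x86:"
           "poky/meta/recipes-support/attr/acl_2.2.52.bb:do_fetch")

    def split_tid_mcfn(tid):
        # hypothetical simplification; real recipe filenames can also
        # contain ':' (e.g. "virtual:native:...")
        parts = tid.split(":")
        if parts[0] == "multiconfig":
            mc, fn, taskname = parts[1], parts[2], parts[3]
            taskfn = "multiconfig:" + mc + ":" + fn
        else:
            mc = ""
            fn, taskname = tid.rsplit(":", 1)
            taskfn = fn
        return (mc, fn, taskname, taskfn)

    mc, fn, taskname, taskfn = split_tid_mcfn(tid)
    taskhash = {taskfn + "." + taskname: "somehash"}  # stored with mc prefix
    key = fn + "." + taskname                         # looked up without it
    assert key not in taskhash  # -> the KeyError seen in get_unihash()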

>
>
> ERROR: Running idle function
>   File "poky/bitbake/lib/bb/runqueue.py", line 1170, in
> RunQueueData.prepare_task_hash(tid='multiconfig:x86:poky/meta/recipes-support/attr/acl_2.2.52.bb:do_fetch'):
>               self.runtaskentries[tid].hash =
> bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
>      >        self.runtaskentries[tid].unihash =
> bb.parse.siggen.get_unihash(fn + "." + taskname)
>
>    File "poky/bitbake/lib/bb/siggen.py", line 45, in
> SignatureGeneratorOEBasicHash.get_unihash(task='poky/meta/recipes-support/attr/acl_2.2.52.bb.do_fetch'):
>           def get_unihash(self, task):
>      >        return self.taskhash[task]
>
> KeyError: 'poky/meta/recipes-support/attr/acl_2.2.52.bb.do_fetch'
>
>
> Cheers,
>
> Alejandro
>
>
> On 12/18/2018 7:10 PM, Joshua Watt wrote:
> > Requests the task unique hash from siggen and tracks it
> >
> > [YOCTO #13030]
> >
> > Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> > ---
> >   bitbake/lib/bb/runqueue.py | 25 +++++++++++++++++--------
> >   1 file changed, 17 insertions(+), 8 deletions(-)
> >
> > diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
> > index f2b95a9829b..27b188256dd 100644
> > --- a/bitbake/lib/bb/runqueue.py
> > +++ b/bitbake/lib/bb/runqueue.py
> > @@ -346,6 +346,7 @@ class RunTaskEntry(object):
> >           self.depends = set()
> >           self.revdeps = set()
> >           self.hash = None
> > +        self.unihash = None
> >           self.task = None
> >           self.weight = 1
> >
> > @@ -385,6 +386,9 @@ class RunQueueData:
> >       def get_task_hash(self, tid):
> >           return self.runtaskentries[tid].hash
> >
> > +    def get_task_unihash(self, tid):
> > +        return self.runtaskentries[tid].unihash
> > +
> >       def get_user_idstring(self, tid, task_name_suffix = ""):
> >           return tid + task_name_suffix
> >
> > @@ -1150,18 +1154,21 @@ class RunQueueData:
> >                   if len(self.runtaskentries[tid].depends - dealtwith) == 0:
> >                       dealtwith.add(tid)
> >                       todeal.remove(tid)
> > -                    procdep = []
> > -                    for dep in self.runtaskentries[tid].depends:
> > -                        procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
> > -                    (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
> > -                    self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
> > -                    task = self.runtaskentries[tid].task
> > +                    self.prepare_task_hash(tid)
> >
> >           bb.parse.siggen.writeout_file_checksum_cache()
> >
> >           #self.dump_data()
> >           return len(self.runtaskentries)
> >
> > +    def prepare_task_hash(self, tid):
> > +        procdep = []
> > +        for dep in self.runtaskentries[tid].depends:
> > +            procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
> > +        (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
> > +        self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
> > +        self.runtaskentries[tid].unihash = bb.parse.siggen.get_unihash(fn + "." + taskname)
> > +
> >       def dump_data(self):
> >           """
> >           Dump some debug information on the internal data structures
> > @@ -2081,7 +2088,8 @@ class RunQueueExecuteTasks(RunQueueExecute):
> >                   deps = self.rqdata.runtaskentries[revdep].depends
> >                   provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
> >                   taskhash = self.rqdata.runtaskentries[revdep].hash
> > -                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
> > +                unihash = self.rqdata.runtaskentries[revdep].unihash
> > +                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, unihash]
> >                   for revdep2 in deps:
> >                       if revdep2 not in taskdepdata:
> >                           additional.append(revdep2)
> > @@ -2524,7 +2532,8 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
> >                   deps = getsetscenedeps(revdep)
> >                   provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
> >                   taskhash = self.rqdata.runtaskentries[revdep].hash
> > -                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
> > +                unihash = self.rqdata.runtaskentries[revdep].unihash
> > +                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, unihash]
> >                   for revdep2 in deps:
> >                       if revdep2 not in taskdepdata:
> >                           additional.append(revdep2)


^ permalink raw reply	[flat|nested] 158+ messages in thread

* Re: [OE-core] [PATCH v5 3/8] bitbake: runqueue: Track task unique hash
  2019-01-06  3:09               ` Joshua Watt
@ 2019-01-07  6:52                 ` Alejandro Hernandez
  -1 siblings, 0 replies; 158+ messages in thread
From: Alejandro Hernandez @ 2019-01-07  6:52 UTC (permalink / raw)
  To: Joshua Watt, Alejandro Hernandez; +Cc: bitbake-devel, OE-core

On 1/5/2019 7:09 PM, Joshua Watt wrote:
> On Sat, Jan 5, 2019 at 1:50 AM Alejandro Hernandez
> <alejandro.enedino.hernandez-samaniego@xilinx.com> wrote:
>> Hey Joshua,
>>
>> This is breaking multiconfig builds with the following error (trimmed);
>> I believe it is not taking into account that the key could contain
>> "mc:..." if it is a multiconfig build.
> Hmm, yes that seems likely. I'll take a look; would you mind opening a
> bug in Bugzilla and assigning it to me? I'm not very familiar with
> multiconfig, so some instructions to help reproduce would be very
> helpful.
Sure thing
>
> Does anyone know if multiconfig is tested on the autobuilders?

Nope, it is not tested yet; I've been meaning to add a buildset that
tests it, but I haven't found the time, tbh.

Cheers,

Alejandro


>
>>
>> ERROR: Running idle function
>>    File "poky/bitbake/lib/bb/runqueue.py", line 1170, in
>> RunQueueData.prepare_task_hash(tid='multiconfig:x86:poky/meta/recipes-support/attr/acl_2.2.52.bb:do_fetch'):
>>                self.runtaskentries[tid].hash =
>> bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
>>       >        self.runtaskentries[tid].unihash =
>> bb.parse.siggen.get_unihash(fn + "." + taskname)
>>
>>     File "poky/bitbake/lib/bb/siggen.py", line 45, in
>> SignatureGeneratorOEBasicHash.get_unihash(task='poky/meta/recipes-support/attr/acl_2.2.52.bb.do_fetch'):
>>            def get_unihash(self, task):
>>       >        return self.taskhash[task]
>>
>> KeyError: 'poky/meta/recipes-support/attr/acl_2.2.52.bb.do_fetch'
>>
>>
>> Cheers,
>>
>> Alejandro
>>
>>
>> On 12/18/2018 7:10 PM, Joshua Watt wrote:
>>> Requests the task unique hash from siggen and tracks it
>>>
>>> [YOCTO #13030]
>>>
>>> Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
>>> ---
>>>    bitbake/lib/bb/runqueue.py | 25 +++++++++++++++++--------
>>>    1 file changed, 17 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
>>> index f2b95a9829b..27b188256dd 100644
>>> --- a/bitbake/lib/bb/runqueue.py
>>> +++ b/bitbake/lib/bb/runqueue.py
>>> @@ -346,6 +346,7 @@ class RunTaskEntry(object):
>>>            self.depends = set()
>>>            self.revdeps = set()
>>>            self.hash = None
>>> +        self.unihash = None
>>>            self.task = None
>>>            self.weight = 1
>>>
>>> @@ -385,6 +386,9 @@ class RunQueueData:
>>>        def get_task_hash(self, tid):
>>>            return self.runtaskentries[tid].hash
>>>
>>> +    def get_task_unihash(self, tid):
>>> +        return self.runtaskentries[tid].unihash
>>> +
>>>        def get_user_idstring(self, tid, task_name_suffix = ""):
>>>            return tid + task_name_suffix
>>>
>>> @@ -1150,18 +1154,21 @@ class RunQueueData:
>>>                    if len(self.runtaskentries[tid].depends - dealtwith) == 0:
>>>                        dealtwith.add(tid)
>>>                        todeal.remove(tid)
>>> -                    procdep = []
>>> -                    for dep in self.runtaskentries[tid].depends:
>>> -                        procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
>>> -                    (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
>>> -                    self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
>>> -                    task = self.runtaskentries[tid].task
>>> +                    self.prepare_task_hash(tid)
>>>
>>>            bb.parse.siggen.writeout_file_checksum_cache()
>>>
>>>            #self.dump_data()
>>>            return len(self.runtaskentries)
>>>
>>> +    def prepare_task_hash(self, tid):
>>> +        procdep = []
>>> +        for dep in self.runtaskentries[tid].depends:
>>> +            procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
>>> +        (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
>>> +        self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
>>> +        self.runtaskentries[tid].unihash = bb.parse.siggen.get_unihash(fn + "." + taskname)
>>> +
>>>        def dump_data(self):
>>>            """
>>>            Dump some debug information on the internal data structures
>>> @@ -2081,7 +2088,8 @@ class RunQueueExecuteTasks(RunQueueExecute):
>>>                    deps = self.rqdata.runtaskentries[revdep].depends
>>>                    provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
>>>                    taskhash = self.rqdata.runtaskentries[revdep].hash
>>> -                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
>>> +                unihash = self.rqdata.runtaskentries[revdep].unihash
>>> +                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, unihash]
>>>                    for revdep2 in deps:
>>>                        if revdep2 not in taskdepdata:
>>>                            additional.append(revdep2)
>>> @@ -2524,7 +2532,8 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
>>>                    deps = getsetscenedeps(revdep)
>>>                    provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
>>>                    taskhash = self.rqdata.runtaskentries[revdep].hash
>>> -                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
>>> +                unihash = self.rqdata.runtaskentries[revdep].unihash
>>> +                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, unihash]
>>>                    for revdep2 in deps:
>>>                        if revdep2 not in taskdepdata:
>>>                            additional.append(revdep2)


^ permalink raw reply	[flat|nested] 158+ messages in thread

* Re: [OE-core] [PATCH v5 3/8] bitbake: runqueue: Track task unique hash
  2019-01-06  3:09               ` Joshua Watt
@ 2019-01-07 16:16                 ` akuster808
  -1 siblings, 0 replies; 158+ messages in thread
From: akuster808 @ 2019-01-07 16:16 UTC (permalink / raw)
  To: Joshua Watt, Alejandro Hernandez; +Cc: bitbake-devel, OE-core



On 1/5/19 7:09 PM, Joshua Watt wrote:
> On Sat, Jan 5, 2019 at 1:50 AM Alejandro Hernandez
> <alejandro.enedino.hernandez-samaniego@xilinx.com> wrote:
>> Hey Joshua,
>>
>> This is breaking multiconfig builds with the following error (trimmed);
>> I believe it is not taking into account that the key could contain
>> "mc:..." if it is a multiconfig build.
> Hmm, yes that seems likely. I'll take a look; would you mind opening a
> bug in Bugzilla and assigning it to me? I'm not very familiar with
> multiconfig, so some instructions to help reproduce would be very
> helpful.
>
> Does anyone know if multiconfig is tested on the autobuilders?
We don't, as far as I know.

Do we need to bug this, as this patch is in master?

- armin
>
>>
>> ERROR: Running idle function
>>   File "poky/bitbake/lib/bb/runqueue.py", line 1170, in
>> RunQueueData.prepare_task_hash(tid='multiconfig:x86:poky/meta/recipes-support/attr/acl_2.2.52.bb:do_fetch'):
>>               self.runtaskentries[tid].hash =
>> bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
>>      >        self.runtaskentries[tid].unihash =
>> bb.parse.siggen.get_unihash(fn + "." + taskname)
>>
>>    File "poky/bitbake/lib/bb/siggen.py", line 45, in
>> SignatureGeneratorOEBasicHash.get_unihash(task='poky/meta/recipes-support/attr/acl_2.2.52.bb.do_fetch'):
>>           def get_unihash(self, task):
>>      >        return self.taskhash[task]
>>
>> KeyError: 'poky/meta/recipes-support/attr/acl_2.2.52.bb.do_fetch'
>>
>>
>> Cheers,
>>
>> Alejandro
>>
>>
>> On 12/18/2018 7:10 PM, Joshua Watt wrote:
>>> Requests the task unique hash from siggen and tracks it
>>>
>>> [YOCTO #13030]
>>>
>>> Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
>>> ---
>>>   bitbake/lib/bb/runqueue.py | 25 +++++++++++++++++--------
>>>   1 file changed, 17 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/bitbake/lib/bb/runqueue.py b/bitbake/lib/bb/runqueue.py
>>> index f2b95a9829b..27b188256dd 100644
>>> --- a/bitbake/lib/bb/runqueue.py
>>> +++ b/bitbake/lib/bb/runqueue.py
>>> @@ -346,6 +346,7 @@ class RunTaskEntry(object):
>>>           self.depends = set()
>>>           self.revdeps = set()
>>>           self.hash = None
>>> +        self.unihash = None
>>>           self.task = None
>>>           self.weight = 1
>>>
>>> @@ -385,6 +386,9 @@ class RunQueueData:
>>>       def get_task_hash(self, tid):
>>>           return self.runtaskentries[tid].hash
>>>
>>> +    def get_task_unihash(self, tid):
>>> +        return self.runtaskentries[tid].unihash
>>> +
>>>       def get_user_idstring(self, tid, task_name_suffix = ""):
>>>           return tid + task_name_suffix
>>>
>>> @@ -1150,18 +1154,21 @@ class RunQueueData:
>>>                   if len(self.runtaskentries[tid].depends - dealtwith) == 0:
>>>                       dealtwith.add(tid)
>>>                       todeal.remove(tid)
>>> -                    procdep = []
>>> -                    for dep in self.runtaskentries[tid].depends:
>>> -                        procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
>>> -                    (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
>>> -                    self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
>>> -                    task = self.runtaskentries[tid].task
>>> +                    self.prepare_task_hash(tid)
>>>
>>>           bb.parse.siggen.writeout_file_checksum_cache()
>>>
>>>           #self.dump_data()
>>>           return len(self.runtaskentries)
>>>
>>> +    def prepare_task_hash(self, tid):
>>> +        procdep = []
>>> +        for dep in self.runtaskentries[tid].depends:
>>> +            procdep.append(fn_from_tid(dep) + "." + taskname_from_tid(dep))
>>> +        (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
>>> +        self.runtaskentries[tid].hash = bb.parse.siggen.get_taskhash(taskfn, taskname, procdep, self.dataCaches[mc])
>>> +        self.runtaskentries[tid].unihash = bb.parse.siggen.get_unihash(fn + "." + taskname)
>>> +
>>>       def dump_data(self):
>>>           """
>>>           Dump some debug information on the internal data structures
>>> @@ -2081,7 +2088,8 @@ class RunQueueExecuteTasks(RunQueueExecute):
>>>                   deps = self.rqdata.runtaskentries[revdep].depends
>>>                   provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
>>>                   taskhash = self.rqdata.runtaskentries[revdep].hash
>>> -                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
>>> +                unihash = self.rqdata.runtaskentries[revdep].unihash
>>> +                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, unihash]
>>>                   for revdep2 in deps:
>>>                       if revdep2 not in taskdepdata:
>>>                           additional.append(revdep2)
>>> @@ -2524,7 +2532,8 @@ class RunQueueExecuteScenequeue(RunQueueExecute):
>>>                   deps = getsetscenedeps(revdep)
>>>                   provides = self.rqdata.dataCaches[mc].fn_provides[taskfn]
>>>                   taskhash = self.rqdata.runtaskentries[revdep].hash
>>> -                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash]
>>> +                unihash = self.rqdata.runtaskentries[revdep].unihash
>>> +                taskdepdata[revdep] = [pn, taskname, fn, deps, provides, taskhash, unihash]
>>>                   for revdep2 in deps:
>>>                       if revdep2 not in taskdepdata:
>>>                           additional.append(revdep2)



^ permalink raw reply	[flat|nested] 158+ messages in thread

* Re: [OE-core] [PATCH v5 3/8] bitbake: runqueue: Track task unique hash
  2019-01-07 16:16                 ` akuster808
@ 2019-01-07 16:40                   ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2019-01-07 16:40 UTC (permalink / raw)
  To: akuster808, Alejandro Hernandez; +Cc: bitbake-devel, OE-core

On Mon, 2019-01-07 at 08:16 -0800, akuster808 wrote:
> 
> On 1/5/19 7:09 PM, Joshua Watt wrote:
> > On Sat, Jan 5, 2019 at 1:50 AM Alejandro Hernandez
> > <alejandro.enedino.hernandez-samaniego@xilinx.com> wrote:
> > > Hey Joshua,
> > > 
> > > This is breaking multiconfig builds with the following error
> > > (trimmed);
> > > I believe it is not taking into account that the key could
> > > contain
> > > "mc:..." if it is a multiconfig build.
> > Hmm, yes that seems likely. I'll take a look; would you mind
> > opening a
> > bug in Bugzilla and assigning it to me? I'm not very familiar with
> > multiconfig, so some instructions to help reproduce would be very
> > helpful.
> > 
> > Does anyone know if multiconfig is tested on the autobuilders?
> We don't, as far as I know.
> 
> Do we need to bug this, as this patch is in master?

I don't know what the exact criteria are for when a bug gets created.
Based on my (limited) previous experience, I thought it deserved a bug
because it was on master and we are past 2.7 M1.

Anyway, it's already been created: 
https://bugzilla.yoctoproject.org/show_bug.cgi?id=13124

> 
> - armin
> > > ERROR: Running idle function
> > >   File "poky/bitbake/lib/bb/runqueue.py", line 1170, in
> > > RunQueueData.prepare_task_hash(tid='multiconfig:x86:poky/meta/rec
> > > ipes-support/attr/acl_2.2.52.bb:do_fetch'):
> > >               self.runtaskentries[tid].hash =
> > > bb.parse.siggen.get_taskhash(taskfn, taskname, procdep,
> > > self.dataCaches[mc])
> > >      >        self.runtaskentries[tid].unihash =
> > > bb.parse.siggen.get_unihash(fn + "." + taskname)
> > > 
> > >    File "poky/bitbake/lib/bb/siggen.py", line 45, in
> > > SignatureGeneratorOEBasicHash.get_unihash(task='poky/meta/recipes
> > > -support/attr/acl_2.2.52.bb.do_fetch'):
> > >           def get_unihash(self, task):
> > >      >        return self.taskhash[task]
> > > 
> > > KeyError: 'poky/meta/recipes-support/attr/acl_2.2.52.bb.do_fetch'
> > > 
> > > 
> > > Cheers,
> > > 
> > > Alejandro
> > > 
> > > 
> > > On 12/18/2018 7:10 PM, Joshua Watt wrote:
> > > > Requests the task unique hash from siggen and tracks it
> > > > 
> > > > [YOCTO #13030]
> > > > 
> > > > Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> > > > ---
> > > >   bitbake/lib/bb/runqueue.py | 25 +++++++++++++++++--------
> > > >   1 file changed, 17 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/bitbake/lib/bb/runqueue.py
> > > > b/bitbake/lib/bb/runqueue.py
> > > > index f2b95a9829b..27b188256dd 100644
> > > > --- a/bitbake/lib/bb/runqueue.py
> > > > +++ b/bitbake/lib/bb/runqueue.py
> > > > @@ -346,6 +346,7 @@ class RunTaskEntry(object):
> > > >           self.depends = set()
> > > >           self.revdeps = set()
> > > >           self.hash = None
> > > > +        self.unihash = None
> > > >           self.task = None
> > > >           self.weight = 1
> > > > 
> > > > @@ -385,6 +386,9 @@ class RunQueueData:
> > > >       def get_task_hash(self, tid):
> > > >           return self.runtaskentries[tid].hash
> > > > 
> > > > +    def get_task_unihash(self, tid):
> > > > +        return self.runtaskentries[tid].unihash
> > > > +
> > > >       def get_user_idstring(self, tid, task_name_suffix = ""):
> > > >           return tid + task_name_suffix
> > > > 
> > > > @@ -1150,18 +1154,21 @@ class RunQueueData:
> > > >                   if len(self.runtaskentries[tid].depends -
> > > > dealtwith) == 0:
> > > >                       dealtwith.add(tid)
> > > >                       todeal.remove(tid)
> > > > -                    procdep = []
> > > > -                    for dep in
> > > > self.runtaskentries[tid].depends:
> > > > -                        procdep.append(fn_from_tid(dep) + "."
> > > > + taskname_from_tid(dep))
> > > > -                    (mc, fn, taskname, taskfn) =
> > > > split_tid_mcfn(tid)
> > > > -                    self.runtaskentries[tid].hash =
> > > > bb.parse.siggen.get_taskhash(taskfn, taskname, procdep,
> > > > self.dataCaches[mc])
> > > > -                    task = self.runtaskentries[tid].task
> > > > +                    self.prepare_task_hash(tid)
> > > > 
> > > >           bb.parse.siggen.writeout_file_checksum_cache()
> > > > 
> > > >           #self.dump_data()
> > > >           return len(self.runtaskentries)
> > > > 
> > > > +    def prepare_task_hash(self, tid):
> > > > +        procdep = []
> > > > +        for dep in self.runtaskentries[tid].depends:
> > > > +            procdep.append(fn_from_tid(dep) + "." +
> > > > taskname_from_tid(dep))
> > > > +        (mc, fn, taskname, taskfn) = split_tid_mcfn(tid)
> > > > +        self.runtaskentries[tid].hash =
> > > > bb.parse.siggen.get_taskhash(taskfn, taskname, procdep,
> > > > self.dataCaches[mc])
> > > > +        self.runtaskentries[tid].unihash =
> > > > bb.parse.siggen.get_unihash(fn + "." + taskname)
> > > > +
> > > >       def dump_data(self):
> > > >           """
> > > >           Dump some debug information on the internal data
> > > > structures
> > > > @@ -2081,7 +2088,8 @@ class
> > > > RunQueueExecuteTasks(RunQueueExecute):
> > > >                   deps =
> > > > self.rqdata.runtaskentries[revdep].depends
> > > >                   provides =
> > > > self.rqdata.dataCaches[mc].fn_provides[taskfn]
> > > >                   taskhash =
> > > > self.rqdata.runtaskentries[revdep].hash
> > > > -                taskdepdata[revdep] = [pn, taskname, fn, deps,
> > > > provides, taskhash]
> > > > +                unihash =
> > > > self.rqdata.runtaskentries[revdep].unihash
> > > > +                taskdepdata[revdep] = [pn, taskname, fn, deps,
> > > > provides, taskhash, unihash]
> > > >                   for revdep2 in deps:
> > > >                       if revdep2 not in taskdepdata:
> > > >                           additional.append(revdep2)
> > > > @@ -2524,7 +2532,8 @@ class
> > > > RunQueueExecuteScenequeue(RunQueueExecute):
> > > >                   deps = getsetscenedeps(revdep)
> > > >                   provides =
> > > > self.rqdata.dataCaches[mc].fn_provides[taskfn]
> > > >                   taskhash =
> > > > self.rqdata.runtaskentries[revdep].hash
> > > > -                taskdepdata[revdep] = [pn, taskname, fn, deps,
> > > > provides, taskhash]
> > > > +                unihash =
> > > > self.rqdata.runtaskentries[revdep].unihash
> > > > +                taskdepdata[revdep] = [pn, taskname, fn, deps,
> > > > provides, taskhash, unihash]
> > > >                   for revdep2 in deps:
> > > >                       if revdep2 not in taskdepdata:
> > > >                           additional.append(revdep2)
-- 
Joshua Watt <JPEWhacker@gmail.com>



^ permalink raw reply	[flat|nested] 158+ messages in thread

* Re: [OE-core][PATCH v7 3/3] sstate: Implement hash equivalence sstate
  2019-01-04 16:20             ` [PATCH " Joshua Watt
@ 2019-01-08  6:29               ` Jacob Kroon
  -1 siblings, 0 replies; 158+ messages in thread
From: Jacob Kroon @ 2019-01-08  6:29 UTC (permalink / raw)
  To: Joshua Watt, openembedded-core, bitbake-devel

On 1/4/19 5:20 PM, Joshua Watt wrote:
> Converts sstate so that it can use a hash equivalence server to
> determine if a task really needs to be rebuilt, or if it can be restored
> from a different (equivalent) sstate object.
> 
> The unique hashes are cached persistently using persist_data. This has
> a number of advantages:
>   1) Unique hashes can be cached between invocations of bitbake to
>      prevent needing to contact the server every time (which is slow)
>   2) The value of each task's unique hash can easily be synchronized
>      between different threads, which will be useful if bitbake is
>      updated to do on the fly task re-hashing.
> 
> [YOCTO #13030]
> 
> Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> ---
>   meta/classes/sstate.bbclass | 105 +++++++++++++++++++++--
>   meta/conf/bitbake.conf      |   4 +-
>   meta/lib/oe/sstatesig.py    | 167 ++++++++++++++++++++++++++++++++++++
>   3 files changed, 267 insertions(+), 9 deletions(-)
> 
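
The caching idea described above can be pictured with a small standalone
sketch (plain sqlite3 here, not the actual persist_data API; all names
are hypothetical):

    import sqlite3

    def open_unihash_cache(path):
        db = sqlite3.connect(path)
        db.execute("CREATE TABLE IF NOT EXISTS unihash "
                   "(taskhash TEXT PRIMARY KEY, unihash TEXT)")
        return db

    def cached_unihash(db, taskhash, fetch_from_server):
        # Look locally first so repeated bitbake invocations do not
        # need a (slow) server round trip for known task hashes.
        row = db.execute("SELECT unihash FROM unihash WHERE taskhash=?",
                         (taskhash,)).fetchone()
        if row is not None:
            return row[0]
        unihash = fetch_from_server(taskhash)
        db.execute("INSERT OR REPLACE INTO unihash VALUES (?, ?)",
                   (taskhash, unihash))
        db.commit()
        return unihash
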
> diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
> index 59ebc3ab5cc..da0807d6e99 100644
> --- a/meta/classes/sstate.bbclass
> +++ b/meta/classes/sstate.bbclass
> @@ -11,7 +11,7 @@ def generate_sstatefn(spec, hash, d):
>   SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
>   SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
>   SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
> -SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_TASKHASH'), d)}"
> +SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d)}"
>   SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
>   SSTATE_EXTRAPATH   = ""
>   SSTATE_EXTRAPATHWILDCARD = ""
> @@ -82,6 +82,23 @@ SSTATE_SIG_PASSPHRASE ?= ""
>   # Whether to verify the GnUPG signatures when extracting sstate archives
>   SSTATE_VERIFY_SIG ?= "0"
>   
> +SSTATE_HASHEQUIV_METHOD ?= "OEOuthashBasic"
> +SSTATE_HASHEQUIV_METHOD[doc] = "The function used to calculate the output hash \
> +    for a task, which in turn is used to determine equivalency. \
> +    "
> +
> +SSTATE_HASHEQUIV_SERVER ?= ""
> +SSTATE_HASHEQUIV_SERVER[doc] = "The hash equivalence server. For example, \
> +    'http://192.168.0.1:5000'. Do not include a trailing slash \
> +    "
> +
> +SSTATE_HASHEQUIV_REPORT_TASKDATA ?= "0"
> +SSTATE_HASHEQUIV_REPORT_TASKDATA[doc] = "Report additional useful data to the \
> +    hash equivalency server, such as PN, PV, taskname, etc. This information \
> +    is very useful for developers looking at task data, but may leak sensitive \
> +    data if the equivalence server is public. \
> +    "
> +
>   python () {
>       if bb.data.inherits_class('native', d):
>           d.setVar('SSTATE_PKGARCH', d.getVar('BUILD_ARCH', False))
> @@ -640,7 +657,7 @@ def sstate_package(ss, d):
>           return
>   
>       for f in (d.getVar('SSTATECREATEFUNCS') or '').split() + \
> -             ['sstate_create_package', 'sstate_sign_package'] + \
> +             ['sstate_report_unihash', 'sstate_create_package', 'sstate_sign_package'] + \
>                (d.getVar('SSTATEPOSTCREATEFUNCS') or '').split():
>           # All hooks should run in SSTATE_BUILDDIR.
>           bb.build.exec_func(f, d, (sstatebuild,))
> @@ -764,6 +781,73 @@ python sstate_sign_package () {
>                              d.getVar('SSTATE_SIG_PASSPHRASE'), armor=False)
>   }
>   
> +def OEOuthashBasic(path, sigfile, task, d):
> +    import hashlib
> +    import stat
> +
> +    def update_hash(s):
> +        s = s.encode('utf-8')
> +        h.update(s)
> +        if sigfile:
> +            sigfile.write(s)
> +
> +    h = hashlib.sha256()
> +    prev_dir = os.getcwd()
> +
> +    try:
> +        os.chdir(path)
> +
> +        update_hash("OEOuthashBasic\n")
> +
> +        # It is only currently useful to get equivalent hashes for things that
> +        # can be restored from sstate. Since the sstate object is named using
> +        # SSTATE_PKGSPEC and the task name, those should be included in the
> +        # output hash calculation.
> +        update_hash("SSTATE_PKGSPEC=%s\n" % d.getVar('SSTATE_PKGSPEC'))
> +        update_hash("task=%s\n" % task)
> +
> +        for root, dirs, files in os.walk('.', topdown=True):
> +            # Sort directories and files to ensure consistent ordering
> +            dirs.sort()
> +            files.sort()
> +
> +            for f in files:
> +                path = os.path.join(root, f)
> +                s = os.lstat(path)
> +
> +                # Hash file path
> +                update_hash(path + '\n')
> +
> +                # Hash file mode
> +                update_hash("\tmode=0x%x\n" % stat.S_IMODE(s.st_mode))
> +                update_hash("\ttype=0x%x\n" % stat.S_IFMT(s.st_mode))
> +
> +                if stat.S_ISBLK(s.st_mode) or stat.S_ISCHR(s.st_mode):
> +                    # Hash device major and minor
> +                    update_hash("\tdev=%d,%d\n" % (os.major(s.st_rdev), os.minor(s.st_rdev)))
> +                elif stat.S_ISLNK(s.st_mode):
> +                    # Hash symbolic link
> +                    update_hash("\tsymlink=%s\n" % os.readlink(path))
> +                else:
> +                    fh = hashlib.sha256()
> +                    # Hash file contents
> +                    with open(path, 'rb') as fd:
> +                        for chunk in iter(lambda: fd.read(4096), b""):
> +                            fh.update(chunk)
> +                    update_hash("\tdigest=%s\n" % fh.hexdigest())

Would it be a good idea to make the depsig.do_* files even more human
readable, considering that they could be candidates for being stored in
buildhistory?

As an example, here's what buildhistory/.../files-in-package.txt for 
busybox looks like:

drwxr-xr-x root       root             4096 ./bin
lrwxrwxrwx root       root               14 ./bin/busybox -> busybox.nosuid
-rwxr-xr-x root       root           547292 ./bin/busybox.nosuid
-rwsr-xr-x root       root            50860 ./bin/busybox.suid
lrwxrwxrwx root       root               14 ./bin/sh -> busybox.nosuid
drwxr-xr-x root       root             4096 ./etc
-rw-r--r-- root       root             2339 ./etc/busybox.links.nosuid
-rw-r--r-- root       root               91 ./etc/busybox.links.suid
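
(For reference, a minimal sketch of what a formatter for such a
human-readable line could look like. This helper is hypothetical and not
part of the patch; it assumes the same os.lstat() result that
OEOuthashBasic already collects for each file:)

    import grp
    import os
    import pwd
    import stat

    def format_entry(path, s):
        # Render one entry in the style of files-in-package.txt:
        # permissions, owner, group, size, path (plus symlink target)
        entry = "%s %-10s %-10s %10d %s" % (
            stat.filemode(s.st_mode),
            pwd.getpwuid(s.st_uid).pw_name,
            grp.getgrgid(s.st_gid).gr_name,
            s.st_size,
            path)
        if stat.S_ISLNK(s.st_mode):
            entry += " -> " + os.readlink(path)
        return entry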

> +    finally:
> +        os.chdir(prev_dir)
> +
> +    return h.hexdigest()
> +
> +python sstate_report_unihash() {
> +    report_unihash = getattr(bb.parse.siggen, 'report_unihash', None)
> +
> +    if report_unihash:
> +        ss = sstate_state_fromvars(d)
> +        report_unihash(os.getcwd(), ss['task'], d)
> +}
> +
>   #
>   # Shell function to decompress and prepare a package for installation
>   # Will be run from within SSTATE_INSTDIR.
> @@ -788,6 +872,11 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
>       if siginfo:
>           extension = extension + ".siginfo"
>   
> +    def gethash(task):
> +        if sq_unihash is not None:
> +            return sq_unihash[task]
> +        return sq_hash[task]
> +
>       def getpathcomponents(task, d):
>           # Magic data from BB_HASHFILENAME
>           splithashfn = sq_hashfn[task].split(" ")
> @@ -810,7 +899,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
>   
>           spec, extrapath, tname = getpathcomponents(task, d)
>   
> -        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
> +        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + extension)
>   
>           if os.path.exists(sstatefile):
>               bb.debug(2, "SState: Found valid sstate file %s" % sstatefile)
> @@ -872,7 +961,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
>               if task in ret:
>                   continue
>               spec, extrapath, tname = getpathcomponents(task, d)
> -            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + extension)
> +            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + extension)
>               tasklist.append((task, sstatefile))
>   
>           if tasklist:
> @@ -898,12 +987,12 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
>           evdata = {'missed': [], 'found': []};
>           for task in missed:
>               spec, extrapath, tname = getpathcomponents(task, d)
> -            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
> -            evdata['missed'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
> +            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + ".tgz")
> +            evdata['missed'].append( (sq_fn[task], sq_task[task], gethash(task), sstatefile ) )
>           for task in ret:
>               spec, extrapath, tname = getpathcomponents(task, d)
> -            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
> -            evdata['found'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
> +            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + ".tgz")
> +            evdata['found'].append( (sq_fn[task], sq_task[task], gethash(task), sstatefile ) )
>           bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)
>   
>       # Print some summary statistics about the current task completion and how much sstate
> diff --git a/meta/conf/bitbake.conf b/meta/conf/bitbake.conf
> index 64800623545..e64ce6a6dab 100644
> --- a/meta/conf/bitbake.conf
> +++ b/meta/conf/bitbake.conf
> @@ -867,7 +867,9 @@ BB_HASHBASE_WHITELIST ?= "TMPDIR FILE PATH PWD BB_TASKHASH BBPATH BBSERVER DL_DI
>       STAMPS_DIR PRSERV_DUMPDIR PRSERV_DUMPFILE PRSERV_LOCKDOWN PARALLEL_MAKE \
>       CCACHE_DIR EXTERNAL_TOOLCHAIN CCACHE CCACHE_NOHASHDIR LICENSE_PATH SDKPKGSUFFIX \
>       WARN_QA ERROR_QA WORKDIR STAMPCLEAN PKGDATA_DIR BUILD_ARCH SSTATE_PKGARCH \
> -    BB_WORKERCONTEXT BB_LIMITEDDEPS extend_recipe_sysroot DEPLOY_DIR"
> +    BB_WORKERCONTEXT BB_LIMITEDDEPS BB_UNIHASH extend_recipe_sysroot DEPLOY_DIR \
> +    SSTATE_HASHEQUIV_METHOD SSTATE_HASHEQUIV_SERVER SSTATE_HASHEQUIV_REPORT_TASKDATA \
> +    SSTATE_HASHEQUIV_OWNER"
>   BB_HASHCONFIG_WHITELIST ?= "${BB_HASHBASE_WHITELIST} DATE TIME SSH_AGENT_PID \
>       SSH_AUTH_SOCK PSEUDO_BUILD BB_ENV_EXTRAWHITE DISABLE_SANITY_CHECKS \
>       PARALLEL_MAKE BB_NUMBER_THREADS BB_ORIGENV BB_INVALIDCONF BBINCLUDED \
> diff --git a/meta/lib/oe/sstatesig.py b/meta/lib/oe/sstatesig.py
> index 18c5a353a2a..059e165c7ab 100644
> --- a/meta/lib/oe/sstatesig.py
> +++ b/meta/lib/oe/sstatesig.py
> @@ -263,10 +263,177 @@ class SignatureGeneratorOEBasicHash(bb.siggen.SignatureGeneratorBasicHash):
>           if error_msgs:
>               bb.fatal("\n".join(error_msgs))
>   
> +class SignatureGeneratorOEEquivHash(SignatureGeneratorOEBasicHash):
> +    name = "OEEquivHash"
> +
> +    def init_rundepcheck(self, data):
> +        super().init_rundepcheck(data)
> +        self.server = data.getVar('SSTATE_HASHEQUIV_SERVER')
> +        self.method = data.getVar('SSTATE_HASHEQUIV_METHOD')
> +        self.unihashes = bb.persist_data.persist('SSTATESIG_UNIHASH_CACHE_v1_' + self.method, data)
> +
> +    def get_taskdata(self):
> +        return (self.server, self.method) + super().get_taskdata()
> +
> +    def set_taskdata(self, data):
> +        self.server, self.method = data[:2]
> +        super().set_taskdata(data[2:])
> +
> +    def __get_task_unihash_key(self, task):
> +        # TODO: The key only *needs* to be the taskhash, the task is just
> +        # convenient
> +        return '%s:%s' % (task, self.taskhash[task])
> +
> +    def get_stampfile_hash(self, task):
> +        if task in self.taskhash:
> +            # If a unique hash is reported, use it as the stampfile hash.
> +            # This ensures that a task won't be re-run if the taskhash
> +            # changes but it would still result in the same output hash
> +            unihash = self.unihashes.get(self.__get_task_unihash_key(task))
> +            if unihash is not None:
> +                return unihash
> +
> +        return super().get_stampfile_hash(task)
> +
> +    def get_unihash(self, task):
> +        import urllib
> +        import json
> +
> +        taskhash = self.taskhash[task]
> +
> +        key = self.__get_task_unihash_key(task)
> +
> +        # TODO: This cache can grow unbounded. It probably only needs to
> +        # keep the most recent unihash for each task
> +        unihash = self.unihashes.get(key)
> +        if unihash is not None:
> +            return unihash
> +
> +        # If a unique hash cannot be discovered from the server, make it
> +        # equivalent to the taskhash. The unique "hash" only really needs
> +        # to be a unique string (not even necessarily a hash), but making
> +        # it match the taskhash has a few advantages:
> +        #
> +        # 1) All of the sstate code that assumes hashes can be used
> +        #    interchangeably keeps working
> +        # 2) It provides maximal compatibility with builders that don't
> +        #    use an equivalency server
> +        # 3) Multiple independent builders can easily derive the same
> +        #    unique hash from the same input. This means that if
> +        #    independent builders find the same taskhash but it isn't
> +        #    reported to the server, there is a better chance that they
> +        #    will agree on the unique hash.
> +        unihash = taskhash
> +
> +        try:
> +            url = '%s/v1/equivalent?%s' % (self.server,
> +                    urllib.parse.urlencode({'method': self.method, 'taskhash': self.taskhash[task]}))
> +
> +            request = urllib.request.Request(url)
> +            response = urllib.request.urlopen(request)
> +            data = response.read().decode('utf-8')
> +
> +            json_data = json.loads(data)
> +
> +            if json_data:
> +                unihash = json_data['unihash']
> +                # A unique hash equal to the taskhash is not very interesting,
> +                # so it is reported at debug level 2. If they differ, that
> +                # is much more interesting, so it is reported at debug level 1
> +                bb.debug((1, 2)[unihash == taskhash], 'Found unihash %s in place of %s for %s from %s' % (unihash, taskhash, task, self.server))
> +            else:
> +                bb.debug(2, 'No reported unihash for %s:%s from %s' % (task, taskhash, self.server))
> +        except urllib.error.URLError as e:
> +            bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
> +        except (KeyError, json.JSONDecodeError) as e:
> +            bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
> +
> +        self.unihashes[key] = unihash
> +        return unihash
> +
> +    def report_unihash(self, path, task, d):
> +        import urllib
> +        import json
> +        import tempfile
> +        import base64
> +
> +        taskhash = d.getVar('BB_TASKHASH')
> +        unihash = d.getVar('BB_UNIHASH')
> +        report_taskdata = d.getVar('SSTATE_HASHEQUIV_REPORT_TASKDATA') == '1'
> +        tempdir = d.getVar('T')
> +        fn = d.getVar('BB_FILENAME')
> +        key = fn + '.do_' + task + ':' + taskhash
> +
> +        # Sanity checks
> +        cache_unihash = self.unihashes.get(key)
> +        if cache_unihash is None:
> +            bb.fatal('%s not in unihash cache. Please report this error' % key)
> +
> +        if cache_unihash != unihash:
> +            bb.fatal("Cache unihash %s doesn't match BB_UNIHASH %s" % (cache_unihash, unihash))
> +
> +        sigfile = None
> +        sigfile_name = "depsig.do_%s.%d" % (task, os.getpid())
> +        sigfile_link = "depsig.do_%s" % task
> +
> +        try:
> +            call = self.method + '(path, sigfile, task, d)'
> +            sigfile = open(os.path.join(tempdir, sigfile_name), 'w+b')
> +            locs = {'path': path, 'sigfile': sigfile, 'task': task, 'd': d}
> +
> +            outhash = bb.utils.better_eval(call, locs)
> +
> +            try:
> +                url = '%s/v1/equivalent' % self.server
> +                task_data = {
> +                    'taskhash': taskhash,
> +                    'method': self.method,
> +                    'outhash': outhash,
> +                    'unihash': unihash,
> +                    'owner': d.getVar('SSTATE_HASHEQUIV_OWNER')
> +                    }
> +
> +                if report_taskdata:
> +                    sigfile.seek(0)
> +
> +                    task_data['PN'] = d.getVar('PN')
> +                    task_data['PV'] = d.getVar('PV')
> +                    task_data['PR'] = d.getVar('PR')
> +                    task_data['task'] = task
> +                    task_data['outhash_siginfo'] = sigfile.read().decode('utf-8')
> +
> +                headers = {'content-type': 'application/json'}
> +
> +                request = urllib.request.Request(url, json.dumps(task_data).encode('utf-8'), headers)
> +                response = urllib.request.urlopen(request)
> +                data = response.read().decode('utf-8')
> +
> +                json_data = json.loads(data)
> +                new_unihash = json_data['unihash']
> +
> +                if new_unihash != unihash:
> +                    bb.debug(1, 'Task %s unihash changed %s -> %s by server %s' % (taskhash, unihash, new_unihash, self.server))
> +                else:
> +                    bb.debug(1, 'Reported task %s as unihash %s to %s' % (taskhash, unihash, self.server))
> +            except urllib.error.URLError as e:
> +                bb.warn('Failure contacting Hash Equivalence Server %s: %s' % (self.server, str(e)))
> +            except (KeyError, json.JSONDecodeError) as e:
> +                bb.warn('Poorly formatted response from %s: %s' % (self.server, str(e)))
> +        finally:
> +            if sigfile:
> +                sigfile.close()
> +
> +                sigfile_link_path = os.path.join(tempdir, sigfile_link)
> +                bb.utils.remove(sigfile_link_path)
> +
> +                try:
> +                    os.symlink(sigfile_name, sigfile_link_path)
> +                except OSError:
> +                    pass
>   
>   # Insert these classes into siggen's namespace so it can see and select them
>   bb.siggen.SignatureGeneratorOEBasic = SignatureGeneratorOEBasic
>   bb.siggen.SignatureGeneratorOEBasicHash = SignatureGeneratorOEBasicHash
> +bb.siggen.SignatureGeneratorOEEquivHash = SignatureGeneratorOEEquivHash
>   
>   
>   def find_siginfo(pn, taskname, taskhashlist, d):
> 


^ permalink raw reply	[flat|nested] 158+ messages in thread

* Re: [OE-core][PATCH v7 3/3] sstate: Implement hash equivalence sstate
  2019-01-08  6:29               ` [bitbake-devel] [PATCH " Jacob Kroon
@ 2019-01-09 17:09                 ` Joshua Watt
  -1 siblings, 0 replies; 158+ messages in thread
From: Joshua Watt @ 2019-01-09 17:09 UTC (permalink / raw)
  To: Jacob Kroon, openembedded-core, bitbake-devel

On Tue, 2019-01-08 at 07:29 +0100, Jacob Kroon wrote:
> On 1/4/19 5:20 PM, Joshua Watt wrote:
> > Converts sstate so that it can use a hash equivalence server to
> > determine if a task really needs to be rebuilt, or if it can be
> > restored
> > from a different (equivalent) sstate object.
> > 
> > The unique hashes are cached persistently using persist_data. This
> > has
> > a number of advantages:
> >   1) Unique hashes can be cached between invocations of bitbake to
> >      prevent needing to contact the server every time (which is
> > slow)
> >   2) The value of each tasks unique hash can easily be synchronized
> >      between different threads, which will be useful if bitbake is
> >      updated to do on the fly task re-hashing.
> > 
> > [YOCTO #13030]
> > 
> > Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> > ---
> >   meta/classes/sstate.bbclass | 105 +++++++++++++++++++++--
> >   meta/conf/bitbake.conf      |   4 +-
> >   meta/lib/oe/sstatesig.py    | 167
> > ++++++++++++++++++++++++++++++++++++
> >   3 files changed, 267 insertions(+), 9 deletions(-)
> > 
> > diff --git a/meta/classes/sstate.bbclass
> > b/meta/classes/sstate.bbclass
> > index 59ebc3ab5cc..da0807d6e99 100644
> > --- a/meta/classes/sstate.bbclass
> > +++ b/meta/classes/sstate.bbclass
> > @@ -11,7 +11,7 @@ def generate_sstatefn(spec, hash, d):
> >   SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
> >   SSTATE_PKGSPEC    =
> > "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-
> > ${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
> >   SSTATE_SWSPEC     =
> > "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
> > -SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.get
> > Var('SSTATE_PKGSPEC'), d.getVar('BB_TASKHASH'), d)}"
> > +SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.get
> > Var('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d)}"
> >   SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
> >   SSTATE_EXTRAPATH   = ""
> >   SSTATE_EXTRAPATHWILDCARD = ""
> > @@ -82,6 +82,23 @@ SSTATE_SIG_PASSPHRASE ?= ""
> >   # Whether to verify the GnUPG signatures when extracting sstate
> > archives
> >   SSTATE_VERIFY_SIG ?= "0"
> >   
> > +SSTATE_HASHEQUIV_METHOD ?= "OEOuthashBasic"
> > +SSTATE_HASHEQUIV_METHOD[doc] = "The function used to calculate the
> > output hash \
> > +    for a task, which in turn is used to determine equivalency. \
> > +    "
> > +
> > +SSTATE_HASHEQUIV_SERVER ?= ""
> > +SSTATE_HASHEQUIV_SERVER[doc] = "The hash equivalence sever. For
> > example, \
> > +    'http://192.168.0.1:5000'. Do not include a trailing slash \
> > +    "
> > +
> > +SSTATE_HASHEQUIV_REPORT_TASKDATA ?= "0"
> > +SSTATE_HASHEQUIV_REPORT_TASKDATA[doc] = "Report additional useful
> > data to the \
> > +    hash equivalency server, such as PN, PV, taskname, etc. This
> > information \
> > +    is very useful for developers looking at task data, but may
> > leak sensitive \
> > +    data if the equivalence server is public. \
> > +    "
> > +
> >   python () {
> >       if bb.data.inherits_class('native', d):
> >           d.setVar('SSTATE_PKGARCH', d.getVar('BUILD_ARCH', False))
> > @@ -640,7 +657,7 @@ def sstate_package(ss, d):
> >           return
> >   
> >       for f in (d.getVar('SSTATECREATEFUNCS') or '').split() + \
> > -             ['sstate_create_package', 'sstate_sign_package'] + \
> > +             ['sstate_report_unihash', 'sstate_create_package',
> > 'sstate_sign_package'] + \
> >                (d.getVar('SSTATEPOSTCREATEFUNCS') or '').split():
> >           # All hooks should run in SSTATE_BUILDDIR.
> >           bb.build.exec_func(f, d, (sstatebuild,))
> > @@ -764,6 +781,73 @@ python sstate_sign_package () {
> >                              d.getVar('SSTATE_SIG_PASSPHRASE'),
> > armor=False)
> >   }
> >   
> > +def OEOuthashBasic(path, sigfile, task, d):
> > +    import hashlib
> > +    import stat
> > +
> > +    def update_hash(s):
> > +        s = s.encode('utf-8')
> > +        h.update(s)
> > +        if sigfile:
> > +            sigfile.write(s)
> > +
> > +    h = hashlib.sha256()
> > +    prev_dir = os.getcwd()
> > +
> > +    try:
> > +        os.chdir(path)
> > +
> > +        update_hash("OEOuthashBasic\n")
> > +
> > +        # It is only currently useful to get equivalent hashes for
> > things that
> > +        # can be restored from sstate. Since the sstate object is
> > named using
> > +        # SSTATE_PKGSPEC and the task name, those should be
> > included in the
> > +        # output hash calculation.
> > +        update_hash("SSTATE_PKGSPEC=%s\n" %
> > d.getVar('SSTATE_PKGSPEC'))
> > +        update_hash("task=%s\n" % task)
> > +
> > +        for root, dirs, files in os.walk('.', topdown=True):
> > +            # Sort directories and files to ensure consistent
> > ordering
> > +            dirs.sort()
> > +            files.sort()
> > +
> > +            for f in files:
> > +                path = os.path.join(root, f)
> > +                s = os.lstat(path)
> > +
> > +                # Hash file path
> > +                update_hash(path + '\n')
> > +
> > +                # Hash file mode
> > +                update_hash("\tmode=0x%x\n" %
> > stat.S_IMODE(s.st_mode))
> > +                update_hash("\ttype=0x%x\n" %
> > stat.S_IFMT(s.st_mode))
> > +
> > +                if stat.S_ISBLK(s.st_mode) or
> > stat.S_ISCHR(s.st_mode):
> > +                    # Hash device major and minor
> > +                    update_hash("\tdev=%d,%d\n" %
> > (os.major(s.st_rdev), os.minor(s.st_rdev)))
> > +                elif stat.S_ISLNK(s.st_mode):
> > +                    # Hash symbolic link
> > +                    update_hash("\tsymlink=%s\n" %
> > os.readlink(path))
> > +                else:
> > +                    fh = hashlib.sha256()
> > +                    # Hash file contents
> > +                    with open(path, 'rb') as fd:
> > +                        for chunk in iter(lambda: fd.read(4096),
> > b""):
> > +                            fh.update(chunk)
> > +                    update_hash("\tdigest=%s\n" % fh.hexdigest())
> 
> Would it be a good idea to make the depsig.do_* files even more human
> readable, considering that they could be candidates for being stored in
> buildhistory?
> 
> As an example, here's what buildhistory/.../files-in-package.txt for 
> busybox looks like:
> 
> drwxr-xr-x root       root             4096 ./bin
> lrwxrwxrwx root       root               14 ./bin/busybox ->
> busybox.nosuid
> -rwxr-xr-x root       root           547292 ./bin/busybox.nosuid
> -rwsr-xr-x root       root            50860 ./bin/busybox.suid
> lrwxrwxrwx root       root               14 ./bin/sh ->
> busybox.nosuid
> drwxr-xr-x root       root             4096 ./etc
> -rw-r--r-- root       root             2339
> ./etc/busybox.links.nosuid
> -rw-r--r-- root       root               91 ./etc/busybox.links.suid
> 

I went through the effort to try this, and I'm pretty happy with the
results except for one important distinction: it's not reproducible in
all cases because of the inclusion of the owner UID/GID (I used the
decimal user and group IDs to avoid a dependency on the names).

For any task running under fakeroot (pseudo), this works like you would
expect. However, for tasks not running under fakeroot (and possibly
others that copy files from tasks not running under fakeroot?), the
files are owned by the user that is running bitbake (e.g. you). This
means the output hashes are not shareable between different developers.

I'm not sure what the best way to address this is; the UID and GID are
an important part of reproducibility and *should* be included in the
output hash when relevant, but I don't know yet how to determine when
they are relevant. I'm going to dig in and see if I can use "the
current task is running under fakeroot" as that distinction. If anyone
has any other ideas, please chime in.
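
One possible way to make that distinction, sketched as a variation on
OEOuthashBasic (update_hash() and s as in the patch above; this assumes
the task environment still carries the PSEUDO_DISABLED variable that
bitbake's fakeroot machinery sets, '0' when pseudo is active):

    import os

    # Only include ownership in the hash when the task ran under pseudo,
    # where the UID/GID are synthetic and therefore reproducible between
    # different builders.
    include_owners = os.environ.get('PSEUDO_DISABLED') == '0'

    # Then, per file, next to the existing mode/type fields:
    if include_owners:
        update_hash("\tuid=%d\n" % s.st_uid)
        update_hash("\tgid=%d\n" % s.st_gid)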



> > +    finally:
> > +        os.chdir(prev_dir)
> > +
> > +    return h.hexdigest()
> > +
> > +python sstate_report_unihash() {
> > +    report_unihash = getattr(bb.parse.siggen, 'report_unihash',
> > None)
> > +
> > +    if report_unihash:
> > +        ss = sstate_state_fromvars(d)
> > +        report_unihash(os.getcwd(), ss['task'], d)
> > +}
> > +
> >   #
> >   # Shell function to decompress and prepare a package for
> > installation
> >   # Will be run from within SSTATE_INSTDIR.
> > @@ -788,6 +872,11 @@ def sstate_checkhashes(sq_fn, sq_task,
> > sq_hash, sq_hashfn, d, siginfo=False, *,
> >       if siginfo:
> >           extension = extension + ".siginfo"
> >   
> > +    def gethash(task):
> > +        if sq_unihash is not None:
> > +            return sq_unihash[task]
> > +        return sq_hash[task]
> > +
> >       def getpathcomponents(task, d):
> >           # Magic data from BB_HASHFILENAME
> >           splithashfn = sq_hashfn[task].split(" ")
> > @@ -810,7 +899,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash,
> > sq_hashfn, d, siginfo=False, *,
> >   
> >           spec, extrapath, tname = getpathcomponents(task, d)
> >   
> > -        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath +
> > generate_sstatefn(spec, sq_hash[task], d) + "_" + tname +
> > extension)
> > +        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath +
> > generate_sstatefn(spec, gethash(task), d) + "_" + tname +
> > extension)
> >   
> >           if os.path.exists(sstatefile):
> >               bb.debug(2, "SState: Found valid sstate file %s" %
> > sstatefile)
> > @@ -872,7 +961,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash,
> > sq_hashfn, d, siginfo=False, *,
> >               if task in ret:
> >                   continue
> >               spec, extrapath, tname = getpathcomponents(task, d)
> > -            sstatefile = d.expand(extrapath +
> > generate_sstatefn(spec, sq_hash[task], d) + "_" + tname +
> > extension)
> > +            sstatefile = d.expand(extrapath +
> > generate_sstatefn(spec, gethash(task), d) + "_" + tname +
> > extension)
> >               tasklist.append((task, sstatefile))
> >   
> >           if tasklist:
> > @@ -898,12 +987,12 @@ def sstate_checkhashes(sq_fn, sq_task,
> > sq_hash, sq_hashfn, d, siginfo=False, *,
> >           evdata = {'missed': [], 'found': []};
> >           for task in missed:
> >               spec, extrapath, tname = getpathcomponents(task, d)
> > -            sstatefile = d.expand(extrapath +
> > generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
> > -            evdata['missed'].append( (sq_fn[task], sq_task[task],
> > sq_hash[task], sstatefile ) )
> > +            sstatefile = d.expand(extrapath +
> > generate_sstatefn(spec, gethash(task), d) + "_" + tname + ".tgz")
> > +            evdata['missed'].append( (sq_fn[task], sq_task[task],
> > gethash(task), sstatefile ) )
> >           for task in ret:
> >               spec, extrapath, tname = getpathcomponents(task, d)
> > -            sstatefile = d.expand(extrapath +
> > generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
> > -            evdata['found'].append( (sq_fn[task], sq_task[task],
> > sq_hash[task], sstatefile ) )
> > +            sstatefile = d.expand(extrapath +
> > generate_sstatefn(spec, gethash(task), d) + "_" + tname + ".tgz")
> > +            evdata['found'].append( (sq_fn[task], sq_task[task],
> > gethash(task), sstatefile ) )
> >           bb.event.fire(bb.event.MetadataEvent("MissedSstate",
> > evdata), d)
> >   
> >       # Print some summary statistics about the current task
> > completion and how much sstate
> > diff --git a/meta/conf/bitbake.conf b/meta/conf/bitbake.conf
> > index 64800623545..e64ce6a6dab 100644
> > --- a/meta/conf/bitbake.conf
> > +++ b/meta/conf/bitbake.conf
> > @@ -867,7 +867,9 @@ BB_HASHBASE_WHITELIST ?= "TMPDIR FILE PATH PWD
> > BB_TASKHASH BBPATH BBSERVER DL_DI
> >       STAMPS_DIR PRSERV_DUMPDIR PRSERV_DUMPFILE PRSERV_LOCKDOWN
> > PARALLEL_MAKE \
> >       CCACHE_DIR EXTERNAL_TOOLCHAIN CCACHE CCACHE_NOHASHDIR
> > LICENSE_PATH SDKPKGSUFFIX \
> >       WARN_QA ERROR_QA WORKDIR STAMPCLEAN PKGDATA_DIR BUILD_ARCH
> > SSTATE_PKGARCH \
> > -    BB_WORKERCONTEXT BB_LIMITEDDEPS extend_recipe_sysroot
> > DEPLOY_DIR"
> > +    BB_WORKERCONTEXT BB_LIMITEDDEPS BB_UNIHASH
> > extend_recipe_sysroot DEPLOY_DIR \
> > +    SSTATE_HASHEQUIV_METHOD SSTATE_HASHEQUIV_SERVER
> > SSTATE_HASHEQUIV_REPORT_TASKDATA \
> > +    SSTATE_HASHEQUIV_OWNER"
> >   BB_HASHCONFIG_WHITELIST ?= "${BB_HASHBASE_WHITELIST} DATE TIME
> > SSH_AGENT_PID \
> >       SSH_AUTH_SOCK PSEUDO_BUILD BB_ENV_EXTRAWHITE
> > DISABLE_SANITY_CHECKS \
> >       PARALLEL_MAKE BB_NUMBER_THREADS BB_ORIGENV BB_INVALIDCONF
> > BBINCLUDED \
> > diff --git a/meta/lib/oe/sstatesig.py b/meta/lib/oe/sstatesig.py
> > index 18c5a353a2a..059e165c7ab 100644
> > --- a/meta/lib/oe/sstatesig.py
> > +++ b/meta/lib/oe/sstatesig.py
> > @@ -263,10 +263,177 @@ class
> > SignatureGeneratorOEBasicHash(bb.siggen.SignatureGeneratorBasicHash
> > ):
> >           if error_msgs:
> >               bb.fatal("\n".join(error_msgs))
> >   
> > +class
> > SignatureGeneratorOEEquivHash(SignatureGeneratorOEBasicHash):
> > +    name = "OEEquivHash"
> > +
> > +    def init_rundepcheck(self, data):
> > +        super().init_rundepcheck(data)
> > +        self.server = data.getVar('SSTATE_HASHEQUIV_SERVER')
> > +        self.method = data.getVar('SSTATE_HASHEQUIV_METHOD')
> > +        self.unihashes =
> > bb.persist_data.persist('SSTATESIG_UNIHASH_CACHE_v1_' +
> > self.method, data)
> > +
> > +    def get_taskdata(self):
> > +        return (self.server, self.method) + super().get_taskdata()
> > +
> > +    def set_taskdata(self, data):
> > +        self.server, self.method = data[:2]
> > +        super().set_taskdata(data[2:])
> > +
> > +    def __get_task_unihash_key(self, task):
> > +        # TODO: The key only *needs* to be the taskhash, the task
> > is just
> > +        # convenient
> > +        return '%s:%s' % (task, self.taskhash[task])
> > +
> > +    def get_stampfile_hash(self, task):
> > +        if task in self.taskhash:
> > +            # If a unique hash is reported, use it as the
> > stampfile hash. This
> > +            # ensures that a task won't be re-run if the
> > taskhash changes but
> > +            # it would still result in the same output hash
> > +            unihash =
> > self.unihashes.get(self.__get_task_unihash_key(task))
> > +            if unihash is not None:
> > +                return unihash
> > +
> > +        return super().get_stampfile_hash(task)
> > +
> > +    def get_unihash(self, task):
> > +        import urllib
> > +        import json
> > +
> > +        taskhash = self.taskhash[task]
> > +
> > +        key = self.__get_task_unihash_key(task)
> > +
> > +        # TODO: This cache can grow unbounded. It probably only
> > needs to keep
> > +        # the most recent unihash for each task
> > +        unihash = self.unihashes.get(key)
> > +        if unihash is not None:
> > +            return unihash
> > +
> > +        # If a unique hash cannot be discovered from the server,
> > +        # make it equivalent to the taskhash. The unique "hash"
> > +        # only really needs to be a unique string (not even
> > +        # necessarily a hash), but making it match the taskhash
> > +        # has a few advantages:
> > +        #
> > +        # 1) All of the sstate code that assumes hashes can be
> > +        #    used interchangeably keeps working
> > +        # 2) It provides maximal compatibility with builders that
> > +        #    don't use an equivalency server
> > +        # 3) Multiple independent builders can easily derive the
> > +        #    same unique hash from the same input. This means that
> > +        #    if independent builders find the same taskhash but it
> > +        #    isn't reported to the server, there is a better chance
> > +        #    that they will agree on the unique hash.
> > +        unihash = taskhash
> > +
> > +        try:
> > +            url = '%s/v1/equivalent?%s' % (self.server,
> > +                    urllib.parse.urlencode({'method': self.method,
> > 'taskhash': self.taskhash[task]}))
> > +
> > +            request = urllib.request.Request(url)
> > +            response = urllib.request.urlopen(request)
> > +            data = response.read().decode('utf-8')
> > +
> > +            json_data = json.loads(data)
> > +
> > +            if json_data:
> > +                unihash = json_data['unihash']
> > +                # A unique hash equal to the taskhash is not very
> > interesting,
> > +                # so it is reported at debug level 2. If they
> > differ, that
> > +                # is much more interesting, so it is reported at
> > debug level 1
> > +                bb.debug((1, 2)[unihash == taskhash], 'Found
> > unihash %s in place of %s for %s from %s' % (unihash, taskhash,
> > task, self.server))
> > +            else:
> > +                bb.debug(2, 'No reported unihash for %s:%s from
> > %s' % (task, taskhash, self.server))
> > +        except urllib.error.URLError as e:
> > +            bb.warn('Failure contacting Hash Equivalence Server
> > %s: %s' % (self.server, str(e)))
> > +        except (KeyError, json.JSONDecodeError) as e:
> > +            bb.warn('Poorly formatted response from %s: %s' %
> > (self.server, str(e)))
> > +
> > +        self.unihashes[key] = unihash
> > +        return unihash
> > +
> > +    def report_unihash(self, path, task, d):
> > +        import urllib
> > +        import json
> > +        import tempfile
> > +        import base64
> > +
> > +        taskhash = d.getVar('BB_TASKHASH')
> > +        unihash = d.getVar('BB_UNIHASH')
> > +        report_taskdata =
> > d.getVar('SSTATE_HASHEQUIV_REPORT_TASKDATA') == '1'
> > +        tempdir = d.getVar('T')
> > +        fn = d.getVar('BB_FILENAME')
> > +        key = fn + '.do_' + task + ':' + taskhash
> > +
> > +        # Sanity checks
> > +        cache_unihash = self.unihashes.get(key)
> > +        if cache_unihash is None:
> > +            bb.fatal('%s not in unihash cache. Please report this
> > error' % key)
> > +
> > +        if cache_unihash != unihash:
> > +            bb.fatal("Cache unihash %s doesn't match BB_UNIHASH
> > %s" % (cache_unihash, unihash))
> > +
> > +        sigfile = None
> > +        sigfile_name = "depsig.do_%s.%d" % (task, os.getpid())
> > +        sigfile_link = "depsig.do_%s" % task
> > +
> > +        try:
> > +            call = self.method + '(path, sigfile, task, d)'
> > +            sigfile = open(os.path.join(tempdir, sigfile_name),
> > 'w+b')
> > +            locs = {'path': path, 'sigfile': sigfile, 'task':
> > task, 'd': d}
> > +
> > +            outhash = bb.utils.better_eval(call, locs)
> > +
> > +            try:
> > +                url = '%s/v1/equivalent' % self.server
> > +                task_data = {
> > +                    'taskhash': taskhash,
> > +                    'method': self.method,
> > +                    'outhash': outhash,
> > +                    'unihash': unihash,
> > +                    'owner': d.getVar('SSTATE_HASHEQUIV_OWNER')
> > +                    }
> > +
> > +                if report_taskdata:
> > +                    sigfile.seek(0)
> > +
> > +                    task_data['PN'] = d.getVar('PN')
> > +                    task_data['PV'] = d.getVar('PV')
> > +                    task_data['PR'] = d.getVar('PR')
> > +                    task_data['task'] = task
> > +                    task_data['outhash_siginfo'] =
> > sigfile.read().decode('utf-8')
> > +
> > +                headers = {'content-type': 'application/json'}
> > +
> > +                request = urllib.request.Request(url,
> > json.dumps(task_data).encode('utf-8'), headers)
> > +                response = urllib.request.urlopen(request)
> > +                data = response.read().decode('utf-8')
> > +
> > +                json_data = json.loads(data)
> > +                new_unihash = json_data['unihash']
> > +
> > +                if new_unihash != unihash:
> > +                    bb.debug(1, 'Task %s unihash changed %s -> %s
> > by server %s' % (taskhash, unihash, new_unihash, self.server))
> > +                else:
> > +                    bb.debug(1, 'Reported task %s as unihash %s to
> > %s' % (taskhash, unihash, self.server))
> > +            except urllib.error.URLError as e:
> > +                bb.warn('Failure contacting Hash Equivalence
> > Server %s: %s' % (self.server, str(e)))
> > +            except (KeyError, json.JSONDecodeError) as e:
> > +                bb.warn('Poorly formatted response from %s: %s' %
> > (self.server, str(e)))
> > +        finally:
> > +            if sigfile:
> > +                sigfile.close()
> > +
> > +                sigfile_link_path = os.path.join(tempdir,
> > sigfile_link)
> > +                bb.utils.remove(sigfile_link_path)
> > +
> > +                try:
> > +                    os.symlink(sigfile_name, sigfile_link_path)
> > +                except OSError:
> > +                    pass
> >   
> >   # Insert these classes into siggen's namespace so it can see and
> > select them
> >   bb.siggen.SignatureGeneratorOEBasic = SignatureGeneratorOEBasic
> >   bb.siggen.SignatureGeneratorOEBasicHash =
> > SignatureGeneratorOEBasicHash
> > +bb.siggen.SignatureGeneratorOEEquivHash =
> > SignatureGeneratorOEEquivHash
> >   
> >   
> >   def find_siginfo(pn, taskname, taskhashlist, d):
> > 
-- 
Joshua Watt <JPEWhacker@gmail.com>




* Re: [OE-core][PATCH v7 3/3] sstate: Implement hash equivalence sstate
  2019-01-09 17:09                 ` [bitbake-devel] [PATCH " Joshua Watt
@ 2019-01-11 20:39                   ` Peter Kjellerstedt
  -1 siblings, 0 replies; 158+ messages in thread
From: Peter Kjellerstedt @ 2019-01-11 20:39 UTC (permalink / raw)
  To: Joshua Watt, Jacob Kroon, openembedded-core, bitbake-devel

> -----Original Message-----
> From: bitbake-devel-bounces@lists.openembedded.org <bitbake-devel-
> bounces@lists.openembedded.org> On Behalf Of Joshua Watt
> Sent: 9 January 2019 18:10
> To: Jacob Kroon <jacob.kroon@gmail.com>; openembedded-
> core@lists.openembedded.org; bitbake-devel@lists.openembedded.org
> Subject: Re: [bitbake-devel] [OE-core][PATCH v7 3/3] sstate: Implement
> hash equivalence sstate
> 
> On Tue, 2019-01-08 at 07:29 +0100, Jacob Kroon wrote:
> > On 1/4/19 5:20 PM, Joshua Watt wrote:
> > > Converts sstate so that it can use a hash equivalence server to
> > > determine if a task really needs to be rebuilt, or if it can be restored
> > > from a different (equivalent) sstate object.
> > >
> > > The unique hashes are cached persistently using persist_data. This has
> > > a number of advantages:
> > >   1) Unique hashes can be cached between invocations of bitbake to
> > >      prevent needing to contact the server every time (which is slow)
> > >   2) The value of each task's unique hash can easily be synchronized
> > >      between different threads, which will be useful if bitbake is
> > >      updated to do on-the-fly task re-hashing.
> > >
> > > [YOCTO #13030]
> > >
> > > Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
> > > ---
> > >   meta/classes/sstate.bbclass | 105 +++++++++++++++++++++--
> > >   meta/conf/bitbake.conf      |   4 +-
> > >   meta/lib/oe/sstatesig.py    | 167 ++++++++++++++++++++++++++++++++++++
> > >   3 files changed, 267 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
> > > index 59ebc3ab5cc..da0807d6e99 100644
> > > --- a/meta/classes/sstate.bbclass
> > > +++ b/meta/classes/sstate.bbclass
> > > @@ -11,7 +11,7 @@ def generate_sstatefn(spec, hash, d):
> > >   SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
> > >   SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
> > >   SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
> > > -SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_TASKHASH'), d)}"
> > > +SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d)}"
> > >   SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
> > >   SSTATE_EXTRAPATH   = ""
> > >   SSTATE_EXTRAPATHWILDCARD = ""
> > > @@ -82,6 +82,23 @@ SSTATE_SIG_PASSPHRASE ?= ""
> > >   # Whether to verify the GnUPG signatures when extracting sstate archives
> > >   SSTATE_VERIFY_SIG ?= "0"
> > >
> > > +SSTATE_HASHEQUIV_METHOD ?= "OEOuthashBasic"
> > > +SSTATE_HASHEQUIV_METHOD[doc] = "The function used to calculate the output hash \
> > > +    for a task, which in turn is used to determine equivalency. \
> > > +    "
> > > +
> > > +SSTATE_HASHEQUIV_SERVER ?= ""
> > > +SSTATE_HASHEQUIV_SERVER[doc] = "The hash equivalence server. For example, \
> > > +    'http://192.168.0.1:5000'. Do not include a trailing slash \
> > > +    "
> > > +
> > > +SSTATE_HASHEQUIV_REPORT_TASKDATA ?= "0"
> > > +SSTATE_HASHEQUIV_REPORT_TASKDATA[doc] = "Report additional useful data to the \
> > > +    hash equivalency server, such as PN, PV, taskname, etc. This information \
> > > +    is very useful for developers looking at task data, but may leak sensitive \
> > > +    data if the equivalence server is public. \
> > > +    "
> > > +
> > >   python () {
> > >       if bb.data.inherits_class('native', d):
> > >           d.setVar('SSTATE_PKGARCH', d.getVar('BUILD_ARCH', False))
> > > @@ -640,7 +657,7 @@ def sstate_package(ss, d):
> > >           return
> > >
> > >       for f in (d.getVar('SSTATECREATEFUNCS') or '').split() + \
> > > -             ['sstate_create_package', 'sstate_sign_package'] + \
> > > +             ['sstate_report_unihash', 'sstate_create_package', 'sstate_sign_package'] + \
> > >                (d.getVar('SSTATEPOSTCREATEFUNCS') or '').split():
> > >           # All hooks should run in SSTATE_BUILDDIR.
> > >           bb.build.exec_func(f, d, (sstatebuild,))
> > > @@ -764,6 +781,73 @@ python sstate_sign_package () {
> > >                              d.getVar('SSTATE_SIG_PASSPHRASE'), armor=False)
> > >   }
> > >
> > > +def OEOuthashBasic(path, sigfile, task, d):
> > > +    import hashlib
> > > +    import stat
> > > +
> > > +    def update_hash(s):
> > > +        s = s.encode('utf-8')
> > > +        h.update(s)
> > > +        if sigfile:
> > > +            sigfile.write(s)
> > > +
> > > +    h = hashlib.sha256()
> > > +    prev_dir = os.getcwd()
> > > +
> > > +    try:
> > > +        os.chdir(path)
> > > +
> > > +        update_hash("OEOuthashBasic\n")
> > > +
> > > +        # It is only currently useful to get equivalent hashes for things that
> > > +        # can be restored from sstate. Since the sstate object is named using
> > > +        # SSTATE_PKGSPEC and the task name, those should be included in the
> > > +        # output hash calculation.
> > > +        update_hash("SSTATE_PKGSPEC=%s\n" % d.getVar('SSTATE_PKGSPEC'))
> > > +        update_hash("task=%s\n" % task)
> > > +
> > > +        for root, dirs, files in os.walk('.', topdown=True):
> > > +            # Sort directories and files to ensure consistent ordering
> > > +            dirs.sort()
> > > +            files.sort()
> > > +
> > > +            for f in files:
> > > +                path = os.path.join(root, f)
> > > +                s = os.lstat(path)
> > > +
> > > +                # Hash file path
> > > +                update_hash(path + '\n')
> > > +
> > > +                # Hash file mode
> > > +                update_hash("\tmode=0x%x\n" % stat.S_IMODE(s.st_mode))
> > > +                update_hash("\ttype=0x%x\n" % stat.S_IFMT(s.st_mode))
> > > +
> > > +                if stat.S_ISBLK(s.st_mode) or stat.S_ISCHR(s.st_mode):
> > > +                    # Hash device major and minor
> > > +                    update_hash("\tdev=%d,%d\n" % (os.major(s.st_rdev), os.minor(s.st_rdev)))
> > > +                elif stat.S_ISLNK(s.st_mode):
> > > +                    # Hash symbolic link
> > > +                    update_hash("\tsymlink=%s\n" % os.readlink(path))
> > > +                else:
> > > +                    fh = hashlib.sha256()
> > > +                    # Hash file contents
> > > +                    with open(path, 'rb') as d:
> > > +                        for chunk in iter(lambda: d.read(4096), b""):
> > > +                            fh.update(chunk)
> > > +                    update_hash("\tdigest=%s\n" % fh.hexdigest())
> >
> > Would it be a good idea to make the depsig.do_* files even more human
> > readable, considering that they could be candidates for being stored in
> > buildhistory?
> >
> > As an example, here's what buildhistory/.../files-in-package.txt for
> > busybox looks like:
> >
> > drwxr-xr-x root       root             4096 ./bin
> > lrwxrwxrwx root       root               14 ./bin/busybox -> busybox.nosuid
> > -rwxr-xr-x root       root           547292 ./bin/busybox.nosuid
> > -rwsr-xr-x root       root            50860 ./bin/busybox.suid
> > lrwxrwxrwx root       root               14 ./bin/sh -> busybox.nosuid
> > drwxr-xr-x root       root             4096 ./etc
> > -rw-r--r-- root       root             2339 ./etc/busybox.links.nosuid
> > -rw-r--r-- root       root               91 ./etc/busybox.links.suid
> >
> 
> I went through the effort to try this, and I'm pretty happy with the
> results except for one important distinction: it's not reproducible in
> all cases because of the inclusion of the owner UID/GID (I used the
> decimal user and group IDs to avoid depending on the names).
>
> For any task running under fakeroot (pseudo), this works as you would
> expect. However, for tasks not running under fakeroot (and possibly
> others that copy files from tasks not running under fakeroot?), the
> files are owned by the user that is running bitbake (e.g., you). This
> makes the output hashes not shareable between different developers.
>
> I'm not sure of the best way to address this; the UID and GID are an
> important part of reproducibility and *should* be included in the
> output hash when relevant, but I don't know yet how to determine when
> they are relevant. I'm going to dig in and see if I can use "the
> current task is running under fakeroot" as that distinction. If anyone
> has any other ideas, please chime in.

You should probably not rely on the UID/GID being stable for the
target. That is only the case if you have configured the build to use
static IDs; otherwise they are dynamically assigned and may vary
between builds. The user and group names should be stable, though.

//Peter
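
One way to combine Joshua's "running under fakeroot" distinction with Peter's caveat is to fold ownership into the output hash only when the task ran under pseudo, where the UID/GID are deterministic. A minimal sketch against the OEOuthashBasic structure quoted above; using pseudo's PSEUDO_DISABLED environment variable as the marker, and the helper name itself, are assumptions for illustration rather than anything settled in this thread:

    import os

    def hash_owner_fields(update_hash, st):
        # Hash UID/GID only under pseudo (assumed marker: PSEUDO_DISABLED=0
        # in the task environment), where ownership is deterministic.
        # Outside pseudo the files are owned by whoever ran bitbake, so
        # hashing them would make the output hash machine-specific.
        if os.environ.get('PSEUDO_DISABLED') == '0':
            update_hash("\tuid=%d\n" % st.st_uid)
            update_hash("\tgid=%d\n" % st.st_gid)

If stable numeric IDs on the target are wanted regardless, oe-core's useradd-staticids class (USERADDEXTENSION = "useradd-staticids" with USERADD_UID_TABLES/USERADD_GID_TABLES) pins the otherwise dynamically assigned IDs Peter mentions.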





Thread overview: 158+ messages
2018-07-16 20:37 [RFC 0/9] Hash Equivalency Server Joshua Watt
2018-07-16 20:37 ` [RFC 1/9] bitbake-worker: Pass taskhash as runtask parameter Joshua Watt
2018-07-16 20:37 ` [RFC 2/9] siggen: Split out stampfile hash fetch Joshua Watt
2018-07-16 20:37 ` [RFC 3/9] siggen: Split out task depend ID Joshua Watt
2018-07-16 20:37 ` [RFC 4/9] runqueue: Track task dependency ID Joshua Watt
2018-07-16 20:37 ` [RFC 5/9] runqueue: Pass dependency ID to task Joshua Watt
2018-07-16 20:37 ` [RFC 6/9] runqueue: Pass dependency ID to hash validate Joshua Watt
2018-07-16 20:37 ` [RFC 7/9] classes/sstate: Handle depid in hash check Joshua Watt
2018-07-16 20:37 ` [RFC 8/9] hashserver: Add initial reference server Joshua Watt
2018-07-17 12:11   ` Richard Purdie
2018-07-17 12:11     ` [bitbake-devel] " Richard Purdie
2018-07-17 13:44     ` Joshua Watt
2018-07-17 13:44       ` [bitbake-devel] " Joshua Watt
2018-07-18 13:53     ` Joshua Watt
2018-07-18 13:53       ` [bitbake-devel] " Joshua Watt
2018-07-16 20:37 ` [RFC 9/9] sstate: Implement hash equivalence sstate Joshua Watt
2018-08-09 22:08 ` [RFC v2 00/16] Hash Equivalency Server Joshua Watt
2018-08-09 22:08   ` [RFC v2 01/16] bitbake: fork: Add os.fork() wrappers Joshua Watt
2018-08-09 22:08   ` [RFC v2 02/16] bitbake: persist_data: Fix leaking cursors causing deadlock Joshua Watt
2018-08-09 22:08   ` [RFC v2 03/16] bitbake: persist_data: Add key constraints Joshua Watt
2018-08-09 22:08   ` [RFC v2 04/16] bitbake: persist_data: Enable Write Ahead Log Joshua Watt
2018-08-09 22:08   ` [RFC v2 05/16] bitbake: persist_data: Disable enable_shared_cache Joshua Watt
2018-08-09 22:08   ` [RFC v2 06/16] bitbake: persist_data: Close databases across fork Joshua Watt
2018-08-09 22:08   ` [RFC v2 07/16] bitbake: tests/persist_data: Add tests Joshua Watt
2018-08-09 22:08   ` [RFC v2 08/16] bitbake: bitbake-worker: Pass taskhash as runtask parameter Joshua Watt
2018-08-09 22:08   ` [RFC v2 09/16] bitbake: siggen: Split out stampfile hash fetch Joshua Watt
2018-08-09 22:08   ` [RFC v2 10/16] bitbake: siggen: Split out task depend ID Joshua Watt
2018-08-09 22:08   ` [RFC v2 11/16] bitbake: runqueue: Track task dependency ID Joshua Watt
2018-08-09 22:08   ` [RFC v2 12/16] bitbake: runqueue: Pass dependency ID to task Joshua Watt
2018-08-09 22:08   ` [RFC v2 13/16] bitbake: runqueue: Pass dependency ID to hash validate Joshua Watt
2018-08-09 22:08   ` [RFC v2 14/16] classes/sstate: Handle depid in hash check Joshua Watt
2018-08-09 22:08   ` [RFC v2 15/16] bitbake: hashserv: Add hash equivalence reference server Joshua Watt
2018-08-09 22:08   ` [RFC v2 16/16] sstate: Implement hash equivalence sstate Joshua Watt
2018-12-04  3:42   ` [OE-core][PATCH v3 00/17] Hash Equivalency Server Joshua Watt
2018-12-04  3:42     ` [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 01/17] bitbake: fork: Add os.fork() wrappers Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 02/17] bitbake: persist_data: Fix leaking cursors causing deadlock Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 03/17] bitbake: persist_data: Add key constraints Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 04/17] bitbake: persist_data: Enable Write Ahead Log Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 05/17] bitbake: persist_data: Disable enable_shared_cache Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 06/17] bitbake: persist_data: Close databases across fork Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 07/17] bitbake: tests/persist_data: Add tests Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 08/17] bitbake: bitbake-worker: Pass taskhash as runtask parameter Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 09/17] bitbake: siggen: Split out stampfile hash fetch Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 10/17] bitbake: siggen: Split out task depend ID Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-05 22:50       ` [OE-core][PATCH " Richard Purdie
2018-12-05 22:50         ` [bitbake-devel] [PATCH " Richard Purdie
2018-12-06 14:58         ` [OE-core][PATCH " Joshua Watt
2018-12-06 14:58           ` [bitbake-devel] [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 11/17] bitbake: runqueue: Track task dependency ID Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 12/17] bitbake: runqueue: Pass dependency ID to task Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 13/17] bitbake: runqueue: Pass dependency ID to hash validate Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-05 22:52       ` [OE-core][PATCH " Richard Purdie
2018-12-05 22:52         ` [bitbake-devel] [PATCH " Richard Purdie
2018-12-04  3:42     ` [OE-core][PATCH v3 14/17] classes/sstate: Handle depid in hash check Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 15/17] bitbake: hashserv: Add hash equivalence reference server Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 16/17] sstate: Implement hash equivalence sstate Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-04  3:42     ` [OE-core][PATCH v3 17/17] classes/image-buildinfo: Remove unused argument Joshua Watt
2018-12-04  3:42       ` [PATCH " Joshua Watt
2018-12-18 15:30     ` [OE-core][PATCH v4 00/10] Hash Equivalency Server Joshua Watt
2018-12-18 15:30       ` [PATCH " Joshua Watt
2018-12-18 15:30       ` [OE-core][PATCH v4 01/10] bitbake: fork: Add os.fork() wrappers Joshua Watt
2018-12-18 15:30         ` [PATCH " Joshua Watt
2018-12-18 15:30       ` [OE-core][PATCH v4 02/10] bitbake: persist_data: Close databases across fork Joshua Watt
2018-12-18 15:30         ` [PATCH " Joshua Watt
2018-12-18 15:30       ` [OE-core][PATCH v4 03/10] bitbake: tests/persist_data: Add tests Joshua Watt
2018-12-18 15:30         ` [PATCH " Joshua Watt
2018-12-18 15:30       ` [OE-core][PATCH v4 04/10] bitbake: siggen: Split out task unique hash Joshua Watt
2018-12-18 15:30         ` [PATCH " Joshua Watt
2018-12-18 15:30       ` [OE-core][PATCH v4 05/10] bitbake: runqueue: Track " Joshua Watt
2018-12-18 15:30         ` [PATCH " Joshua Watt
2018-12-18 15:30       ` [OE-core][PATCH v4 06/10] bitbake: runqueue: Pass unique hash to task Joshua Watt
2018-12-18 15:30         ` [PATCH " Joshua Watt
2018-12-18 15:30       ` [OE-core][PATCH v4 07/10] bitbake: runqueue: Pass unique hash to hash validate Joshua Watt
2018-12-18 15:30         ` [PATCH " Joshua Watt
2018-12-18 16:24         ` [OE-core] " Richard Purdie
2018-12-18 16:24           ` Richard Purdie
2018-12-18 16:31           ` [OE-core] " Joshua Watt
2018-12-18 16:31             ` Joshua Watt
2018-12-18 15:30       ` [OE-core][PATCH v4 08/10] classes/sstate: Handle unihash in hash check Joshua Watt
2018-12-18 15:30         ` [PATCH " Joshua Watt
2018-12-18 15:31       ` [OE-core][PATCH v4 09/10] bitbake: hashserv: Add hash equivalence reference server Joshua Watt
2018-12-18 15:31         ` [PATCH " Joshua Watt
2018-12-18 15:31       ` [OE-core][PATCH v4 10/10] sstate: Implement hash equivalence sstate Joshua Watt
2018-12-18 15:31         ` [PATCH " Joshua Watt
2018-12-19  3:10       ` [OE-core][PATCH v5 0/8] Hash Equivalency Server Joshua Watt
2018-12-19  3:10         ` [PATCH " Joshua Watt
2018-12-19  3:10         ` [OE-core][PATCH v5 1/8] bitbake: tests/persist_data: Add tests Joshua Watt
2018-12-19  3:10           ` [PATCH " Joshua Watt
2018-12-19  3:10         ` [OE-core][PATCH v5 2/8] bitbake: siggen: Split out task unique hash Joshua Watt
2018-12-19  3:10           ` [PATCH " Joshua Watt
2018-12-19  3:10         ` [OE-core][PATCH v5 3/8] bitbake: runqueue: Track " Joshua Watt
2018-12-19  3:10           ` [PATCH " Joshua Watt
2019-01-05  7:49           ` [OE-core] " Alejandro Hernandez
2019-01-05  7:49             ` Alejandro Hernandez
2019-01-06  3:09             ` [OE-core] " Joshua Watt
2019-01-06  3:09               ` Joshua Watt
2019-01-07  6:52               ` [OE-core] " Alejandro Hernandez
2019-01-07  6:52                 ` Alejandro Hernandez
2019-01-07 16:16               ` [OE-core] " akuster808
2019-01-07 16:16                 ` akuster808
2019-01-07 16:40                 ` [OE-core] " Joshua Watt
2019-01-07 16:40                   ` Joshua Watt
2018-12-19  3:10         ` [OE-core][PATCH v5 4/8] bitbake: runqueue: Pass unique hash to task Joshua Watt
2018-12-19  3:10           ` [PATCH " Joshua Watt
2018-12-19  3:10         ` [OE-core][PATCH v5 5/8] bitbake: runqueue: Pass unique hash to hash validate Joshua Watt
2018-12-19  3:10           ` [PATCH " Joshua Watt
2018-12-19  3:10         ` [OE-core][PATCH v5 6/8] classes/sstate: Handle unihash in hash check Joshua Watt
2018-12-19  3:10           ` [PATCH " Joshua Watt
2018-12-19  3:10         ` [OE-core][PATCH v5 7/8] bitbake: hashserv: Add hash equivalence reference server Joshua Watt
2018-12-19  3:10           ` [PATCH " Joshua Watt
2018-12-19  3:10         ` [OE-core][PATCH v5 8/8] sstate: Implement hash equivalence sstate Joshua Watt
2018-12-19  3:10           ` [PATCH " Joshua Watt
2018-12-19  3:33       ` ✗ patchtest: failure for Hash Equivalency Server (rev3) Patchwork
2019-01-04  2:42       ` [OE-core][PATCH v6 0/3] Hash Equivalency Server Joshua Watt
2019-01-04  2:42         ` [PATCH " Joshua Watt
2019-01-04  2:42         ` [OE-core][PATCH v6 1/3] classes/sstate: Handle unihash in hash check Joshua Watt
2019-01-04  2:42           ` [PATCH " Joshua Watt
2019-01-04  7:01           ` [OE-core][PATCH " Richard Purdie
2019-01-04  7:01             ` [bitbake-devel] [PATCH " Richard Purdie
2019-01-04  2:42         ` [OE-core][PATCH v6 2/3] bitbake: hashserv: Add hash equivalence reference server Joshua Watt
2019-01-04  2:42           ` [PATCH " Joshua Watt
2019-01-04  2:42         ` [OE-core][PATCH v6 3/3] sstate: Implement hash equivalence sstate Joshua Watt
2019-01-04  2:42           ` [PATCH " Joshua Watt
2019-01-04 16:20         ` [OE-core][PATCH v7 0/3] Hash Equivalency Server Joshua Watt
2019-01-04 16:20           ` [PATCH " Joshua Watt
2019-01-04 16:20           ` [OE-core][PATCH v7 1/3] classes/sstate: Handle unihash in hash check Joshua Watt
2019-01-04 16:20             ` [PATCH " Joshua Watt
2019-01-04 16:20           ` [OE-core][PATCH v7 2/3] bitbake: hashserv: Add hash equivalence reference server Joshua Watt
2019-01-04 16:20             ` [PATCH " Joshua Watt
2019-01-04 16:20           ` [OE-core][PATCH v7 3/3] sstate: Implement hash equivalence sstate Joshua Watt
2019-01-04 16:20             ` [PATCH " Joshua Watt
2019-01-08  6:29             ` [OE-core][PATCH " Jacob Kroon
2019-01-08  6:29               ` [bitbake-devel] [PATCH " Jacob Kroon
2019-01-09 17:09               ` [OE-core][PATCH " Joshua Watt
2019-01-09 17:09                 ` [bitbake-devel] [PATCH " Joshua Watt
2019-01-11 20:39                 ` [OE-core][PATCH " Peter Kjellerstedt
2019-01-11 20:39                   ` [bitbake-devel] [PATCH " Peter Kjellerstedt
2019-01-04 16:33         ` ✗ patchtest: failure for Hash Equivalency Server (rev5) Patchwork
2019-01-04  3:03       ` ✗ patchtest: failure for Hash Equivalency Server (rev4) Patchwork
2018-12-18 16:03     ` ✗ patchtest: failure for Hash Equivalency Server (rev2) Patchwork
2018-12-04  4:05   ` ✗ patchtest: failure for Hash Equivalency Server Patchwork
