All of lore.kernel.org
 help / color / mirror / Atom feed
* [Buildroot] [PATCH v4 1/5] autobuild-run: check if reproducibile_results exists before checking its size
@ 2019-08-20 14:52 Atharva Lele
  2019-08-20 14:52 ` [Buildroot] [PATCH v4 2/5] autobuild-run: initial implementation of get_reproducibility_failure_reason() Atharva Lele
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Atharva Lele @ 2019-08-20 14:52 UTC (permalink / raw)
  To: buildroot

This was causing some reproducible configs to crash autobuild script since it
tried to check the size of a file that did not exist.

Signed-off-by: Atharva Lele <itsatharva@gmail.com>
---
 scripts/autobuild-run | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index ead81a0..99b57dd 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -461,7 +461,7 @@ class Builder:
                 log_write(self.log, "INFO: diffoscope not installed, falling back to cmp")
                 subprocess.call(["cmp", "-b", build_1_image, build_2_image], stdout=diff, stderr=self.log)
 
-        if os.stat(reproducible_results).st_size > 0:
+        if os.path.exists(reproducible_results) and os.stat(reproducible_results).st_size > 0:
             log_write(self.log, "INFO: Build is non-reproducible.")
             return -1
 
-- 
2.22.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Buildroot] [PATCH v4 2/5] autobuild-run: initial implementation of get_reproducibility_failure_reason()
  2019-08-20 14:52 [Buildroot] [PATCH v4 1/5] autobuild-run: check if reproducibile_results exists before checking its size Atharva Lele
@ 2019-08-20 14:52 ` Atharva Lele
  2019-09-08 17:06   ` Arnout Vandecappelle
  2019-08-20 14:52 ` [Buildroot] [PATCH v4 3/5] autobuild-run: account for reproducibility failures in get_failure_reason() Atharva Lele
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Atharva Lele @ 2019-08-20 14:52 UTC (permalink / raw)
  To: buildroot

Analyze the JSON formatted output from diffoscope and check if
the differences are due to a filesystem reproducibility issue
or a package reproducibility issue.

Also, discard the deltas because they might take up too much space.

Signed-off-by: Atharva Lele <itsatharva@gmail.com>
---
Changes v2 -> v4:
  - Change if-else to try-except
  - remove blank line
Changes v1 -> v2:
  - Refactor using subfunctions and local variables (suggested by Thomas)
  - Added comments (suggested by Thomas)
  - Use more pythonic loops (suggested by Thomas)
---
 scripts/autobuild-run | 88 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index 99b57dd..384cf73 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -131,6 +131,7 @@ import csv
 import docopt
 import errno
 import hashlib
+import json
 import mmap
 import multiprocessing
 import os
@@ -599,6 +600,93 @@ class Builder:
         if reject_results():
             return
 
+        def get_reproducibility_failure_reason(reproducible_results):
+            def split_delta(delta):
+                # Take a delta and split it into added, deleted lines.
+                added = []
+                deleted = []
+                for line in delta:
+                    if line.startswith("+"):
+                        added.append(line)
+                    if line.startswith("-"):
+                        deleted.append(line)
+                return added, deleted
+
+            def get_package(sourcef):
+                # Returns which package the source file belongs to.
+                with open(packages_file_list, "r") as packagef:
+                    for line in packagef:
+                        if sourcef in line:
+                            package = line.split(',')[0]
+
+                try:
+                    # Get package version
+                    package_info = json.loads(subprocess.check_output(["make", "--no-print-directory",
+                                                                       "O=%s" % self.outputdir,
+                                                                       "-C", self.srcdir,
+                                                                       "%s-show-info" % package]))
+                    if "version" in package_info[package]:
+                        version = package_info[package]["version"]
+                        return [package, version]
+                    else:
+                        return [package]
+                except:
+                    return ["not found"]
+
+            def cleanup(l):
+                # Takes a list and removes data which is redundant (source2) or data
+                # that might take up too much space (like huge diffs).
+                if "unified_diff" in l:
+                    l.pop("unified_diff")
+                if "source2" in l:
+                    l.pop("source2")
+
+            packages_file_list = os.path.join(self.outputdir, "build", "packages-file-list.txt")
+
+            with open(reproducible_results, "r") as reproduciblef:
+                json_data = json.load(reproduciblef)
+
+            if json_data["unified_diff"] == None:
+                # Remove the file list because it is not useful, i.e. it only shows
+                # which files vary, and nothing more.
+                if json_data["details"][0]["source1"] == "file list":
+                    json_data["details"].pop(0)
+
+                # Iterate over details in the diffoscope output.
+                for item in json_data["details"]:
+                    diff_src = item["source1"]
+                    item["package"] = get_package(diff_src)
+
+                    # In some cases, diffoscope uses multiple commands to get various
+                    # diffs. Due to this, it generates a "details" key for those files
+                    # instead of just storing the diff in the "unified_diff" key.
+                    if item["unified_diff"] == None:
+                        for item_details in item["details"]:
+                            diff = item_details["unified_diff"].split("\n")
+                            split_deltas = split_delta(diff)
+                            item_details["added"] = split_deltas[0][:100]
+                            item_details["deleted"] = split_deltas[1][:100]
+                            cleanup(item_details)
+                    else:
+                        diff = item["unified_diff"].split("\n")
+                        split_deltas = split_delta(diff)
+                        item["added"] = split_deltas[0][:100]
+                        item["deleted"] = split_deltas[1][:100]
+                    cleanup(item)
+                # We currently just set the reason from first non-reproducible package in the
+                # dictionary.
+                reason = json_data["details"][0]["package"]
+
+                # If there does exist a unified_diff directly for the .tar images, it is probably
+                # a filesystem reproducibility issue.
+            else:
+                reason = ["filesystem"]
+
+            with open(reproducible_results, "w") as reproduciblef:
+                json.dump(json_data, reproduciblef, sort_keys=True, indent=4)
+
+            return reason
+
         def get_failure_reason():
             # Output is a tuple (package, version), or None.
             lastlines = decode_bytes(subprocess.Popen(
-- 
2.22.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Buildroot] [PATCH v4 3/5] autobuild-run: account for reproducibility failures in get_failure_reason()
  2019-08-20 14:52 [Buildroot] [PATCH v4 1/5] autobuild-run: check if reproducibile_results exists before checking its size Atharva Lele
  2019-08-20 14:52 ` [Buildroot] [PATCH v4 2/5] autobuild-run: initial implementation of get_reproducibility_failure_reason() Atharva Lele
@ 2019-08-20 14:52 ` Atharva Lele
  2019-09-08 17:13   ` Arnout Vandecappelle
  2019-08-20 14:52 ` [Buildroot] [PATCH v4 4/5] autobuild-run: move with open to appropriate place in check_reproducibility() Atharva Lele
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Atharva Lele @ 2019-08-20 14:52 UTC (permalink / raw)
  To: buildroot

Signed-off-by: Atharva Lele <itsatharva@gmail.com>
---
Changes v1 -> v3:
  - Account for changed name of diffoscope output
---
 scripts/autobuild-run | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index 384cf73..876feb2 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -689,15 +689,26 @@ class Builder:
 
         def get_failure_reason():
             # Output is a tuple (package, version), or None.
-            lastlines = decode_bytes(subprocess.Popen(
-                ["tail", "-n", "3", os.path.join(self.outputdir, "logfile")],
-                stdout=subprocess.PIPE).communicate()[0]).splitlines()
-
-            regexp = re.compile(r'make: \*\*\* .*/(?:build|toolchain)/([^/]*)/')
-            for line in lastlines:
-                m = regexp.search(line)
-                if m:
-                    return m.group(1).rsplit('-', 1)
+            # Output is "package-reproducible" in case of reproducibility failure.
+
+            reproducible_results = os.path.join(self.resultdir, "diffoscope-results.json")
+            if os.path.exists(reproducible_results) and os.stat(reproducible_results).st_size > 0:
+                if self.sysinfo.has("diffoscope"):
+                    reason = get_reproducibility_failure_reason(reproducible_results)
+                    reason.append("nonreproducible")
+                    return reason
+                else:
+                    return ["nonreproducible"]
+            else:
+                lastlines = decode_bytes(subprocess.Popen(
+                    ["tail", "-n", "3", os.path.join(self.outputdir, "logfile")],
+                    stdout=subprocess.PIPE).communicate()[0]).splitlines()
+
+                regexp = re.compile(r'make: \*\*\* .*/(?:build|toolchain)/([^/]*)/')
+                for line in lastlines:
+                    m = regexp.search(line)
+                    if m:
+                        return m.group(1).rsplit('-', 1)
 
             # not found
             return None
-- 
2.22.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Buildroot] [PATCH v4 4/5] autobuild-run: move with open to appropriate place in check_reproducibility()
  2019-08-20 14:52 [Buildroot] [PATCH v4 1/5] autobuild-run: check if reproducibile_results exists before checking its size Atharva Lele
  2019-08-20 14:52 ` [Buildroot] [PATCH v4 2/5] autobuild-run: initial implementation of get_reproducibility_failure_reason() Atharva Lele
  2019-08-20 14:52 ` [Buildroot] [PATCH v4 3/5] autobuild-run: account for reproducibility failures in get_failure_reason() Atharva Lele
@ 2019-08-20 14:52 ` Atharva Lele
  2019-08-20 14:52 ` [Buildroot] [PATCH v4 5/5] autobuild-run: initial implementation of categorization() of nonreproducibility Atharva Lele
  2019-09-08 16:43 ` [Buildroot] [PATCH v4 1/5] autobuild-run: check if reproducibile_results exists before checking its size Arnout Vandecappelle
  4 siblings, 0 replies; 16+ messages in thread
From: Atharva Lele @ 2019-08-20 14:52 UTC (permalink / raw)
  To: buildroot

Since we switched to using diffoscope's ability to write to a file, having a
with open() does not make sense while running the command. Instead, move it to
the else condition where it can be used by cmp.

Signed-off-by: Atharva Lele <itsatharva@gmail.com>
---
 scripts/autobuild-run | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index 876feb2..c25413b 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -446,19 +446,19 @@ class Builder:
         build_1_image = os.path.join(self.outputdir, "images", "rootfs.tar")
         build_2_image = os.path.join(self.outputdir_2, "images", "rootfs.tar")
 
-        with open(reproducible_results, 'w') as diff:
-            if self.sysinfo.has("diffoscope"):
-                # Prefix to point diffoscope towards cross-tools
-                prefix = subprocess.check_output(["make", "--no-print-directory", "O=%s" % self.outputdir,
-                                                  "-C", self.srcdir, "printvars", "VARS=TARGET_CROSS"])
-                # Remove TARGET_CROSS= and \n from the string
-                prefix = prefix[13:-1]
-                log_write(self.log, "INFO: running diffoscope on images")
-                subprocess.call(["diffoscope", build_1_image, build_2_image,
-                                 "--tool-prefix-binutils", prefix, "--json", reproducible_results,
-                                 "--text", reproducible_results_txt, "--max-text-report-size", "40000"],
-                                stderr=self.log)
-            else:
+        if self.sysinfo.has("diffoscope"):
+            # Prefix to point diffoscope towards cross-tools
+            prefix = subprocess.check_output(["make", "--no-print-directory", "O=%s" % self.outputdir,
+                                              "-C", self.srcdir, "printvars", "VARS=TARGET_CROSS"])
+            # Remove TARGET_CROSS= and \n from the string
+            prefix = prefix[13:-1]
+            log_write(self.log, "INFO: running diffoscope on images")
+            subprocess.call(["diffoscope", build_1_image, build_2_image,
+                             "--tool-prefix-binutils", prefix, "--json", reproducible_results,
+                             "--text", reproducible_results_txt, "--max-text-report-size", "40000"],
+                            stderr=self.log)
+        else:
+            with open(reproducible_results_txt, 'w') as diff:
                 log_write(self.log, "INFO: diffoscope not installed, falling back to cmp")
                 subprocess.call(["cmp", "-b", build_1_image, build_2_image], stdout=diff, stderr=self.log)
 
-- 
2.22.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Buildroot] [PATCH v4 5/5] autobuild-run: initial implementation of categorization() of nonreproducibility
  2019-08-20 14:52 [Buildroot] [PATCH v4 1/5] autobuild-run: check if reproducibile_results exists before checking its size Atharva Lele
                   ` (2 preceding siblings ...)
  2019-08-20 14:52 ` [Buildroot] [PATCH v4 4/5] autobuild-run: move with open to appropriate place in check_reproducibility() Atharva Lele
@ 2019-08-20 14:52 ` Atharva Lele
  2019-09-08 16:43 ` [Buildroot] [PATCH v4 1/5] autobuild-run: check if reproducibile_results exists before checking its size Arnout Vandecappelle
  4 siblings, 0 replies; 16+ messages in thread
From: Atharva Lele @ 2019-08-20 14:52 UTC (permalink / raw)
  To: buildroot

Build ID and Build Path reproducibility issues are easy to identify and thus we
start categorization with these issues.

Signed-off-by: Atharva Lele <itsatharva@gmail.com>
---
 scripts/autobuild-run | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index c25413b..83acaad 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -131,6 +131,7 @@ import csv
 import docopt
 import errno
 import hashlib
+from itertools import izip
 import json
 import mmap
 import multiprocessing
@@ -641,6 +642,26 @@ class Builder:
                 if "source2" in l:
                     l.pop("source2")
 
+            def categorize(added, deleted):
+                # In some deltas, the only part of output directory is captured.
+                # For eg. "put-1" or "tput-2", thus we must check all such possibilities.
+                # Start with 3 letter combinations to avoid false positives.
+                path_1 = "output-1"
+                path_2 = "output-2"
+                paths = [path_1[i:j] for i in range(len(path_1)) for j in range(i+3, len(path_1)+1)]
+                paths_2 = [path_2[i:j] for i in range(len(path_1)) for j in range(i+3, len(path_1)+1)]
+                paths = paths + paths_2
+                # We need to iterate over the deltas simultaneously.
+                for a, d in izip(added, deleted):
+                    for p in paths:
+                        if p in a or p in d:
+                            return "Embedded Path"
+                    if "Build ID" in a or "Build ID" in d:
+                        return "Build ID variation"
+                    else:
+                        continue
+                return "not found"
+
             packages_file_list = os.path.join(self.outputdir, "build", "packages-file-list.txt")
 
             with open(reproducible_results, "r") as reproduciblef:
@@ -667,12 +688,18 @@ class Builder:
                             item_details["added"] = split_deltas[0][:100]
                             item_details["deleted"] = split_deltas[1][:100]
                             cleanup(item_details)
+                            category = categorize(item_details["added"], item_details["deleted"])
+                            if category is not "not found":
+                                item["category"] = category
+                                break
                     else:
                         diff = item["unified_diff"].split("\n")
                         split_deltas = split_delta(diff)
                         item["added"] = split_deltas[0][:100]
                         item["deleted"] = split_deltas[1][:100]
                     cleanup(item)
+                    if "added" in item or "deleted" in item:
+                        item["category"] = categorize(item["added"], item["deleted"])
                 # We currently just set the reason from first non-reproducible package in the
                 # dictionary.
                 reason = json_data["details"][0]["package"]
-- 
2.22.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Buildroot] [PATCH v4 1/5] autobuild-run: check if reproducibile_results exists before checking its size
  2019-08-20 14:52 [Buildroot] [PATCH v4 1/5] autobuild-run: check if reproducibile_results exists before checking its size Atharva Lele
                   ` (3 preceding siblings ...)
  2019-08-20 14:52 ` [Buildroot] [PATCH v4 5/5] autobuild-run: initial implementation of categorization() of nonreproducibility Atharva Lele
@ 2019-09-08 16:43 ` Arnout Vandecappelle
  2019-09-12 12:00   ` Atharva Lele
  4 siblings, 1 reply; 16+ messages in thread
From: Arnout Vandecappelle @ 2019-09-08 16:43 UTC (permalink / raw)
  To: buildroot



On 20/08/2019 16:52, Atharva Lele wrote:
> This was causing some reproducible configs to crash autobuild script since it
> tried to check the size of a file that did not exist.

 How does it happen that it doesn't exist? A few lines above, we have:

with open(reproducible_results, 'w') as diff:
   ..

This will create the file, even if nothing is written to it.

 Or maybe this issue only occurs after patch 4 has been applied?

 Regards,
 Arnout

> 
> Signed-off-by: Atharva Lele <itsatharva@gmail.com>
> ---
>  scripts/autobuild-run | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/scripts/autobuild-run b/scripts/autobuild-run
> index ead81a0..99b57dd 100755
> --- a/scripts/autobuild-run
> +++ b/scripts/autobuild-run
> @@ -461,7 +461,7 @@ class Builder:
>                  log_write(self.log, "INFO: diffoscope not installed, falling back to cmp")
>                  subprocess.call(["cmp", "-b", build_1_image, build_2_image], stdout=diff, stderr=self.log)
>  
> -        if os.stat(reproducible_results).st_size > 0:
> +        if os.path.exists(reproducible_results) and os.stat(reproducible_results).st_size > 0:
>              log_write(self.log, "INFO: Build is non-reproducible.")
>              return -1
>  
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Buildroot] [PATCH v4 2/5] autobuild-run: initial implementation of get_reproducibility_failure_reason()
  2019-08-20 14:52 ` [Buildroot] [PATCH v4 2/5] autobuild-run: initial implementation of get_reproducibility_failure_reason() Atharva Lele
@ 2019-09-08 17:06   ` Arnout Vandecappelle
  2019-09-08 22:42     ` Thomas Petazzoni
  2019-09-12 12:47     ` Atharva Lele
  0 siblings, 2 replies; 16+ messages in thread
From: Arnout Vandecappelle @ 2019-09-08 17:06 UTC (permalink / raw)
  To: buildroot

 Hi Atharva,

 I'm very sorry that I'm so late reviewing this...

On 20/08/2019 16:52, Atharva Lele wrote:
> Analyze the JSON formatted output from diffoscope and check if
> the differences are due to a filesystem reproducibility issue
> or a package reproducibility issue.
> 
> Also, discard the deltas because they might take up too much space.
> 
> Signed-off-by: Atharva Lele <itsatharva@gmail.com>
> ---
> Changes v2 -> v4:
>   - Change if-else to try-except
>   - remove blank line
> Changes v1 -> v2:
>   - Refactor using subfunctions and local variables (suggested by Thomas)
>   - Added comments (suggested by Thomas)
>   - Use more pythonic loops (suggested by Thomas)
> ---
>  scripts/autobuild-run | 88 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 88 insertions(+)
> 
> diff --git a/scripts/autobuild-run b/scripts/autobuild-run
> index 99b57dd..384cf73 100755
> --- a/scripts/autobuild-run
> +++ b/scripts/autobuild-run
> @@ -131,6 +131,7 @@ import csv
>  import docopt
>  import errno
>  import hashlib
> +import json
>  import mmap
>  import multiprocessing
>  import os
> @@ -599,6 +600,93 @@ class Builder:
>          if reject_results():
>              return
>  
> +        def get_reproducibility_failure_reason(reproducible_results):

 This function really warrants a docstring. Especially because it does more than
the title says: it not only returns the reason, but it also updates the JSON
file with it and changes the way the diffs are stored.

> +            def split_delta(delta):
> +                # Take a delta and split it into added, deleted lines.
> +                added = []
> +                deleted = []
> +                for line in delta:
> +                    if line.startswith("+"):
> +                        added.append(line)
> +                    if line.startswith("-"):
> +                        deleted.append(line)
> +                return added, deleted

 A more pythonesque way to do this would be

added = [line for line in delta if line.startswith("+")]

> +
> +            def get_package(sourcef):
> +                # Returns which package the source file belongs to.
> +                with open(packages_file_list, "r") as packagef:

 It is strange to have packages_file_list defined at a higher scope and then
used within this function. I think it's better to move its definition inside
this function as well.

> +                    for line in packagef:
> +                        if sourcef in line:
> +                            package = line.split(',')[0]

 I think it is better to keep a list of packages for the JSON output, and use
the last one for the reason output. So here you would have:

 packages = [line.split(',', 1)[0] for line in packagef if sourcef in line]


> +
> +                try:
> +                    # Get package version
> +                    package_info = json.loads(subprocess.check_output(["make", "--no-print-directory",
> +                                                                       "O=%s" % self.outputdir,
> +                                                                       "-C", self.srcdir,
> +                                                                       "%s-show-info" % package]))
> +                    if "version" in package_info[package]:
> +                        version = package_info[package]["version"]
> +                        return [package, version]

 I don't know how useful it is to extract the package version... It is currently
part of the reason file mostly by accident, I think.

> +                    else:
> +                        return [package]
> +                except:
> +                    return ["not found"]
> +
> +            def cleanup(l):
> +                # Takes a list and removes data which is redundant (source2) or data
> +                # that might take up too much space (like huge diffs).
> +                if "unified_diff" in l:
> +                    l.pop("unified_diff")

 Condition can be avoided with

 l.pop("unified_diff", None)

(or maybe that's only python3? I'm not used to legacy python any more :-)

> +                if "source2" in l:
> +                    l.pop("source2")
> +
> +            packages_file_list = os.path.join(self.outputdir, "build", "packages-file-list.txt")
> +
> +            with open(reproducible_results, "r") as reproduciblef:
> +                json_data = json.load(reproduciblef)
> +
> +            if json_data["unified_diff"] == None:

 == None makes little sense - you should just use 'not' in that case. If you
really want to check that it's None, you should use 'is None'.  However, do you
really get a null entry in the json output? Isn't it that the "unified_diff"
section is missing in the json output? In that case, the
json_data["unified_diff"] would give a KeyError exception...

> +                # Remove the file list because it is not useful, i.e. it only shows
> +                # which files vary, and nothing more.
> +                if json_data["details"][0]["source1"] == "file list":
> +                    json_data["details"].pop(0)
> +
> +                # Iterate over details in the diffoscope output.
> +                for item in json_data["details"]:
> +                    diff_src = item["source1"]
> +                    item["package"] = get_package(diff_src)
> +
> +                    # In some cases, diffoscope uses multiple commands to get various
> +                    # diffs. Due to this, it generates a "details" key for those files
> +                    # instead of just storing the diff in the "unified_diff" key.
> +                    if item["unified_diff"] == None:
> +                        for item_details in item["details"]:
> +                            diff = item_details["unified_diff"].split("\n")
> +                            split_deltas = split_delta(diff)
> +                            item_details["added"] = split_deltas[0][:100]
> +                            item_details["deleted"] = split_deltas[1][:100]
> +                            cleanup(item_details)
> +                    else:
> +                        diff = item["unified_diff"].split("\n")
> +                        split_deltas = split_delta(diff)
> +                        item["added"] = split_deltas[0][:100]
> +                        item["deleted"] = split_deltas[1][:100]
> +                    cleanup(item)

 This cleanup has nothing to do with getting the failure reason. I think it
would be better to do it in a separate function (and a separate patch). Also,
I'm not really happy yet with this diff cleanup, because we loose all context
information. So maybe, for now, it's better to just use the --max-report-size
option.

> +                # We currently just set the reason from first non-reproducible package in the
> +                # dictionary.
> +                reason = json_data["details"][0]["package"]
> +
> +                # If there does exist a unified_diff directly for the .tar images, it is probably
> +                # a filesystem reproducibility issue.

 This comment should come after the else, or before the condition. Like this, it
looks like it belongs to the == None condition.

 Maybe it's cleaner to swap the condition and put this filesystem assumption first.


 Regards,
 Arnout

> +            else:
> +                reason = ["filesystem"]
> +
> +            with open(reproducible_results, "w") as reproduciblef:
> +                json.dump(json_data, reproduciblef, sort_keys=True, indent=4)
> +
> +            return reason
> +
>          def get_failure_reason():
>              # Output is a tuple (package, version), or None.
>              lastlines = decode_bytes(subprocess.Popen(
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Buildroot] [PATCH v4 3/5] autobuild-run: account for reproducibility failures in get_failure_reason()
  2019-08-20 14:52 ` [Buildroot] [PATCH v4 3/5] autobuild-run: account for reproducibility failures in get_failure_reason() Atharva Lele
@ 2019-09-08 17:13   ` Arnout Vandecappelle
  2019-09-12 12:59     ` Atharva Lele
  0 siblings, 1 reply; 16+ messages in thread
From: Arnout Vandecappelle @ 2019-09-08 17:13 UTC (permalink / raw)
  To: buildroot



On 20/08/2019 16:52, Atharva Lele wrote:
> Signed-off-by: Atharva Lele <itsatharva@gmail.com>
> ---
> Changes v1 -> v3:
>   - Account for changed name of diffoscope output
> ---
>  scripts/autobuild-run | 29 ++++++++++++++++++++---------
>  1 file changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/scripts/autobuild-run b/scripts/autobuild-run
> index 384cf73..876feb2 100755
> --- a/scripts/autobuild-run
> +++ b/scripts/autobuild-run
> @@ -689,15 +689,26 @@ class Builder:
>  
>          def get_failure_reason():
>              # Output is a tuple (package, version), or None.
> -            lastlines = decode_bytes(subprocess.Popen(
> -                ["tail", "-n", "3", os.path.join(self.outputdir, "logfile")],
> -                stdout=subprocess.PIPE).communicate()[0]).splitlines()
> -
> -            regexp = re.compile(r'make: \*\*\* .*/(?:build|toolchain)/([^/]*)/')
> -            for line in lastlines:
> -                m = regexp.search(line)
> -                if m:
> -                    return m.group(1).rsplit('-', 1)
> +            # Output is "package-reproducible" in case of reproducibility failure.
> +
> +            reproducible_results = os.path.join(self.resultdir, "diffoscope-results.json")

 Since file is now used in several functions, it would make sense to move it to
the class itself, like we do for so many others.

> +            if os.path.exists(reproducible_results) and os.stat(reproducible_results).st_size > 0:
> +                if self.sysinfo.has("diffoscope"):

 It would make more sense to move this condition up - if diffoscope is not
installed, there will be no diffoscope-results.json. Oh, actually there will be,
but it's not a JSON file... So, actually, in case diffoscope is not installed,
we should not create diffoscope-results.json, but diffoscope-results.txt (and no
json).

 Regards,
 Arnout

> +                    reason = get_reproducibility_failure_reason(reproducible_results)
> +                    reason.append("nonreproducible")
> +                    return reason
> +                else:
> +                    return ["nonreproducible"]
> +            else:
> +                lastlines = decode_bytes(subprocess.Popen(
> +                    ["tail", "-n", "3", os.path.join(self.outputdir, "logfile")],
> +                    stdout=subprocess.PIPE).communicate()[0]).splitlines()
> +
> +                regexp = re.compile(r'make: \*\*\* .*/(?:build|toolchain)/([^/]*)/')
> +                for line in lastlines:
> +                    m = regexp.search(line)
> +                    if m:
> +                        return m.group(1).rsplit('-', 1)
>  
>              # not found
>              return None
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Buildroot] [PATCH v4 2/5] autobuild-run: initial implementation of get_reproducibility_failure_reason()
  2019-09-08 17:06   ` Arnout Vandecappelle
@ 2019-09-08 22:42     ` Thomas Petazzoni
  2019-09-09  7:35       ` Arnout Vandecappelle
  2019-09-12 12:47     ` Atharva Lele
  1 sibling, 1 reply; 16+ messages in thread
From: Thomas Petazzoni @ 2019-09-08 22:42 UTC (permalink / raw)
  To: buildroot

On Sun, 8 Sep 2019 19:06:11 +0200
Arnout Vandecappelle <arnout@mind.be> wrote:

> > +                    if "version" in package_info[package]:
> > +                        version = package_info[package]["version"]
> > +                        return [package, version]  
> 
>  I don't know how useful it is to extract the package version... It is currently
> part of the reason file mostly by accident, I think.

I find it quite useful to have the version in the failure reason.
Thanks to that, when you search with ?reason=foo%, you can see
immediately in the list which versions of the package are/were affected
by a failure.

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Buildroot] [PATCH v4 2/5] autobuild-run: initial implementation of get_reproducibility_failure_reason()
  2019-09-08 22:42     ` Thomas Petazzoni
@ 2019-09-09  7:35       ` Arnout Vandecappelle
  2019-09-09  7:45         ` Thomas Petazzoni
  0 siblings, 1 reply; 16+ messages in thread
From: Arnout Vandecappelle @ 2019-09-09  7:35 UTC (permalink / raw)
  To: buildroot



On 09/09/2019 00:42, Thomas Petazzoni wrote:
> On Sun, 8 Sep 2019 19:06:11 +0200
> Arnout Vandecappelle <arnout@mind.be> wrote:
> 
>>> +                    if "version" in package_info[package]:
>>> +                        version = package_info[package]["version"]
>>> +                        return [package, version]  
>>
>>  I don't know how useful it is to extract the package version... It is currently
>> part of the reason file mostly by accident, I think.
> 
> I find it quite useful to have the version in the failure reason.
> Thanks to that, when you search with ?reason=foo%, you can see
> immediately in the list which versions of the package are/were affected
> by a failure.

 Yes, but:

1. I really think it is there by accident :-)

2. For reproducibility failures, I don't think it is that relevant.

In case a failure was introduced by a version bump, you anyway have to check by
hand when the version was bumped to be sure that that is really the cause.

 Regards,
 Arnout

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Buildroot] [PATCH v4 2/5] autobuild-run: initial implementation of get_reproducibility_failure_reason()
  2019-09-09  7:35       ` Arnout Vandecappelle
@ 2019-09-09  7:45         ` Thomas Petazzoni
  0 siblings, 0 replies; 16+ messages in thread
From: Thomas Petazzoni @ 2019-09-09  7:45 UTC (permalink / raw)
  To: buildroot

On Mon, 9 Sep 2019 09:35:34 +0200
Arnout Vandecappelle <arnout@mind.be> wrote:

>  Yes, but:
> 
> 1. I really think it is there by accident :-)
> 
> 2. For reproducibility failures, I don't think it is that relevant.
> 
> In case a failure was introduced by a version bump, you anyway have to check by
> hand when the version was bumped to be sure that that is really the cause.

Agreed.

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Buildroot] [PATCH v4 1/5] autobuild-run: check if reproducibile_results exists before checking its size
  2019-09-08 16:43 ` [Buildroot] [PATCH v4 1/5] autobuild-run: check if reproducibile_results exists before checking its size Arnout Vandecappelle
@ 2019-09-12 12:00   ` Atharva Lele
  0 siblings, 0 replies; 16+ messages in thread
From: Atharva Lele @ 2019-09-12 12:00 UTC (permalink / raw)
  To: buildroot

On Sun, Sep 8, 2019 at 10:13 PM Arnout Vandecappelle <arnout@mind.be> wrote:
>
>
>
> On 20/08/2019 16:52, Atharva Lele wrote:
> > This was causing some reproducible configs to crash autobuild script since it
> > tried to check the size of a file that did not exist.
>
>  How does it happen that it doesn't exist? A few lines above, we have:
>

Exactly. I was baffled by this as well. It seems to happen randomly.

> with open(reproducible_results, 'w') as diff:
>    ..
>
> This will create the file, even if nothing is written to it.
>
>  Or maybe this issue only occurs after patch 4 has been applied?
>

It occurs even without patch 4 applied.

>  Regards,
>  Arnout
>
> >
> > Signed-off-by: Atharva Lele <itsatharva@gmail.com>
> > ---
> >  scripts/autobuild-run | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/scripts/autobuild-run b/scripts/autobuild-run
> > index ead81a0..99b57dd 100755
> > --- a/scripts/autobuild-run
> > +++ b/scripts/autobuild-run
> > @@ -461,7 +461,7 @@ class Builder:
> >                  log_write(self.log, "INFO: diffoscope not installed, falling back to cmp")
> >                  subprocess.call(["cmp", "-b", build_1_image, build_2_image], stdout=diff, stderr=self.log)
> >
> > -        if os.stat(reproducible_results).st_size > 0:
> > +        if os.path.exists(reproducible_results) and os.stat(reproducible_results).st_size > 0:
> >              log_write(self.log, "INFO: Build is non-reproducible.")
> >              return -1
> >
> >

Apologies for the late reply. I was recommended to rest for a couple
of days after being hit by the flu.

-- 
Regards,
Atharva Lele

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Buildroot] [PATCH v4 2/5] autobuild-run: initial implementation of get_reproducibility_failure_reason()
  2019-09-08 17:06   ` Arnout Vandecappelle
  2019-09-08 22:42     ` Thomas Petazzoni
@ 2019-09-12 12:47     ` Atharva Lele
  2019-09-14 17:27       ` Arnout Vandecappelle
  1 sibling, 1 reply; 16+ messages in thread
From: Atharva Lele @ 2019-09-12 12:47 UTC (permalink / raw)
  To: buildroot

On Sun, Sep 8, 2019 at 10:36 PM Arnout Vandecappelle <arnout@mind.be> wrote:
>
>  Hi Atharva,
>
>  I'm very sorry that I'm so late reviewing this...
>
> On 20/08/2019 16:52, Atharva Lele wrote:
> > Analyze the JSON formatted output from diffoscope and check if
> > the differences are due to a filesystem reproducibility issue
> > or a package reproducibility issue.
> >
> > Also, discard the deltas because they might take up too much space.
> >
> > Signed-off-by: Atharva Lele <itsatharva@gmail.com>
> > ---
> > Changes v2 -> v4:
> >   - Change if-else to try-except
> >   - remove blank line
> > Changes v1 -> v2:
> >   - Refactor using subfunctions and local variables (suggested by Thomas)
> >   - Added comments (suggested by Thomas)
> >   - Use more pythonic loops (suggested by Thomas)
> > ---
> >  scripts/autobuild-run | 88 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 88 insertions(+)
> >
> > diff --git a/scripts/autobuild-run b/scripts/autobuild-run
> > index 99b57dd..384cf73 100755
> > --- a/scripts/autobuild-run
> > +++ b/scripts/autobuild-run
> > @@ -131,6 +131,7 @@ import csv
> >  import docopt
> >  import errno
> >  import hashlib
> > +import json
> >  import mmap
> >  import multiprocessing
> >  import os
> > @@ -599,6 +600,93 @@ class Builder:
> >          if reject_results():
> >              return
> >
> > +        def get_reproducibility_failure_reason(reproducible_results):
>
>  This function really warrants a docstring. Especially because it does more than
> the title says: it not only returns the reason, but it also updates the JSON
> file with it and changes the way the diffs are stored.
>

That's true.. I'll add it in a future revision.

>  A more pythonesque way to do this would be
>
> added = [line for line in delta if line.startswith("+")]
>

Man.. 3 months coding mostly in python and its still hard to code the
python way. ;)
Thanks!

> > +
> > +            def get_package(sourcef):
> > +                # Returns which package the source file belongs to.
> > +                with open(packages_file_list, "r") as packagef:
>
>  It is strange to have packages_file_list defined at a higher scope and then
> used within this function. I think it's better to move its definition inside
> this function as well.
>

Right, thanks!

> > +                    for line in packagef:
> > +                        if sourcef in line:
> > +                            package = line.split(',')[0]
>
>  I think it is better to keep a list of packages for the JSON output, and use
> the last one for the reason output. So here you would have:
>
>  packages = [line.split(',', 1)[0] for line in packagef if sourcef in line]
>

Alright, I'll switch it.

>
> > +
> > +                try:
> > +                    # Get package version
> > +                    package_info = json.loads(subprocess.check_output(["make", "--no-print-directory",
> > +                                                                       "O=%s" % self.outputdir,
> > +                                                                       "-C", self.srcdir,
> > +                                                                       "%s-show-info" % package]))
> > +                    if "version" in package_info[package]:
> > +                        version = package_info[package]["version"]
> > +                        return [package, version]
>
>  I don't know how useful it is to extract the package version... It is currently
> part of the reason file mostly by accident, I think.
>

I kept it purely because it was being reported on the autobuilder site
since before I started work. I read the comments from you and Thomas'
and I agree that we'd just need to manually check things. So it
doesn't make sense to go the extra efforts of extracting the version
here. I'll remove it for the next version of the patch.

> > +                    else:
> > +                        return [package]
> > +                except:
> > +                    return ["not found"]
> > +
> > +            def cleanup(l):
> > +                # Takes a list and removes data which is redundant (source2) or data
> > +                # that might take up too much space (like huge diffs).
> > +                if "unified_diff" in l:
> > +                    l.pop("unified_diff")
>
>  Condition can be avoided with
>
>  l.pop("unified_diff", None)
>
> (or maybe that's only python3? I'm not used to legacy python any more :-)
>

That is valid for dictionaries, even in python2. However, we are
passing a list element to cleanup() and list.pop() only takes in 1
argument i.e. the index.

> > +                if "source2" in l:
> > +                    l.pop("source2")
> > +
> > +            packages_file_list = os.path.join(self.outputdir, "build", "packages-file-list.txt")
> > +
> > +            with open(reproducible_results, "r") as reproduciblef:
> > +                json_data = json.load(reproduciblef)
> > +
> > +            if json_data["unified_diff"] == None:
>
>  == None makes little sense - you should just use 'not' in that case. If you
> really want to check that it's None, you should use 'is None'.  However, do you
> really get a null entry in the json output? Isn't it that the "unified_diff"
> section is missing in the json output? In that case, the
> json_data["unified_diff"] would give a KeyError exception...

I do get a null entry in the json output. unified_diff is not missing
but rather set to null. I should use 'is None' then. Thanks!

>
> > +                # Remove the file list because it is not useful, i.e. it only shows
> > +                # which files vary, and nothing more.
> > +                if json_data["details"][0]["source1"] == "file list":
> > +                    json_data["details"].pop(0)
> > +
> > +                # Iterate over details in the diffoscope output.
> > +                for item in json_data["details"]:
> > +                    diff_src = item["source1"]
> > +                    item["package"] = get_package(diff_src)
> > +
> > +                    # In some cases, diffoscope uses multiple commands to get various
> > +                    # diffs. Due to this, it generates a "details" key for those files
> > +                    # instead of just storing the diff in the "unified_diff" key.
> > +                    if item["unified_diff"] == None:
> > +                        for item_details in item["details"]:
> > +                            diff = item_details["unified_diff"].split("\n")
> > +                            split_deltas = split_delta(diff)
> > +                            item_details["added"] = split_deltas[0][:100]
> > +                            item_details["deleted"] = split_deltas[1][:100]
> > +                            cleanup(item_details)
> > +                    else:
> > +                        diff = item["unified_diff"].split("\n")
> > +                        split_deltas = split_delta(diff)
> > +                        item["added"] = split_deltas[0][:100]
> > +                        item["deleted"] = split_deltas[1][:100]
> > +                    cleanup(item)
>
>  This cleanup has nothing to do with getting the failure reason. I think it
> would be better to do it in a separate function (and a separate patch). Also,
> I'm not really happy yet with this diff cleanup, because we loose all context
> information. So maybe, for now, it's better to just use the --max-report-size
> option.
>

You're right, until we find a way to save context its better to use
--max-report-size. What about the issue that it truncates diff output?

> > +                # We currently just set the reason from first non-reproducible package in the
> > +                # dictionary.
> > +                reason = json_data["details"][0]["package"]
> > +
> > +                # If there does exist a unified_diff directly for the .tar images, it is probably
> > +                # a filesystem reproducibility issue.
>
>  This comment should come after the else, or before the condition. Like this, it
> looks like it belongs to the == None condition.
>
>  Maybe it's cleaner to swap the condition and put this filesystem assumption first.
>
Ah yes, it will definitely be cleaner to swap the conditions.

>
>  Regards,
>  Arnout
>
> > +            else:
> > +                reason = ["filesystem"]
> > +
> > +            with open(reproducible_results, "w") as reproduciblef:
> > +                json.dump(json_data, reproduciblef, sort_keys=True, indent=4)
> > +
> > +            return reason
> > +
> >          def get_failure_reason():
> >              # Output is a tuple (package, version), or None.
> >              lastlines = decode_bytes(subprocess.Popen(
> >

Thanks for the review! And sorry for replying a few days late..

-- 
Regards,
Atharva Lele

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Buildroot] [PATCH v4 3/5] autobuild-run: account for reproducibility failures in get_failure_reason()
  2019-09-08 17:13   ` Arnout Vandecappelle
@ 2019-09-12 12:59     ` Atharva Lele
  2019-09-14 17:33       ` Arnout Vandecappelle
  0 siblings, 1 reply; 16+ messages in thread
From: Atharva Lele @ 2019-09-12 12:59 UTC (permalink / raw)
  To: buildroot

On Sun, Sep 8, 2019 at 10:43 PM Arnout Vandecappelle <arnout@mind.be> wrote:
>
>
>
> On 20/08/2019 16:52, Atharva Lele wrote:
> > Signed-off-by: Atharva Lele <itsatharva@gmail.com>
> > ---
> > Changes v1 -> v3:
> >   - Account for changed name of diffoscope output
> > ---
> >  scripts/autobuild-run | 29 ++++++++++++++++++++---------
> >  1 file changed, 20 insertions(+), 9 deletions(-)
> >
> > diff --git a/scripts/autobuild-run b/scripts/autobuild-run
> > index 384cf73..876feb2 100755
> > --- a/scripts/autobuild-run
> > +++ b/scripts/autobuild-run
> > @@ -689,15 +689,26 @@ class Builder:
> >
> >          def get_failure_reason():
> >              # Output is a tuple (package, version), or None.
> > -            lastlines = decode_bytes(subprocess.Popen(
> > -                ["tail", "-n", "3", os.path.join(self.outputdir, "logfile")],
> > -                stdout=subprocess.PIPE).communicate()[0]).splitlines()
> > -
> > -            regexp = re.compile(r'make: \*\*\* .*/(?:build|toolchain)/([^/]*)/')
> > -            for line in lastlines:
> > -                m = regexp.search(line)
> > -                if m:
> > -                    return m.group(1).rsplit('-', 1)
> > +            # Output is "package-reproducible" in case of reproducibility failure.
> > +
> > +            reproducible_results = os.path.join(self.resultdir, "diffoscope-results.json")
>
>  Since file is now used in several functions, it would make sense to move it to
> the class itself, like we do for so many others.

Yup will do.

>
> > +            if os.path.exists(reproducible_results) and os.stat(reproducible_results).st_size > 0:
> > +                if self.sysinfo.has("diffoscope"):
>
>  It would make more sense to move this condition up - if diffoscope is not
> installed, there will be no diffoscope-results.json. Oh, actually there will be,
> but it's not a JSON file... So, actually, in case diffoscope is not installed,
> we should not create diffoscope-results.json, but diffoscope-results.txt (and no
> json).
>

We're now outputting two files:
1) diffoscope-results.json for tooling purposes (reproducible_results)
2) diffoscope-results.txt for human-reading (reproducible_results_txt)
diffoscope-results.txt is used if diffoscope is not installed (check
patch 4 in the series).

So I guess after patch 4, checking if diffoscope is installed here
doesn't make sense. If reproducible_results exists, it would mean that
its a diffoscope outputted file.

>  Regards,
>  Arnout
>
> > +                    reason = get_reproducibility_failure_reason(reproducible_results)
> > +                    reason.append("nonreproducible")
> > +                    return reason
> > +                else:
> > +                    return ["nonreproducible"]
> > +            else:
> > +                lastlines = decode_bytes(subprocess.Popen(
> > +                    ["tail", "-n", "3", os.path.join(self.outputdir, "logfile")],
> > +                    stdout=subprocess.PIPE).communicate()[0]).splitlines()
> > +
> > +                regexp = re.compile(r'make: \*\*\* .*/(?:build|toolchain)/([^/]*)/')
> > +                for line in lastlines:
> > +                    m = regexp.search(line)
> > +                    if m:
> > +                        return m.group(1).rsplit('-', 1)
> >
> >              # not found
> >              return None
> >

Thanks for the review!

-- 
Regards,
Atharva Lele

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Buildroot] [PATCH v4 2/5] autobuild-run: initial implementation of get_reproducibility_failure_reason()
  2019-09-12 12:47     ` Atharva Lele
@ 2019-09-14 17:27       ` Arnout Vandecappelle
  0 siblings, 0 replies; 16+ messages in thread
From: Arnout Vandecappelle @ 2019-09-14 17:27 UTC (permalink / raw)
  To: buildroot



On 12/09/2019 14:47, Atharva Lele wrote:
> On Sun, Sep 8, 2019 at 10:36 PM Arnout Vandecappelle <arnout@mind.be> wrote:
>>
>>  Hi Atharva,
>>
>>  I'm very sorry that I'm so late reviewing this...
>>
>> On 20/08/2019 16:52, Atharva Lele wrote:
[snip]
>>> +            def cleanup(l):
>>> +                # Takes a list and removes data which is redundant (source2) or data
>>> +                # that might take up too much space (like huge diffs).
>>> +                if "unified_diff" in l:
>>> +                    l.pop("unified_diff")
>>
>>  Condition can be avoided with
>>
>>  l.pop("unified_diff", None)
>>
>> (or maybe that's only python3? I'm not used to legacy python any more :-)
>>
> 
> That is valid for dictionaries, even in python2. However, we are
> passing a list element to cleanup() and list.pop() only takes in 1
> argument i.e. the index.

 Err, but if it is a list, it would give us a TypeError, because list.pop()
takes an index as argument, not a value...

[snip]
>>  This cleanup has nothing to do with getting the failure reason. I think it
>> would be better to do it in a separate function (and a separate patch). Also,
>> I'm not really happy yet with this diff cleanup, because we loose all context
>> information. So maybe, for now, it's better to just use the --max-report-size
>> option.
>>
> 
> You're right, until we find a way to save context its better to use
> --max-report-size. What about the issue that it truncates diff output?

 I've never seen how exactly it truncates. If you have a really large diff, do
you just get - lines? Or is it slightly more intelligent and do you get both -
and + lines?

 Regardless, I don't think the current solution is that much better, so for now
I'd keep using the max-report-size option.



 Regards,
 Arnout

[snip]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Buildroot] [PATCH v4 3/5] autobuild-run: account for reproducibility failures in get_failure_reason()
  2019-09-12 12:59     ` Atharva Lele
@ 2019-09-14 17:33       ` Arnout Vandecappelle
  0 siblings, 0 replies; 16+ messages in thread
From: Arnout Vandecappelle @ 2019-09-14 17:33 UTC (permalink / raw)
  To: buildroot



On 12/09/2019 14:59, Atharva Lele wrote:
> On Sun, Sep 8, 2019 at 10:43 PM Arnout Vandecappelle <arnout@mind.be> wrote:
>>
>>
>>
>> On 20/08/2019 16:52, Atharva Lele wrote:
[snip]
>>> +            if os.path.exists(reproducible_results) and os.stat(reproducible_results).st_size > 0:
>>> +                if self.sysinfo.has("diffoscope"):
>>
>>  It would make more sense to move this condition up - if diffoscope is not
>> installed, there will be no diffoscope-results.json. Oh, actually there will be,
>> but it's not a JSON file... So, actually, in case diffoscope is not installed,
>> we should not create diffoscope-results.json, but diffoscope-results.txt (and no
>> json).
>>
> 
> We're now outputting two files:
> 1) diffoscope-results.json for tooling purposes (reproducible_results)
> 2) diffoscope-results.txt for human-reading (reproducible_results_txt)
> diffoscope-results.txt is used if diffoscope is not installed (check
> patch 4 in the series).
> 
> So I guess after patch 4, checking if diffoscope is installed here
> doesn't make sense. If reproducible_results exists, it would mean that
> its a diffoscope outputted file.

 Ah yes.

 It would make sense to move patch 4 earlier in the series I think. It's for
sure one of the less controversial changes in the series. But that may be a
pretty complicated rebase...

 Regards,
 Arnout

[snip]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-09-14 17:33 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-20 14:52 [Buildroot] [PATCH v4 1/5] autobuild-run: check if reproducibile_results exists before checking its size Atharva Lele
2019-08-20 14:52 ` [Buildroot] [PATCH v4 2/5] autobuild-run: initial implementation of get_reproducibility_failure_reason() Atharva Lele
2019-09-08 17:06   ` Arnout Vandecappelle
2019-09-08 22:42     ` Thomas Petazzoni
2019-09-09  7:35       ` Arnout Vandecappelle
2019-09-09  7:45         ` Thomas Petazzoni
2019-09-12 12:47     ` Atharva Lele
2019-09-14 17:27       ` Arnout Vandecappelle
2019-08-20 14:52 ` [Buildroot] [PATCH v4 3/5] autobuild-run: account for reproducibility failures in get_failure_reason() Atharva Lele
2019-09-08 17:13   ` Arnout Vandecappelle
2019-09-12 12:59     ` Atharva Lele
2019-09-14 17:33       ` Arnout Vandecappelle
2019-08-20 14:52 ` [Buildroot] [PATCH v4 4/5] autobuild-run: move with open to appropriate place in check_reproducibility() Atharva Lele
2019-08-20 14:52 ` [Buildroot] [PATCH v4 5/5] autobuild-run: initial implementation of categorization() of nonreproducibility Atharva Lele
2019-09-08 16:43 ` [Buildroot] [PATCH v4 1/5] autobuild-run: check if reproducibile_results exists before checking its size Arnout Vandecappelle
2019-09-12 12:00   ` Atharva Lele

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.