* [PATCH] bitbake: fetch2: Revalidate checksums, YOCTO #5571
@ 2015-02-23 14:22 Clemens Lang
  2015-02-26 14:42 ` Clemens Lang
  0 siblings, 1 reply; 12+ messages in thread
From: Clemens Lang @ 2015-02-23 14:22 UTC (permalink / raw)
  To: Alejandro Hernandez; +Cc: yocto

[YOCTO #5571] -- https://bugzilla.yoctoproject.org/show_bug.cgi?id=5571

The following workflow (whether followed accidentally or deliberately)
previously did not result in a checksum error, although it would be
helpful if it did:
 - Write a recipe with correct checksums
 - Fetch the sources for this recipe using bitbake
 - Change the checksums
Since the bitbake fetcher writes a done stamp after the initial download
and does not verify the checksums again (even if they are changed in the
recipe afterwards), the change of checksums is silently ignored.

Fix this without the overhead of computing the checksums from scratch on
every do_fetch by storing them in pickled format in the done stamp and
verifying that they still match those in the recipe.

Signed-off-by: Clemens Lang <clemens.lang@bmw-carit.de>
---
 bitbake/lib/bb/fetch2/__init__.py | 106 ++++++++++++++++++++++++++++++++++----
 1 file changed, 96 insertions(+), 10 deletions(-)

diff --git a/bitbake/lib/bb/fetch2/__init__.py b/bitbake/lib/bb/fetch2/__init__.py
index 599ea8c..c0c8312 100644
--- a/bitbake/lib/bb/fetch2/__init__.py
+++ b/bitbake/lib/bb/fetch2/__init__.py
@@ -45,6 +45,13 @@ _checksum_cache = bb.checksum.FileChecksumCache()
 
 logger = logging.getLogger("BitBake.Fetcher")
 
+try:
+    import cPickle as pickle
+except ImportError:
+    import pickle
+    logger.info("Importing cPickle failed. "
+                "Falling back to a very slow implementation.")
+
 class BBFetchException(Exception):
     """Class all fetch exceptions inherit from"""
     def __init__(self, message):
@@ -525,7 +532,7 @@ def fetcher_compare_revisions(d):
 def mirror_from_string(data):
     return [ i.split() for i in (data or "").replace('\\n','\n').split('\n') if i ]
 
-def verify_checksum(ud, d):
+def verify_checksum(ud, d, precomputed={}):
     """
     verify the MD5 and SHA256 checksum for downloaded src
 
@@ -533,13 +540,28 @@ def verify_checksum(ud, d):
     the downloaded file, or if BB_STRICT_CHECKSUM is set and there are no
     checksums specified.
 
+    Returns a dict of checksums that can be stored in a done stamp file and
+    passed in as precomputed parameter in a later call to avoid re-computing
+    the checksums from the file. This allows verifying the checksums of the
+    file against those in the recipe each time, rather than only after
+    downloading. See https://bugzilla.yoctoproject.org/show_bug.cgi?id=5571.
     """
 
+    _MD5_KEY = "md5"
+    _SHA256_KEY = "sha256"
+
     if ud.ignore_checksums or not ud.method.supports_checksum(ud):
-        return
+        return {}
 
-    md5data = bb.utils.md5_file(ud.localpath)
-    sha256data = bb.utils.sha256_file(ud.localpath)
+    if _MD5_KEY in precomputed:
+        md5data = precomputed[_MD5_KEY]
+    else:
+        md5data = bb.utils.md5_file(ud.localpath)
+
+    if _SHA256_KEY in precomputed:
+        sha256data = precomputed[_SHA256_KEY]
+    else:
+        sha256data = bb.utils.sha256_file(ud.localpath)
 
     if ud.method.recommends_checksum(ud):
         # If strict checking enabled and neither sum defined, raise error
@@ -589,6 +611,66 @@ def verify_checksum(ud, d):
     if len(msg):
         raise ChecksumError('Checksum mismatch!%s' % msg, ud.url, md5data)
 
+    return {
+        _MD5_KEY: md5data,
+        _SHA256_KEY: sha256data
+    }
+
+
+def verify_donestamp(ud, d):
+    """
+    Check whether the done stamp file has the right checksums (if the fetch
+    method supports them). If it doesn't, delete the done stamp and force
+    a re-download.
+
+    Returns True, if the donestamp exists and is valid, False otherwise. When
+    returning False, any existing done stamps are removed.
+    """
+    if not os.path.exists(ud.donestamp):
+        return False
+
+    if not ud.method.supports_checksum(ud):
+        # done stamp exists, checksums not supported; assume the local file is
+        # current
+        return True
+
+    if not os.path.exists(ud.localpath):
+        # done stamp exists, but the downloaded file does not; the done stamp
+        # must be incorrect, re-trigger the download
+        bb.utils.remove(ud.donestamp)
+        return False
+
+    precomputed_checksums = {}
+    try:
+        with open(ud.donestamp, "rb") as cachefile:
+            pickled = pickle.Unpickler(cachefile)
+            precomputed_checksums.update(pickled.load())
+    except Exception as e:
+        # Avoid the warnings on the upgrade path from empty done stamp files to
+        # those containing the checksums.
+        if not isinstance(e, EOFError):
+            # Ignore errors, they aren't fatal
+            logger.warn("Couldn't load checksums from donestamp %s: %s "
+                        "(msg: %s)" % (ud.donestamp, type(e).__name__, str(e)))
+
+    try:
+        checksums = verify_checksum(ud, d, precomputed_checksums)
+        # If the cache file did not have the checksums, compute and store them
+        # as an upgrade path from the previous done stamp file format.
+        if checksums != precomputed_checksums:
+            with open(ud.donestamp, "wb") as cachefile:
+                p = pickle.Pickler(cachefile, pickle.HIGHEST_PROTOCOL)
+                p.dump(checksums)
+        return True
+    except ChecksumError as e:
+        # Checksums failed to verify, trigger re-download and remove the
+        # incorrect stamp file.
+        logger.warn("Checksum mismatch for local file %s\n"
+                    "Cleaning and trying again." % ud.localpath)
+        rename_bad_checksum(ud, e.checksum)
+        bb.utils.remove(ud.donestamp)
+    return False
+
 
 def update_stamp(ud, d):
     """
@@ -603,8 +685,11 @@ def update_stamp(ud, d):
             # Errors aren't fatal here
             pass
     else:
-        verify_checksum(ud, d)
-        open(ud.donestamp, 'w').close()
+        checksums = verify_checksum(ud, d)
+        # Store the checksums for later re-verification against the recipe
+        with open(ud.donestamp, "wb") as cachefile:
+            p = pickle.Pickler(cachefile, pickle.HIGHEST_PROTOCOL)
+            p.dump(checksums)
 
 def subprocess_setup():
     # Python installs a SIGPIPE handler by default. This is usually not what
@@ -805,7 +890,7 @@ def try_mirror_url(origud, ud, ld, check = False):
 
         os.chdir(ld.getVar("DL_DIR", True))
 
-        if not os.path.exists(ud.donestamp) or ud.method.need_update(ud, ld):
+        if not verify_donestamp(ud, ld) or ud.method.need_update(ud, ld):
             ud.method.download(ud, ld)
             if hasattr(ud.method,"build_mirror_data"):
                 ud.method.build_mirror_data(ud, ld)
@@ -821,12 +906,13 @@ def try_mirror_url(origud, ud, ld, check = False):
         dldir = ld.getVar("DL_DIR", True)
         if origud.mirrortarball and os.path.basename(ud.localpath) == os.path.basename(origud.mirrortarball) \
                 and os.path.basename(ud.localpath) != os.path.basename(origud.localpath):
+            # Create donestamp in old format to avoid triggering a re-download
             bb.utils.mkdirhier(os.path.dirname(ud.donestamp))
             open(ud.donestamp, 'w').close()
             dest = os.path.join(dldir, os.path.basename(ud.localpath))
             if not os.path.exists(dest):
                 os.symlink(ud.localpath, dest)
-            if not os.path.exists(origud.donestamp) or origud.method.need_update(origud, ld):
+            if not verify_donestamp(origud, ld) or origud.method.need_update(origud, ld):
                 origud.method.download(origud, ld)
                 if hasattr(origud.method,"build_mirror_data"):
                     origud.method.build_mirror_data(origud, ld)
@@ -1422,7 +1508,7 @@ class Fetch(object):
             try:
                 self.d.setVar("BB_NO_NETWORK", network)
  
-                if os.path.exists(ud.donestamp) and not m.need_update(ud, self.d):
+                if verify_donestamp(ud, self.d) and not m.need_update(ud, self.d):
                     localpath = ud.localpath
                 elif m.try_premirror(ud, self.d):
                     logger.debug(1, "Trying PREMIRRORS")
@@ -1435,7 +1521,7 @@ class Fetch(object):
                 os.chdir(self.d.getVar("DL_DIR", True))
 
                 firsterr = None
-                if not localpath and ((not os.path.exists(ud.donestamp)) or m.need_update(ud, self.d)):
+                if not localpath and ((not verify_donestamp(ud, self.d)) or m.need_update(ud, self.d)):
                     try:
                         logger.debug(1, "Trying Upstream")
                         m.download(ud, self.d)
-- 
2.1.0


-- 
Clemens Lang • Development Specialist
BMW Car IT GmbH • Lise-Meitner-Str. 14 • 89081 Ulm • http://bmw-carit.com
-------------------------------------------------------------------------
BMW Car IT GmbH
Geschäftsführer: Michael Würtenberger und Reinhard Stolle
Sitz und Registergericht: München HRB 134810
-------------------------------------------------------------------------



* Re: [PATCH] bitbake: fetch2: Revalidate checksums, YOCTO #5571
  2015-02-23 14:22 [PATCH] bitbake: fetch2: Revalidate checksums, YOCTO #5571 Clemens Lang
@ 2015-02-26 14:42 ` Clemens Lang
  2015-02-26 22:17   ` Alejandro Hernandez
  0 siblings, 1 reply; 12+ messages in thread
From: Clemens Lang @ 2015-02-26 14:42 UTC (permalink / raw)
  To: yocto

Hi,

On Mon, Feb 23, 2015 at 03:22:43PM +0100, Clemens Lang wrote:
> [YOCTO #5571] -- https://bugzilla.yoctoproject.org/show_bug.cgi?id=5571

Can somebody review this? Is there anything I need to do to get this
considered for merging?

-- 
Clemens Lang • Development Specialist
BMW Car IT GmbH • Lise-Meitner-Str. 14 • 89081 Ulm • http://bmw-carit.com
-------------------------------------------------------------------------
BMW Car IT GmbH
Geschäftsführer: Michael Würtenberger und Reinhard Stolle
Sitz und Registergericht: München HRB 134810
-------------------------------------------------------------------------



* Re: [PATCH] bitbake: fetch2: Revalidate checksums, YOCTO #5571
  2015-02-26 14:42 ` Clemens Lang
@ 2015-02-26 22:17   ` Alejandro Hernandez
  2015-02-27 12:02     ` Clemens Lang
  0 siblings, 1 reply; 12+ messages in thread
From: Alejandro Hernandez @ 2015-02-26 22:17 UTC (permalink / raw)
  To: yocto

Hi Clemens,

On 23/02/15 08:22, Clemens Lang wrote:

This is the problem I see: I am not sure we want to import extra
libraries here. As you state, the use of pickle might be OK since it
avoids the overhead of calculating the checksums each time, but I don't
know if it is actually worth it, since checksums are calculated fairly
quickly anyway. This is my personal opinion; you may still submit it
for review.
>   
> +try:
> +    import cPickle as pickle
> +except ImportError:
> +    import pickle
> +    logger.info("Importing cPickle failed. "
> +                "Falling back to a very slow implementation.")
> +
>
On 26/02/15 08:42, Clemens Lang wrote:
> Hi,
>
> On Mon, Feb 23, 2015 at 03:22:43PM +0100, Clemens Lang wrote:
>> [YOCTO #5571] -- https://bugzilla.yoctoproject.org/show_bug.cgi?id=5571
> Can somebody review this? Is there anything I need to do to get this
> considered for merging?
And perhaps the reason it hasn't been reviewed is that I believe this 
patch should go to bitbake-devel@lists.openembedded.org instead.

I also had this WIP branch on contrib; it's from a couple of months back
and it's "dirty", but here it is in case you'd like to look around.

http://git.yoctoproject.org/cgit/cgit.cgi/poky-contrib/commit/?h=hsalejandro/5571&id=2f59e26e534dc6b6d4c5bb8d78c574042e0fe7a7



* Re: [PATCH] bitbake: fetch2: Revalidate checksums, YOCTO #5571
  2015-02-26 22:17   ` Alejandro Hernandez
@ 2015-02-27 12:02     ` Clemens Lang
  0 siblings, 0 replies; 12+ messages in thread
From: Clemens Lang @ 2015-02-27 12:02 UTC (permalink / raw)
  To: Alejandro Hernandez; +Cc: yocto

Hi Alejandro,

On Thu, Feb 26, 2015 at 04:17:23PM -0600, Alejandro Hernandez wrote:
> This is the problem I see: I am not sure we want to import extra libraries
> here. As you state, the use of pickle might be OK since it avoids the
> overhead of calculating the checksums each time, but I don't know if it is
> actually worth it, since checksums are calculated fairly quickly anyway.
> This is my personal opinion; you may still submit it for review.

I don't think the use of cPickle is an issue here. It's already widely
used in other parts of the bitbake codebase. I agree one could just
re-compute the checksums every time, but that might take a while for
larger downloaded files, especially for SHA-256.
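
For a rough idea of the cost, a small stand-alone timing sketch (not part
of the patch; the tarball path below is just a placeholder) would be:

    import hashlib
    import time

    def hash_file(path, algo):
        # Stream the file in 1 MiB chunks so memory use stays flat even for
        # multi-gigabyte downloads.
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                h.update(chunk)
        return h.hexdigest()

    start = time.time()
    digest = hash_file("/path/to/downloads/large-tarball.tar.gz", "sha256")
    print("sha256 %s took %.1fs" % (digest, time.time() - start))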

> And perhaps the reason it hasn't been reviewed is that I believe this
> patch should go to bitbake-devel@lists.openembedded.org instead.

I've re-sent the patch to bitbake-devel, thanks for the pointer.

> I also had this WIP branch on contrib; it's from a couple of months
> back and it's "dirty", but here it is in case you'd like to look around.
> 
> http://git.yoctoproject.org/cgit/cgit.cgi/poky-contrib/commit/?h=hsalejandro/5571&id=2f59e26e534dc6b6d4c5bb8d78c574042e0fe7a7

Thanks, that was helpful in confirming that my approach isn't completely
unreasonable.


Clemens
-- 
Clemens Lang • Development Specialist
BMW Car IT GmbH • Lise-Meitner-Str. 14 • 89081 Ulm • http://bmw-carit.com
-------------------------------------------------------------------------
BMW Car IT GmbH
Geschäftsführer: Michael Würtenberger und Reinhard Stolle
Sitz und Registergericht: München HRB 134810
-------------------------------------------------------------------------



* Re: [PATCH] bitbake: fetch2: Revalidate checksums, YOCTO #5571
  2015-03-06 14:27           ` Clemens Lang
@ 2015-03-06 14:28             ` Clemens Lang
  0 siblings, 0 replies; 12+ messages in thread
From: Clemens Lang @ 2015-03-06 14:28 UTC (permalink / raw)
  To: Richard Purdie; +Cc: bitbake-devel

[YOCTO #5571] -- https://bugzilla.yoctoproject.org/show_bug.cgi?id=5571

The following workflow (whether followed accidentally or deliberately)
previously did not result in a checksum error, although it would be
helpful if it did:
 - Write a recipe with correct checksums
 - Fetch the sources for this recipe using bitbake
 - Change the checksums
Since the bitbake fetcher writes a done stamp after the initial download
and does not verify the checksums again (even if they are changed in the
recipe afterwards), the change of checksums is silently ignored.

Fix this without the overhead of computing the checksums from scratch on
every do_fetch by storing them in pickled format in the done stamp and
verifying that they still match those in the recipe.

Signed-off-by: Clemens Lang <clemens.lang@bmw-carit.de>
---
 bitbake/lib/bb/fetch2/__init__.py | 112 ++++++++++++++++++++++++++++++++++----
 1 file changed, 102 insertions(+), 10 deletions(-)

diff --git a/bitbake/lib/bb/fetch2/__init__.py b/bitbake/lib/bb/fetch2/__init__.py
index 599ea8c..b004dae 100644
--- a/bitbake/lib/bb/fetch2/__init__.py
+++ b/bitbake/lib/bb/fetch2/__init__.py
@@ -45,6 +45,13 @@ _checksum_cache = bb.checksum.FileChecksumCache()
 
 logger = logging.getLogger("BitBake.Fetcher")
 
+try:
+    import cPickle as pickle
+except ImportError:
+    import pickle
+    logger.info("Importing cPickle failed. "
+                "Falling back to a very slow implementation.")
+
 class BBFetchException(Exception):
     """Class all fetch exceptions inherit from"""
     def __init__(self, message):
@@ -525,7 +532,7 @@ def fetcher_compare_revisions(d):
 def mirror_from_string(data):
     return [ i.split() for i in (data or "").replace('\\n','\n').split('\n') if i ]
 
-def verify_checksum(ud, d):
+def verify_checksum(ud, d, precomputed={}):
     """
     verify the MD5 and SHA256 checksum for downloaded src
 
@@ -533,13 +540,28 @@ def verify_checksum(ud, d):
     the downloaded file, or if BB_STRICT_CHECKSUM is set and there are no
     checksums specified.
 
+    Returns a dict of checksums that can be stored in a done stamp file and
+    passed in as precomputed parameter in a later call to avoid re-computing
+    the checksums from the file. This allows verifying the checksums of the
+    file against those in the recipe each time, rather than only after
+    downloading. See https://bugzilla.yoctoproject.org/show_bug.cgi?id=5571.
     """
 
+    _MD5_KEY = "md5"
+    _SHA256_KEY = "sha256"
+
     if ud.ignore_checksums or not ud.method.supports_checksum(ud):
-        return
+        return {}
 
-    md5data = bb.utils.md5_file(ud.localpath)
-    sha256data = bb.utils.sha256_file(ud.localpath)
+    if _MD5_KEY in precomputed:
+        md5data = precomputed[_MD5_KEY]
+    else:
+        md5data = bb.utils.md5_file(ud.localpath)
+
+    if _SHA256_KEY in precomputed:
+        sha256data = precomputed[_SHA256_KEY]
+    else:
+        sha256data = bb.utils.sha256_file(ud.localpath)
 
     if ud.method.recommends_checksum(ud):
         # If strict checking enabled and neither sum defined, raise error
@@ -589,6 +611,72 @@ def verify_checksum(ud, d):
     if len(msg):
         raise ChecksumError('Checksum mismatch!%s' % msg, ud.url, md5data)
 
+    return {
+        _MD5_KEY: md5data,
+        _SHA256_KEY: sha256data
+    }
+
+
+def verify_donestamp(ud, d):
+    """
+    Check whether the done stamp file has the right checksums (if the fetch
+    method supports them). If it doesn't, delete the done stamp and force
+    a re-download.
+
+    Returns True, if the donestamp exists and is valid, False otherwise. When
+    returning False, any existing done stamps are removed.
+    """
+    if not os.path.exists(ud.donestamp):
+        return False
+
+    if not ud.method.supports_checksum(ud):
+        # done stamp exists, checksums not supported; assume the local file is
+        # current
+        return True
+
+    if not os.path.exists(ud.localpath):
+        # done stamp exists, but the downloaded file does not; the done stamp
+        # must be incorrect, re-trigger the download
+        bb.utils.remove(ud.donestamp)
+        return False
+
+    precomputed_checksums = {}
+    # Only re-use the precomputed checksums if the donestamp is newer than the
+    # file. Do not rely on the mtime of directories, though. If ud.localpath is
+    # a directory, there will probably not be any checksums anyway.
+    if (os.path.isdir(ud.localpath) or
+            os.path.getmtime(ud.localpath) < os.path.getmtime(ud.donestamp)):
+        try:
+            with open(ud.donestamp, "rb") as cachefile:
+                pickled = pickle.Unpickler(cachefile)
+                precomputed_checksums.update(pickled.load())
+        except Exception as e:
+            # Avoid the warnings on the upgrade path from empty done stamp
+            # files to those containing the checksums.
+            if not isinstance(e, EOFError):
+                # Ignore errors, they aren't fatal
+                logger.warn("Couldn't load checksums from donestamp %s: %s "
+                            "(msg: %s)" % (ud.donestamp, type(e).__name__,
+                                           str(e)))
+
+    try:
+        checksums = verify_checksum(ud, d, precomputed_checksums)
+        # If the cache file did not have the checksums, compute and store them
+        # as an upgrade path from the previous done stamp file format.
+        if checksums != precomputed_checksums:
+            with open(ud.donestamp, "wb") as cachefile:
+                p = pickle.Pickler(cachefile, pickle.HIGHEST_PROTOCOL)
+                p.dump(checksums)
+        return True
+    except ChecksumError as e:
+        # Checksums failed to verify, trigger re-download and remove the
+        # incorrect stamp file.
+        logger.warn("Checksum mismatch for local file %s\n"
+                    "Cleaning and trying again." % ud.localpath)
+        rename_bad_checksum(ud, e.checksum)
+        bb.utils.remove(ud.donestamp)
+    return False
+
 
 def update_stamp(ud, d):
     """
@@ -603,8 +691,11 @@ def update_stamp(ud, d):
             # Errors aren't fatal here
             pass
     else:
-        verify_checksum(ud, d)
-        open(ud.donestamp, 'w').close()
+        checksums = verify_checksum(ud, d)
+        # Store the checksums for later re-verification against the recipe
+        with open(ud.donestamp, "wb") as cachefile:
+            p = pickle.Pickler(cachefile, pickle.HIGHEST_PROTOCOL)
+            p.dump(checksums)
 
 def subprocess_setup():
     # Python installs a SIGPIPE handler by default. This is usually not what
@@ -805,7 +896,7 @@ def try_mirror_url(origud, ud, ld, check = False):
 
         os.chdir(ld.getVar("DL_DIR", True))
 
-        if not os.path.exists(ud.donestamp) or ud.method.need_update(ud, ld):
+        if not verify_donestamp(ud, ld) or ud.method.need_update(ud, ld):
             ud.method.download(ud, ld)
             if hasattr(ud.method,"build_mirror_data"):
                 ud.method.build_mirror_data(ud, ld)
@@ -821,12 +912,13 @@ def try_mirror_url(origud, ud, ld, check = False):
         dldir = ld.getVar("DL_DIR", True)
         if origud.mirrortarball and os.path.basename(ud.localpath) == os.path.basename(origud.mirrortarball) \
                 and os.path.basename(ud.localpath) != os.path.basename(origud.localpath):
+            # Create donestamp in old format to avoid triggering a re-download
             bb.utils.mkdirhier(os.path.dirname(ud.donestamp))
             open(ud.donestamp, 'w').close()
             dest = os.path.join(dldir, os.path.basename(ud.localpath))
             if not os.path.exists(dest):
                 os.symlink(ud.localpath, dest)
-            if not os.path.exists(origud.donestamp) or origud.method.need_update(origud, ld):
+            if not verify_donestamp(origud, ld) or origud.method.need_update(origud, ld):
                 origud.method.download(origud, ld)
                 if hasattr(origud.method,"build_mirror_data"):
                     origud.method.build_mirror_data(origud, ld)
@@ -1422,7 +1514,7 @@ class Fetch(object):
             try:
                 self.d.setVar("BB_NO_NETWORK", network)
  
-                if os.path.exists(ud.donestamp) and not m.need_update(ud, self.d):
+                if verify_donestamp(ud, self.d) and not m.need_update(ud, self.d):
                     localpath = ud.localpath
                 elif m.try_premirror(ud, self.d):
                     logger.debug(1, "Trying PREMIRRORS")
@@ -1435,7 +1527,7 @@ class Fetch(object):
                 os.chdir(self.d.getVar("DL_DIR", True))
 
                 firsterr = None
-                if not localpath and ((not os.path.exists(ud.donestamp)) or m.need_update(ud, self.d)):
+                if not localpath and ((not verify_donestamp(ud, self.d)) or m.need_update(ud, self.d)):
                     try:
                         logger.debug(1, "Trying Upstream")
                         m.download(ud, self.d)
-- 
2.1.0

-- 
Clemens Lang • Development Specialist
BMW Car IT GmbH • Lise-Meitner-Str. 14 • 89081 Ulm • http://bmw-carit.com
-------------------------------------------------------------------------
BMW Car IT GmbH
Geschäftsführer: Michael Würtenberger und Reinhard Stolle
Sitz und Registergericht: München HRB 134810
-------------------------------------------------------------------------



* Re: [PATCH] bitbake: fetch2: Revalidate checksums, YOCTO #5571
  2015-03-06 13:42         ` Richard Purdie
@ 2015-03-06 14:27           ` Clemens Lang
  2015-03-06 14:28             ` Clemens Lang
  0 siblings, 1 reply; 12+ messages in thread
From: Clemens Lang @ 2015-03-06 14:27 UTC (permalink / raw)
  To: Richard Purdie; +Cc: bitbake-devel

Hi,

On Fri, Mar 06, 2015 at 01:42:45PM +0000, Richard Purdie wrote:
> I am nervous about the amount and kind of code changes this is
> involving. Having "binary" data format files in pickle format is
> suboptimal in that the user can't easily inspect or change them and
> it's not clear what the contents mean.

That's correct, but I don't see it as a large problem, because any issues
with the file format can be resolved by just deleting the donestamp (or,
with the next changeset, by touching the downloaded file).
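
For what it's worth, the stamp contents stay easy to inspect; a minimal
sketch (the .done path below is only an example of a stamp next to a
download in DL_DIR) would be:

    import pickle
    import pprint

    # Dump the checksum dict that the patched fetcher stores in a done stamp.
    with open("downloads/foo-1.0.tar.gz.done", "rb") as f:
        pprint.pprint(pickle.load(f))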


> I was wondering about whether we should just drop to one checksum
> format and simplify the problem somewhat. I understand the reasons for
> supporting multiple checksum types though and if we add in a
> requirement to track timestamps too, the single format doesn't buy us
> anything.

I think the advantages of having multiple checksums when one of the
algorithms is no longer considered secure (as is already the case with
md5) outweigh the increased complexity. I'd rather modify the code in a
way that no longer hardcodes the supported checksum formats, so moving
to a more modern checksum would be as simple as updating documentation
and recipes.
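
As a purely illustrative sketch of that direction (not something this
patch does), the hardcoded md5/sha256 handling could become a loop over a
table of algorithms, assuming bb.utils keeps its current "<algo>_file"
helper naming:

    import bb.utils

    # Hypothetical table of supported checksum algorithms; a new algorithm
    # would only need an entry here plus documentation and recipe updates.
    CHECKSUM_LIST = ["md5", "sha256"]

    def compute_checksums(localpath, precomputed={}):
        # Reuse a precomputed value where available, otherwise hash the file.
        # This assumes bb.utils provides "<algo>_file" helpers (md5_file,
        # sha256_file) for every listed algorithm.
        checksums = {}
        for algo in CHECKSUM_LIST:
            if algo in precomputed:
                checksums[algo] = precomputed[algo]
            else:
                checksums[algo] = getattr(bb.utils, "%s_file" % algo)(localpath)
        return checksums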

New patch incoming after this mail.

-- 
Clemens Lang • Development Specialist
BMW Car IT GmbH • Lise-Meitner-Str. 14 • 89081 Ulm • http://bmw-carit.com
-------------------------------------------------------------------------
BMW Car IT GmbH
Geschäftsführer: Michael Würtenberger und Reinhard Stolle
Sitz und Registergericht: München HRB 134810
-------------------------------------------------------------------------



* Re: [PATCH] bitbake: fetch2: Revalidate checksums, YOCTO #5571
  2015-03-06 13:29       ` Martin Jansa
@ 2015-03-06 13:42         ` Richard Purdie
  2015-03-06 14:27           ` Clemens Lang
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Purdie @ 2015-03-06 13:42 UTC (permalink / raw)
  To: Martin Jansa; +Cc: bitbake-devel

On Fri, 2015-03-06 at 14:29 +0100, Martin Jansa wrote:
> On Fri, Mar 06, 2015 at 02:03:20PM +0100, Clemens Lang wrote:
> > Hi Martin,
> > 
> > On Fri, Mar 06, 2015 at 01:22:38PM +0100, Martin Jansa wrote:
> > > Using pickle is much better than computing the checksums on every
> > > do_fetch; we have GBs of sources and we don't want to spend extra
> > > minutes on the build re-checking them if they are identical.
> > 
> > That's what I thought as well. The pickle method should scale a lot
> > better than re-running the hash calculation, especially in terms of I/O
> > ops.
> > 
> > > Maybe add the file timestamp as well: if the file was modified, the
> > > .done stamp could be invalidated and the checksums re-verified.
> > 
> > I could modify the code to avoid reading the precomputed checksums if
> > the .done file is older than the downloaded file, which should cover
> > this case. Of course, the modification date will not be of much use if
> > the download is not a file, but a directory.
> > 
> > Does this sound good to you?
> 
> yes :)

Firstly, the delay in getting to reviewing this is mainly that we're
having issues in OE-Core which are causing me a lot of headaches, so
sorry about that. We've long held the belief that we needed to improve
this mechanism.

I am nervous about the amount and kind of code changes this is
involving. Having "binary" data format files in pickle format is
suboptimal in that the user can't easily inspect or change them and it's
not clear what the contents mean. Using pickle doesn't introduce a
dependency issue, since we use it elsewhere in bitbake as you mentioned.

I was wondering about whether we should just drop to one checksum format
and simplify the problem somewhat. I understand the reasons for
supporting multiple checksum types though and if we add in a requirement
to track timestamps too, the single format doesn't buy us anything.

I guess above all, I've just been trying to push consolidation in the
fetcher; it has at various points been an unmaintainable nightmare. This
is a worthwhile improvement, though.

Cheers,

Richard






* Re: [PATCH] bitbake: fetch2: Revalidate checksums, YOCTO #5571
  2015-03-06 13:03     ` Clemens Lang
@ 2015-03-06 13:29       ` Martin Jansa
  2015-03-06 13:42         ` Richard Purdie
  0 siblings, 1 reply; 12+ messages in thread
From: Martin Jansa @ 2015-03-06 13:29 UTC (permalink / raw)
  To: Clemens Lang; +Cc: bitbake-devel


On Fri, Mar 06, 2015 at 02:03:20PM +0100, Clemens Lang wrote:
> Hi Martin,
> 
> On Fri, Mar 06, 2015 at 01:22:38PM +0100, Martin Jansa wrote:
> > Using pickle is much better than computing the checksums on every
> > do_fetch; we have GBs of sources and we don't want to spend extra
> > minutes on the build re-checking them if they are identical.
> 
> That's what I thought as well. The pickle method should scale a lot
> better than re-running the hash calculation, especially in terms of I/O
> ops.
> 
> > Maybe add the file timestamp as well: if the file was modified, the
> > .done stamp could be invalidated and the checksums re-verified.
> 
> I could modify the code to avoid reading the precomputed checksums if
> the .done file is older than the downloaded file, which should cover
> this case. Of course, the modification date will not be of much use if
> the download is not a file, but a directory.
> 
> Does this sound good to you?

yes :)

-- 
Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com



* Re: [PATCH] bitbake: fetch2: Revalidate checksums, YOCTO #5571
  2015-03-06 12:22   ` Martin Jansa
@ 2015-03-06 13:03     ` Clemens Lang
  2015-03-06 13:29       ` Martin Jansa
  0 siblings, 1 reply; 12+ messages in thread
From: Clemens Lang @ 2015-03-06 13:03 UTC (permalink / raw)
  To: Martin Jansa; +Cc: bitbake-devel

Hi Martin,

On Fri, Mar 06, 2015 at 01:22:38PM +0100, Martin Jansa wrote:
> Using pickle is much better than computing the checksums on every
> do_fetch; we have GBs of sources and we don't want to spend extra
> minutes on the build re-checking them if they are identical.

That's what I thought as well. The pickle method should scale a lot
better than re-running the hash calculation, especially in terms of I/O
ops.

> Maybe add the file timestamp as well: if the file was modified, the
> .done stamp could be invalidated and the checksums re-verified.

I could modify the code to avoid reading the precomputed checksums if
the .done file is older than the downloaded file, which should cover
this case. Of course, the modification date will not be of much use if
the download is not a file, but a directory.
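
Roughly, the guard I have in mind would look like the following sketch
(written here as a hypothetical helper; in the actual change the check
would sit inline in verify_donestamp):

    import os

    def can_reuse_checksums(localpath, donestamp):
        # Only trust the pickled checksums if the done stamp is newer than
        # the downloaded file. For directories the mtime is not a reliable
        # signal, so keep the current behaviour there (directories usually
        # have no checksums anyway).
        if os.path.isdir(localpath):
            return True
        return os.path.getmtime(localpath) < os.path.getmtime(donestamp)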

Does this sound good to you?


Clemens
-- 
BMW Car IT GmbH
Clemens Lang
Spezialist Entwicklung
Lise-Meitner-Straße 14
89081 Ulm

Tel: +49-731-37804182
Mail: clemens.lang@bmw-carit.de
Web: http://www.bmw-carit.de
--------------------------------------------------------------------
BMW Car IT GmbH
Geschäftsführer: Michael Würtenberger und Reinhard Stolle
Sitz und Registergericht: München HRB 134810
--------------------------------------------------------------------



* Re: [PATCH] bitbake: fetch2: Revalidate checksums, YOCTO #5571
  2015-03-06 11:43 ` Clemens Lang
@ 2015-03-06 12:22   ` Martin Jansa
  2015-03-06 13:03     ` Clemens Lang
  0 siblings, 1 reply; 12+ messages in thread
From: Martin Jansa @ 2015-03-06 12:22 UTC (permalink / raw)
  To: Clemens Lang; +Cc: bitbake-devel


On Fri, Mar 06, 2015 at 12:43:28PM +0100, Clemens Lang wrote:
> Hello,
> 
> On Fri, Feb 27, 2015 at 01:00:26PM +0100, Clemens Lang wrote:
> > Here's the patch, please review and consider applying:
> 
> Any feedback on this? Anything I can do to get this reviewed?

FWIW: I like this; we were hit by this issue when an archive was
modified on a premirror and builders happily kept using it, because
there was a .done file next to it.

This will at least fix the case where the file is intentionally modified
and the checksums are updated in the recipe, to prevent re-using a stale
file from some premirror.

Using pickle is much better than computing the checksums on every
do_fetch; we have GBs of sources and we don't want to spend extra
minutes on the build re-checking them if they are identical. Maybe add
the file timestamp as well: if the file was modified, the .done stamp
could be invalidated and the checksums re-verified.

Regards,
-- 
Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com



* Re: [PATCH] bitbake: fetch2: Revalidate checksums, YOCTO #5571
  2015-02-27 12:00 Clemens Lang
@ 2015-03-06 11:43 ` Clemens Lang
  2015-03-06 12:22   ` Martin Jansa
  0 siblings, 1 reply; 12+ messages in thread
From: Clemens Lang @ 2015-03-06 11:43 UTC (permalink / raw)
  To: bitbake-devel

Hello,

On Fri, Feb 27, 2015 at 01:00:26PM +0100, Clemens Lang wrote:
> Here's the patch, please review and consider applying:

Any feedback on this? Anything I can do to get this reviewed?


Thanks,
Clemens
-- 
Clemens Lang • Development Specialist
BMW Car IT GmbH • Lise-Meitner-Str. 14 • 89081 Ulm • http://bmw-carit.com
-------------------------------------------------------------------------
BMW Car IT GmbH
Geschäftsführer: Michael Würtenberger und Reinhard Stolle
Sitz und Registergericht: München HRB 134810
-------------------------------------------------------------------------



* [PATCH] bitbake: fetch2: Revalidate checksums, YOCTO #5571
@ 2015-02-27 12:00 Clemens Lang
  2015-03-06 11:43 ` Clemens Lang
  0 siblings, 1 reply; 12+ messages in thread
From: Clemens Lang @ 2015-02-27 12:00 UTC (permalink / raw)
  To: bitbake-devel

Hi,

I wrote a patch to implement checksum verification after the initial
download to catch the case of modified checksums in the recipe while the
cached download and done stamp are unmodified.

The change imports cPickle, but I don't think that's an issue, because
it is already used in other parts of bitbake. The use of cPickle avoids
recomputing the checksums each time do_fetch runs, which could take a
while for large downloads.

Here's the patch, please review and consider applying:

From 30a168607be0b66a6310b1104738b839bc119898 Mon Sep 17 00:00:00 2001
From: Clemens Lang <clemens.lang@bmw-carit.de>
Date: Fri, 6 Feb 2015 16:23:01 +0100
Subject: [PATCH] bitbake: fetch2: Revalidate checksums, YOCTO #5571

[YOCTO #5571] -- https://bugzilla.yoctoproject.org/show_bug.cgi?id=5571

The following workflow (whether followed accidentally or deliberately)
previously did not result in a checksum error, although it would be
helpful if it did:
 - Write a recipe with correct checksums
 - Fetch the sources for this recipe using bitbake
 - Change the checksums
Since the bitbake fetcher writes a done stamp after the initial download
and does not verify the checksums again (even if they are changed in the
recipe afterwards), the change of checksums is silently ignored.

Fix this without the overhead of computing the checksums from scratch on
every do_fetch by storing them in pickled format in the done stamp and
verifying that they still match those in the recipe.

Signed-off-by: Clemens Lang <clemens.lang@bmw-carit.de>
---
 bitbake/lib/bb/fetch2/__init__.py | 106 ++++++++++++++++++++++++++++++++++----
 1 file changed, 96 insertions(+), 10 deletions(-)

diff --git a/bitbake/lib/bb/fetch2/__init__.py b/bitbake/lib/bb/fetch2/__init__.py
index 599ea8c..c0c8312 100644
--- a/bitbake/lib/bb/fetch2/__init__.py
+++ b/bitbake/lib/bb/fetch2/__init__.py
@@ -45,6 +45,13 @@ _checksum_cache = bb.checksum.FileChecksumCache()
 
 logger = logging.getLogger("BitBake.Fetcher")
 
+try:
+    import cPickle as pickle
+except ImportError:
+    import pickle
+    logger.info("Importing cPickle failed. "
+                "Falling back to a very slow implementation.")
+
 class BBFetchException(Exception):
     """Class all fetch exceptions inherit from"""
     def __init__(self, message):
@@ -525,7 +532,7 @@ def fetcher_compare_revisions(d):
 def mirror_from_string(data):
     return [ i.split() for i in (data or "").replace('\\n','\n').split('\n') if i ]
 
-def verify_checksum(ud, d):
+def verify_checksum(ud, d, precomputed={}):
     """
     verify the MD5 and SHA256 checksum for downloaded src
 
@@ -533,13 +540,28 @@ def verify_checksum(ud, d):
     the downloaded file, or if BB_STRICT_CHECKSUM is set and there are no
     checksums specified.
 
+    Returns a dict of checksums that can be stored in a done stamp file and
+    passed in as precomputed parameter in a later call to avoid re-computing
+    the checksums from the file. This allows verifying the checksums of the
+    file against those in the recipe each time, rather than only after
+    downloading. See https://bugzilla.yoctoproject.org/show_bug.cgi?id=5571.
     """
 
+    _MD5_KEY = "md5"
+    _SHA256_KEY = "sha256"
+
     if ud.ignore_checksums or not ud.method.supports_checksum(ud):
-        return
+        return {}
 
-    md5data = bb.utils.md5_file(ud.localpath)
-    sha256data = bb.utils.sha256_file(ud.localpath)
+    if _MD5_KEY in precomputed:
+        md5data = precomputed[_MD5_KEY]
+    else:
+        md5data = bb.utils.md5_file(ud.localpath)
+
+    if _SHA256_KEY in precomputed:
+        sha256data = precomputed[_SHA256_KEY]
+    else:
+        sha256data = bb.utils.sha256_file(ud.localpath)
 
     if ud.method.recommends_checksum(ud):
         # If strict checking enabled and neither sum defined, raise error
@@ -589,6 +611,66 @@ def verify_checksum(ud, d):
     if len(msg):
         raise ChecksumError('Checksum mismatch!%s' % msg, ud.url, md5data)
 
+    return {
+        _MD5_KEY: md5data,
+        _SHA256_KEY: sha256data
+    }
+
+
+def verify_donestamp(ud, d):
+    """
+    Check whether the done stamp file has the right checksums (if the fetch
+    method supports them). If it doesn't, delete the done stamp and force
+    a re-download.
+
+    Returns True, if the donestamp exists and is valid, False otherwise. When
+    returning False, any existing done stamps are removed.
+    """
+    if not os.path.exists(ud.donestamp):
+        return False
+
+    if not ud.method.supports_checksum(ud):
+        # done stamp exists, checksums not supported; assume the local file is
+        # current
+        return True
+
+    if not os.path.exists(ud.localpath):
+        # done stamp exists, but the downloaded file does not; the done stamp
+        # must be incorrect, re-trigger the download
+        bb.utils.remove(ud.donestamp)
+        return False
+
+    precomputed_checksums = {}
+    try:
+        with open(ud.donestamp, "rb") as cachefile:
+            pickled = pickle.Unpickler(cachefile)
+            precomputed_checksums.update(pickled.load())
+    except Exception as e:
+        # Avoid the warnings on the upgrade path from empty done stamp files to
+        # those containing the checksums.
+        if not isinstance(e, EOFError):
+            # Ignore errors, they aren't fatal
+            logger.warn("Couldn't load checksums from donestamp %s: %s "
+                        "(msg: %s)" % (ud.donestamp, type(e).__name__, str(e)))
+
+    try:
+        checksums = verify_checksum(ud, d, precomputed_checksums)
+        # If the cache file did not have the checksums, compute and store them
+        # as an upgrade path from the previous done stamp file format.
+        if checksums != precomputed_checksums:
+            with open(ud.donestamp, "wb") as cachefile:
+                p = pickle.Pickler(cachefile, pickle.HIGHEST_PROTOCOL)
+                p.dump(checksums)
+        return True
+    except ChecksumError as e:
+        # Checksums failed to verify, trigger re-download and remove the
+        # incorrect stamp file.
+        logger.warn("Checksum mismatch for local file %s\n"
+                    "Cleaning and trying again." % ud.localpath)
+        rename_bad_checksum(ud, e.checksum)
+        bb.utils.remove(ud.donestamp)
+    return False
+
 
 def update_stamp(ud, d):
     """
@@ -603,8 +685,11 @@ def update_stamp(ud, d):
             # Errors aren't fatal here
             pass
     else:
-        verify_checksum(ud, d)
-        open(ud.donestamp, 'w').close()
+        checksums = verify_checksum(ud, d)
+        # Store the checksums for later re-verification against the recipe
+        with open(ud.donestamp, "wb") as cachefile:
+            p = pickle.Pickler(cachefile, pickle.HIGHEST_PROTOCOL)
+            p.dump(checksums)
 
 def subprocess_setup():
     # Python installs a SIGPIPE handler by default. This is usually not what
@@ -805,7 +890,7 @@ def try_mirror_url(origud, ud, ld, check = False):
 
         os.chdir(ld.getVar("DL_DIR", True))
 
-        if not os.path.exists(ud.donestamp) or ud.method.need_update(ud, ld):
+        if not verify_donestamp(ud, ld) or ud.method.need_update(ud, ld):
             ud.method.download(ud, ld)
             if hasattr(ud.method,"build_mirror_data"):
                 ud.method.build_mirror_data(ud, ld)
@@ -821,12 +906,13 @@ def try_mirror_url(origud, ud, ld, check = False):
         dldir = ld.getVar("DL_DIR", True)
         if origud.mirrortarball and os.path.basename(ud.localpath) == os.path.basename(origud.mirrortarball) \
                 and os.path.basename(ud.localpath) != os.path.basename(origud.localpath):
+            # Create donestamp in old format to avoid triggering a re-download
             bb.utils.mkdirhier(os.path.dirname(ud.donestamp))
             open(ud.donestamp, 'w').close()
             dest = os.path.join(dldir, os.path.basename(ud.localpath))
             if not os.path.exists(dest):
                 os.symlink(ud.localpath, dest)
-            if not os.path.exists(origud.donestamp) or origud.method.need_update(origud, ld):
+            if not verify_donestamp(origud, ld) or origud.method.need_update(origud, ld):
                 origud.method.download(origud, ld)
                 if hasattr(origud.method,"build_mirror_data"):
                     origud.method.build_mirror_data(origud, ld)
@@ -1422,7 +1508,7 @@ class Fetch(object):
             try:
                 self.d.setVar("BB_NO_NETWORK", network)
  
-                if os.path.exists(ud.donestamp) and not m.need_update(ud, self.d):
+                if verify_donestamp(ud, self.d) and not m.need_update(ud, self.d):
                     localpath = ud.localpath
                 elif m.try_premirror(ud, self.d):
                     logger.debug(1, "Trying PREMIRRORS")
@@ -1435,7 +1521,7 @@ class Fetch(object):
                 os.chdir(self.d.getVar("DL_DIR", True))
 
                 firsterr = None
-                if not localpath and ((not os.path.exists(ud.donestamp)) or m.need_update(ud, self.d)):
+                if not localpath and ((not verify_donestamp(ud, self.d)) or m.need_update(ud, self.d)):
                     try:
                         logger.debug(1, "Trying Upstream")
                         m.download(ud, self.d)
-- 
2.1.0

-- 
Clemens Lang • Development Specialist
BMW Car IT GmbH • Lise-Meitner-Str. 14 • 89081 Ulm • http://bmw-carit.com
-------------------------------------------------------------------------
BMW Car IT GmbH
Geschäftsführer: Michael Würtenberger und Reinhard Stolle
Sitz und Registergericht: München HRB 134810
-------------------------------------------------------------------------



Thread overview: 12+ messages
2015-02-23 14:22 [PATCH] bitbake: fetch2: Revalidate checksums, YOCTO #5571 Clemens Lang
2015-02-26 14:42 ` Clemens Lang
2015-02-26 22:17   ` Alejandro Hernandez
2015-02-27 12:02     ` Clemens Lang
2015-02-27 12:00 Clemens Lang
2015-03-06 11:43 ` Clemens Lang
2015-03-06 12:22   ` Martin Jansa
2015-03-06 13:03     ` Clemens Lang
2015-03-06 13:29       ` Martin Jansa
2015-03-06 13:42         ` Richard Purdie
2015-03-06 14:27           ` Clemens Lang
2015-03-06 14:28             ` Clemens Lang
