bitbake-devel.lists.openembedded.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fetch2/s3: improve checking if object exists in s3 by using botocore
@ 2021-07-08 12:30 Damian Wrobel
  2021-07-11  9:47 ` [bitbake-devel] " Richard Purdie
  0 siblings, 1 reply; 3+ messages in thread
From: Damian Wrobel @ 2021-07-08 12:30 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Adam Romanek

From: Adam Romanek <romanek.adam@gmail.com>

Botocore uses urllib3 which in turn uses a connection pool to issue HTTP
requests. Keeping the S3 client obtained from botocore library allows
one to reuse that connection pool.

The S3 client obtained from botocore is now kept in a connection
cache, which in case of sstate-cache checks is set up in
openembedded-core/classes/sstate.bbclass.

On Python 3.5+ a ResourceWarning on unclosed SSL socket is issued when
dropping the reference to botocore S3 client, which is a known issue but
apparently a false positive: https://github.com/boto/boto3/issues/454

Initial results show the time to check ~2700 objects is cut by 85%
(4 mins -> 35 secs).

With this change setting SSTATE_MIRRORS directly to S3 could now be the
preferred method for using sstate-cache from S3 (no need to sync
sstate-cache from S3 prior running the build), especially when the cache
grows significantly over time (e.g. from 4GB to 30GB during the day).

Signed-off-by: Adam Romanek <romanek.adam@gmail.com>
Signed-off-by: Damian Wrobel <dwrobel@ertelnet.rybnik.pl>
---
 lib/bb/fetch2/s3.py | 89 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 75 insertions(+), 14 deletions(-)

diff --git a/lib/bb/fetch2/s3.py b/lib/bb/fetch2/s3.py
index 6b8ffd53..bbc9f007 100644
--- a/lib/bb/fetch2/s3.py
+++ b/lib/bb/fetch2/s3.py
@@ -1,12 +1,14 @@
 """
 BitBake 'Fetch' implementation for Amazon AWS S3.
 
-Class for fetching files from Amazon S3 using the AWS Command Line Interface.
+Class for fetching files from Amazon S3 using the
+AWS Command Line Interface and python botocore library, if available.
 The aws tool must be correctly installed and configured prior to use.
 
 """
 
 # Copyright (C) 2017, Andre McCurdy <armccurdy@gmail.com>
+# Copyright (C) 2020, Adam Romanek <romanek.adam@gmail.com>
 #
 # Based in part on bb.fetch2.wget:
 #    Copyright (C) 2003, 2004  Chris Larson
@@ -21,8 +23,17 @@ import urllib.request, urllib.parse, urllib.error
 import re
 from bb.fetch2 import FetchMethod
 from bb.fetch2 import FetchError
+from bb.fetch2 import logger
 from bb.fetch2 import runfetchcmd
 
+try:
+    import botocore.exceptions
+    import botocore.session
+    HAS_BOTOCORE = True
+except ImportError:
+    HAS_BOTOCORE = False
+
+
 def convertToBytes(value, unit):
     value = float(value)
     if (unit == "KiB"):
@@ -60,7 +71,13 @@ class S3ProgressHandler(bb.progress.LineFilterProgressHandler):
 
 
 class S3(FetchMethod):
-    """Class to fetch urls via 'aws s3'"""
+    """Class to fetch urls from AWS S3"""
+
+    def init(self, d):
+        if HAS_BOTOCORE:
+            logger.info("S3 FetchMethod: using botocore implementation")
+        else:
+            logger.warning("S3 FetchMethod: consider installing botocore to speed up S3 access")
 
     def supports(self, ud, d):
         """
@@ -106,19 +123,63 @@ class S3(FetchMethod):
 
         return True
 
-    def checkstatus(self, fetch, ud, d):
-        """
-        Check the status of a URL
-        """
+    if HAS_BOTOCORE:
+        def checkstatus(self, fetch, ud, d):
+            try:
+                return self._check_object_exists(fetch, ud, d)
+            except Exception as error:
+                logger.warning('S3 FetchMethod checkstatus error: %s', error)
+                return False
+    else:
+        def checkstatus(self, fetch, ud, d):
+            """
+            Check the status of a URL
+            """
 
-        cmd = '%s ls s3://%s%s' % (ud.basecmd, ud.host, ud.path)
-        bb.fetch2.check_network_access(d, cmd, ud.url)
-        output = runfetchcmd(cmd, d)
+            cmd = '%s ls s3://%s%s' % (ud.basecmd, ud.host, ud.path)
+            bb.fetch2.check_network_access(d, cmd, ud.url)
+            output = runfetchcmd(cmd, d)
 
-        # "aws s3 ls s3://mybucket/foo" will exit with success even if the file
-        # is not found, so check output of the command to confirm success.
+            # "aws s3 ls s3://mybucket/foo" will exit with success even if the file
+            # is not found, so check output of the command to confirm success.
 
-        if not output:
-            raise FetchError("The aws ls command for s3://%s%s gave empty output" % (ud.host, ud.path))
+            if not output:
+                raise FetchError("The aws ls command for s3://%s%s gave empty output" % (ud.host, ud.path))
 
-        return True
+            return True
+
+    def _check_object_exists(self, fetch, ud, d):
+        """
+        Call HeadObject via botocore to check if an object exists in S3 bucket.
+        """
+        def create_s3_client():
+            return botocore.session.get_session().create_client('s3')
+
+        if fetch.connection_cache:
+            class FakeConnection(object):
+                def __init__(self, client):
+                    self.client = client
+
+                def close(self):
+                    self.client = None  # see https://github.com/boto/boto3/issues/454
+
+            dummy_host = ''
+            dummy_port = 0
+
+            connection = fetch.connection_cache.get_connection(dummy_host, dummy_port)
+            if connection:
+                client = connection.client
+            else:
+                client = create_s3_client()
+                fetch.connection_cache.add_connection(dummy_host, dummy_port, FakeConnection(client))
+        else:
+            client = create_s3_client()
+
+        try:
+            client.head_object(Bucket=ud.host, Key=ud.path[1:])
+        except botocore.exceptions.ClientError as error:
+            if error.response['Error']['Code'] not in ('NoSuchKey', '404'):  # see https://github.com/boto/boto3/issues/2442
+                logger.warning('S3 FetchMethod ClientError: %s', error)
+            return False
+        else:
+            return True
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [bitbake-devel] [PATCH] fetch2/s3: improve checking if object exists in s3 by using botocore
  2021-07-08 12:30 [PATCH] fetch2/s3: improve checking if object exists in s3 by using botocore Damian Wrobel
@ 2021-07-11  9:47 ` Richard Purdie
  2021-07-12  9:52   ` Damian Wrobel
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Purdie @ 2021-07-11  9:47 UTC (permalink / raw)
  To: Damian Wrobel, bitbake-devel; +Cc: Adam Romanek

On Thu, 2021-07-08 at 14:30 +0200, Damian Wrobel wrote:
> From: Adam Romanek <romanek.adam@gmail.com>
> 
> Botocore uses urllib3 which in turn uses a connection pool to issue HTTP
> requests. Keeping the S3 client obtained from botocore library allows
> one to reuse that connection pool.
> 
> The S3 client obtained from botocore is now kept in a connection
> cache, which in case of sstate-cache checks is set up in
> openembedded-core/classes/sstate.bbclass.
> 
> On Python 3.5+ a ResourceWarning on unclosed SSL socket is issued when
> dropping the reference to botocore S3 client, which is a known issue but
> apparently a false positive: https://github.com/boto/boto3/issues/454
> 
> Initial results show the time to check ~2700 objects is cut by 85%
> (4 mins -> 35 secs).
> 
> With this change setting SSTATE_MIRRORS directly to S3 could now be the
> preferred method for using sstate-cache from S3 (no need to sync
> sstate-cache from S3 prior running the build), especially when the cache
> grows significantly over time (e.g. from 4GB to 30GB during the day).
> 
> Signed-off-by: Adam Romanek <romanek.adam@gmail.com>
> Signed-off-by: Damian Wrobel <dwrobel@ertelnet.rybnik.pl>

With this change all the builds on the autobuilder and my local builds 
start showing:

WARNING: S3 FetchMethod: consider installing botocore to speed up S3 access

which isn't really appropriate for users not using S3.

I'm also a little worried that the changes in this patch appear to effectively
totally change the fetcher type to effectively different code, I'm torn on that
and whether it is nearly a different fetcher type.

Cheers,

Richard


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [bitbake-devel] [PATCH] fetch2/s3: improve checking if object exists in s3 by using botocore
  2021-07-11  9:47 ` [bitbake-devel] " Richard Purdie
@ 2021-07-12  9:52   ` Damian Wrobel
  0 siblings, 0 replies; 3+ messages in thread
From: Damian Wrobel @ 2021-07-12  9:52 UTC (permalink / raw)
  To: Richard Purdie; +Cc: bitbake-devel, Adam Romanek




 ---- On Sun, 11 Jul 2021 11:47:58 +0200 Richard Purdie <richard.purdie@linuxfoundation.org> wrote ----
 > On Thu, 2021-07-08 at 14:30 +0200, Damian Wrobel wrote:
 > > From: Adam Romanek <romanek.adam@gmail.com>
 > > 
 > > Botocore uses urllib3 which in turn uses a connection pool to issue HTTP
 > > requests. Keeping the S3 client obtained from botocore library allows
 > > one to reuse that connection pool.
 > > 
 > > The S3 client obtained from botocore is now kept in a connection
 > > cache, which in case of sstate-cache checks is set up in
 > > openembedded-core/classes/sstate.bbclass.
 > > 
 > > On Python 3.5+ a ResourceWarning on unclosed SSL socket is issued when
 > > dropping the reference to botocore S3 client, which is a known issue but
 > > apparently a false positive: https://github.com/boto/boto3/issues/454
 > > 
 > > Initial results show the time to check ~2700 objects is cut by 85%
 > > (4 mins -> 35 secs).
 > > 
 > > With this change setting SSTATE_MIRRORS directly to S3 could now be the
 > > preferred method for using sstate-cache from S3 (no need to sync
 > > sstate-cache from S3 prior running the build), especially when the cache
 > > grows significantly over time (e.g. from 4GB to 30GB during the day).
 > > 
 > > Signed-off-by: Adam Romanek <romanek.adam@gmail.com>
 > > Signed-off-by: Damian Wrobel <dwrobel@ertelnet.rybnik.pl>
 > 
 > With this change all the builds on the autobuilder and my local builds 
 > start showing:
 > 
 > WARNING: S3 FetchMethod: consider installing botocore to speed up S3 access
 > 
 > which isn't really appropriate for users not using S3.

If you're not using s3:// scheme and you see this warning then perhaps we should
answer the question whether the s3 fetcher needs to be instantiated unconditionally?
If not (which would be the preferred solution), then it should be fixed/improved in the common
bitbake code.

FYI, we're using s3 extensively and our additional observation is that s3 fetcher is being
instantiated three times, which might suggest that something is suboptimal in the bitbake.

 > 
 > I'm also a little worried that the changes in this patch appear to effectively
 > totally change the fetcher type to effectively different code, I'm torn on that

Saying "totally" is a strong statement, while from the implementation point of view
it just implements a single function checkstatus() in a different (a little bit faster) way.

--
Regards,
Damian

 > and whether it is nearly a different fetcher type.
 > 
 > Cheers,
 > 
 > Richard
 > 
 > 

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-07-12  9:52 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-08 12:30 [PATCH] fetch2/s3: improve checking if object exists in s3 by using botocore Damian Wrobel
2021-07-11  9:47 ` [bitbake-devel] " Richard Purdie
2021-07-12  9:52   ` Damian Wrobel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).