* [master][PATCH] fetch2/wget: support releases from private github repositories
From: André Draszik @ 2019-11-14 14:21 UTC
  To: bitbake-devel

The wget / http fetcher currently doesn't support fetching
assets attached to releases on private GitHub repositories,
i.e. release artefacts like
    https://github.com/<user>/<project>/releases/download/v1.0.0/asset1.txt

These are special in that HTTP basic auth is not possible on the
URL shown in the GitHub UI. Instead, the asset must be downloaded
via the GitHub API (which does support HTTP basic auth), where the
URL is different.

To be able to access the GitHub API, wget's opportunistic
authentication (--auth-no-challenge) needs to be enabled, so that
credentials are sent with the first request instead of only after
a 401 challenge. Then the API needs to be queried for the real URL
of the file to be downloaded, and finally the header
'Accept: application/octet-stream' must be sent explicitly.
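
As an illustration of these steps, here is a minimal Python sketch
(standard library only) of the request headers involved. It assumes a
user name plus a token used as the password; the helper name
github_api_request is hypothetical and not part of BitBake or of this
patch. Attaching the Authorization header up front is roughly what
wget's --auth-no-challenge does.

```python
import base64
import urllib.request

# Accept values the patch uses for its two API calls.
RELEASES_ACCEPT = "application/vnd.github.v3+json"  # list releases (JSON)
ASSET_ACCEPT = "application/octet-stream"           # download the asset itself


def github_api_request(url, user, token, accept):
    """Build (but do not send) a request with preemptive basic auth.

    Hypothetical helper: the Authorization header is attached to the
    very first request, similar to wget --auth-no-challenge, instead of
    waiting for the server to answer with a 401 challenge.
    """
    creds = base64.b64encode(f"{user}:{token}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", "Basic " + creds)
    req.add_header("Accept", accept)
    return req
```

Passing such a request to urllib.request.urlopen() would perform the
actual fetch; the patch itself delegates the transfer to wget instead.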

Note that there is a slight difference in the location of the
REST API endpoints between GitHub.com and GitHub Enterprise.

    https://developer.github.com/v3/repos/releases/
    https://developer.github.com/enterprise/2.19/v3/enterprise-admin/
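
The host mapping can be illustrated with a small, self-contained
Python function. It is a sketch, not the patch's uri_replace()-based
implementation: releases_api_url is a hypothetical name, and it assumes
the asset URL has exactly the form
<host>/<user>/<project>/releases/download/<tag>/<file> (a tag that
itself contains '/' is rejected).

```python
from urllib.parse import urlsplit


def releases_api_url(asset_url):
    """Map a release-asset URL to its 'list releases' API endpoint.

    github.com          -> https://api.github.com/repos/<user>/<project>/releases
    any other (GHE) host -> https://<host>/api/v3/repos/<user>/<project>/releases
    """
    parts = urlsplit(asset_url)
    segments = parts.path.strip("/").split("/")
    # Expect: <user>/<project>/releases/download/<tag>/<file>
    if len(segments) != 6 or segments[2:4] != ["releases", "download"]:
        raise ValueError("not a release asset URL: " + asset_url)
    user, project = segments[0], segments[1]
    if parts.hostname == "github.com":
        base = f"{parts.scheme}://api.github.com"
    else:
        base = f"{parts.scheme}://{parts.netloc}/api/v3"
    return f"{base}/repos/{user}/{project}/releases"
```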

As it's impossible to determine whether a repository is hosted on
GitHub (considering GitHub Enterprise), let alone whether it is
private, a new flag "github_private_asset" is introduced, which
should be set to "1", e.g.

    SRC_URI = "https://github.com/<user>/<project>/releases/download/v1.0.0/asset1.txt;github_private_asset=1"
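
For illustration, the flag check boils down to the following. The
SRC_URI splitting here is a simplified, hypothetical stand-in for
BitBake's own URL decoding; only the comparison against the literal
string "1" is taken from the patch.

```python
def parse_src_uri(entry):
    """Split a SRC_URI entry into its URL and ';key=value' parameters."""
    url, *params = entry.split(";")
    parm = dict(p.split("=", 1) for p in params if p)
    return url, parm


def is_github_private(parm):
    # As in the patch: only the exact value "1" enables the behaviour.
    return parm.get("github_private_asset", "0") == "1"
```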

Some notes:
* --auth-no-challenge is added unconditionally because we know
  username / password will definitely be needed, and they are
  likely to be specified in ~/.netrc, rather than in the recipe
* the release information returned looks something like:
  [
      {
          ...
          "assets": [
              {
                  ...
                  "browser_download_url": "https://github.com/<user>/<project>/releases/download/v1.0.0/asset1.txt",
                  "url": "https://api.github.com/repos/<user>/<project>/releases/assets/16146291",
                  ...
              },
              ...
          ],
          ...
      },
      ...
  ]
  The API "url" ends in a numeric asset ID rather than the file name,
  hence we need to pass -O to wget to save the download under the
  original file name
* this has been tested with github.com and GitHub Enterprise on
  private repositories, with and without PREMIRRORS
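
The lookup in the release list can be sketched in Python as follows,
assuming the JSON has the shape shown in the notes; the function name
find_asset_api_url and the sample data are hypothetical.

```python
import json


def find_asset_api_url(releases, browser_url):
    """Return the API 'url' of the asset whose 'browser_download_url'
    equals browser_url, or None if no release contains it."""
    for release in releases:
        for asset in release.get("assets", []):
            if asset.get("browser_download_url") == browser_url:
                return asset["url"]
    return None


# Trimmed-down response in the shape shown in the notes.
SAMPLE = json.loads("""
[
  {"assets": [
    {"browser_download_url": "https://github.com/u/p/releases/download/v1.0.0/asset1.txt",
     "url": "https://api.github.com/repos/u/p/releases/assets/16146291"}
  ]}
]
""")
```

Returning from inside the nested loop avoids the pair of break
statements the patch needs; the name passed to wget's -O would then be
the last path component of browser_url.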

Signed-off-by: André Draszik <git@andred.net>
---
 lib/bb/fetch2/wget.py | 90 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 85 insertions(+), 5 deletions(-)

diff --git a/lib/bb/fetch2/wget.py b/lib/bb/fetch2/wget.py
index 725586d2..90aa9b19 100644
--- a/lib/bb/fetch2/wget.py
+++ b/lib/bb/fetch2/wget.py
@@ -4,6 +4,12 @@ BitBake 'Fetch' implementations
 Classes for obtaining upstream sources for the
 BitBake build tools.
 
+Supported SRC_URI options are:
+
+- github_private_asset
+   Whether or not the URI is pointing to a release artefact
+   in a private GitHub repository. The default is no.
+
 """
 
 # Copyright (C) 2003, 2004  Chris Larson
@@ -23,11 +29,13 @@ import bb.progress
 import socket
 import http.client
 import urllib.request, urllib.parse, urllib.error
+import json
 from   bb.fetch2 import FetchMethod
 from   bb.fetch2 import FetchError
 from   bb.fetch2 import logger
 from   bb.fetch2 import runfetchcmd
 from   bb.fetch2 import FetchConnectionCache
+from   bb.fetch2 import uri_replace
 from   bb.utils import export_proxies
 from   bs4 import BeautifulSoup
 from   bs4 import SoupStrainer
@@ -78,6 +86,8 @@ class Wget(FetchMethod):
         if not ud.localfile:
             ud.localfile = d.expand(urllib.parse.unquote(ud.host + ud.path).replace("/", "."))
 
+        ud.github_private = ud.parm.get("github_private_asset","0") == "1"
+
         self.basecmd = d.getVar("FETCHCMD_wget") or "/usr/bin/env wget -t 2 -T 30 --passive-ftp --no-check-certificate"
 
     def _runwget(self, ud, d, command, quiet, workdir=None):
@@ -93,15 +103,85 @@ class Wget(FetchMethod):
 
         fetchcmd = self.basecmd
 
-        if 'downloadfilename' in ud.parm:
+        uri = ud.url.split(";")[0]
+        gh_asset_uri = None
+
+        if (ud.user and ud.pswd) or ud.github_private:
+            fetchcmd += " --auth-no-challenge"
+            if ud.user and ud.pswd:
+                fetchcmd += " --user=%s --password=%s" % (ud.user, ud.pswd)
+
+        if ud.github_private:
+            # Github private repositories support basic-auth via the API
+            # endpoints only. Using those, the download URL will be
+            # different, and we need to download using application/octet-stream.
+            # The API endpoint mapping is different for github.com and
+            # GitHub Enterprise:
+            #     github.com -> api.github.com
+            #     github.example.com -> github.example.com/api/v3/
+            # The Accept header is used in any case to fix the API version
+            #
+            # To get the download URL when using the API, all the releases
+            # are listed via
+            #     https://api.github.com/repos/<user>/<project>/releases
+            # which returns a JSON message describing all releases and all
+            # their attached artefacts. We can easily search that for
+            # the artefact that we're trying to download, and use
+            # the replacement URL from that response.
+            gh_relcmd = fetchcmd + " --header='Accept: application/vnd.github.v3+json'"
+            api_replacements = ['https?$://github.com/.* TYPE://api.github.com/repos/REPORELEASES',
+                                'https?$://.*/.* TYPE://HOST/api/v3/repos/REPORELEASES']
+            replacements = {}
+            replacements["TYPE"] = ud.type
+            replacements["HOST"] = ud.host
+            # github release artifacts are of the form
+            #     https://github.com/<user>/<project>/releases/download/v1.0.0/asset1.txt
+            # drop everything after .../releases and point to api.github.com
+            replacements["REPORELEASES"] = ud.path.rsplit('/', maxsplit=3)[0]
+            for api_replacement in api_replacements:
+                (find, replace) = api_replacement.split()
+                rel_api_uri = uri_replace(ud, find, replace, replacements, d)
+                if rel_api_uri == None:
+                    continue
+                # uri_replace() keeps the params, and the actual filename.
+                # drop both - we only want
+                #     https://api.github.com/repos/<user>/<project>/releases
+                # from the example above
+                rel_api_uri = rel_api_uri.split(';')[0].rsplit('/', maxsplit=1)[0]
+                with tempfile.TemporaryDirectory(prefix="wget-github-release-") as workdir, \
+                        tempfile.NamedTemporaryFile(mode="w+", dir=workdir, prefix="wget-release-") as f:
+                    relcmd = gh_relcmd + " -O " + f.name + " '" + rel_api_uri + "'"
+                    try:
+                        self._runwget(ud, d, relcmd, True)
+                    except FetchError as e:
+                        # Accessing a (PRE)MIRROR using the github API
+                        # obviously doesn't work, just ignore
+                        continue
+                    if os.path.getsize(f.name) == 0:
+                        # the fetch resulted in a zero size file, ignore
+                        continue
+                    releases = json.load(f)
+                    # As per https://developer.github.com/v3/repos/releases/#list-releases-for-a-repository
+                    # Each release will have a list of assets, where the 'browser_download_url'
+                    # is what we intended to download, but we need to get it via the 'url',
+                    # which points to the github api and supports username/password
+                    for release in releases:
+                        for asset in release['assets']:
+                            if asset['browser_download_url'] == uri:
+                                gh_asset_uri = asset['url']
+                                break
+                        if gh_asset_uri:
+                            break
+                if gh_asset_uri:
+                    uri = gh_asset_uri
+                    fetchcmd += " --header='Accept: application/octet-stream'"
+                    break
+
+        if 'downloadfilename' in ud.parm or gh_asset_uri:
             dldir = d.getVar("DL_DIR")
             bb.utils.mkdirhier(os.path.dirname(dldir + os.sep + ud.localfile))
             fetchcmd += " -O " + dldir + os.sep + ud.localfile
 
-        if ud.user and ud.pswd:
-            fetchcmd += " --user=%s --password=%s --auth-no-challenge" % (ud.user, ud.pswd)
-
-        uri = ud.url.split(";")[0]
         if os.path.exists(ud.localpath):
             # file exists, but we didnt complete it.. trying again..
             fetchcmd += d.expand(" -c -P ${DL_DIR} '%s'" % uri)
-- 
2.23.0.rc1




* Re: [master][PATCH] fetch2/wget: support releases from private github repositories
From: André Draszik @ 2019-11-21 20:27 UTC
  To: bitbake-devel

On Thu, 2019-11-14 at 14:21 +0000, André Draszik wrote:
> The wget / http fetcher currently doesn't support fetching
> assets attached to releases on private GitHub repositories,

Any thoughts or comments on this patch?

Cheers,
Andre'





* Re: [master][PATCH] fetch2/wget: support releases from private github repositories
From: Ross Burton @ 2019-11-22 17:01 UTC
  To: bitbake-devel

On 14/11/2019 14:21, André Draszik wrote:
> The wget / http fetcher currently doesn't support fetching
> assets attached to releases on private GitHub repositories,
> i.e. release artefacts like
>      https://github.com/<user>/<project>/releases/download/v1.0.0/asset1.txt
> 
> Those are special, in that HTTP basic auth is not used / possible
> on the URL as seen in the GitHub UI, but instead the GitHub API
> must be used for downloading (which does support HTTP basic auth)
> where the URL will be different.
> 
> To be able to access the GitHub API, opportunistic authentication
> (auth-no-challenge) needs to be enabled. Then the API needs to
> be queried for the real URL of the file to be downloaded, and
> finally application/octet-stream must be specified explicitly.
> 
> Note that there is a slight difference in the location of the
> REST API endpoints between GitHub.com and GitHub Enterprise.
> 
>      https://developer.github.com/v3/repos/releases/
>      https://developer.github.com/enterprise/2.19/v3/enterprise-admin/
> 
> As it's impossible to determine if a repository is on GitHub
> or not (considering GitHub Enterprise), and even more so if a
> repository is private or not, a new flag is introduced that
> should be set to "1" - "github_private_asset", e.g.

That's a lot of "special" and "unfortunately" and "but", adding a 
site-specific option and a non-trivial site-specific chunk of code to 
the otherwise generic HTTP fetcher.

I'd suggest implementing this as a new fetcher instead of shoe-horning 
it into the wget fetcher.  Then you can do something like githubasset:// 
in SRC_URI and the fetcher can do the right thing instead of jumping 
through hoops and guessing.
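
To illustrate the suggestion, a toy Python sketch of scheme-based
dispatch follows. All class and function names are hypothetical; it
only assumes, as BitBake does with each fetcher's supports() method,
that the first fetcher claiming a URL handles it. With a dedicated
scheme the fetcher knows up front that the URL is a GitHub asset, so
nothing has to be guessed from the host name or a flag.

```python
from urllib.parse import urlsplit


class ToyFetchMethod:
    """Minimal stand-in for a fetcher base class (hypothetical)."""
    schemes = ()

    def supports(self, url):
        return urlsplit(url).scheme in self.schemes


class HttpFetcher(ToyFetchMethod):
    schemes = ("http", "https", "ftp")


class GithubAssetFetcher(ToyFetchMethod):
    # Owns its own URL scheme, so no extra SRC_URI flag is needed.
    schemes = ("githubasset",)


def pick_fetcher(url, methods):
    """Return the first registered fetcher that claims the URL."""
    for method in methods:
        if method.supports(url):
            return method
    raise ValueError("no fetcher supports " + url)
```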

Ross



* Re: [master][PATCH] fetch2/wget: support releases from private github repositories
From: André Draszik @ 2019-11-26  9:40 UTC
  To: Ross Burton, bitbake-devel

Hi Ross,

On Fri, 2019-11-22 at 17:01 +0000, Ross Burton wrote:
> On 14/11/2019 14:21, André Draszik wrote:
> > The wget / http fetcher currently doesn't support fetching
> > assets attached to releases on private GitHub repositories,
> > i.e. release artefacts like
> >      https://github.com/<user>/<project>/releases/download/v1.0.0/asset1.txt
> > 
> > Those are special, in that HTTP basic auth is not used / possible
> > on the URL as seen in the GitHub UI, but instead the GitHub API
> > must be used for downloading (which does support HTTP basic auth)
> > where the URL will be different.
> > 
> > To be able to access the GitHub API, opportunistic authentication
> > (auth-no-challenge) needs to be enabled. Then the API needs to
> > be queried for the real URL of the file to be downloaded, and
> > finally application/octet-stream must be specified explicitly.
> > 
> > Note that there is a slight difference in the location of the
> > REST API endpoints between GitHub.com and GitHub Enterprise.
> > 
> >      https://developer.github.com/v3/repos/releases/
> >      https://developer.github.com/enterprise/2.19/v3/enterprise-admin/
> > 
> > As it's impossible to determine if a repository is on GitHub
> > or not (considering GitHub Enterprise), and even more so if a
> > repository is private or not, a new flag is introduced that
> > should be set to "1" - "github_private_asset", e.g.
> 
> That's a lot of "special" and "unfortunately" and "but", adding a 
> site-specific option and a non-trivial site-specific chunk of code to 
> the otherwise generic HTTP fetcher.
> 
> I'd suggest implementing this as a new fetcher instead of shoe-horning 
> it into the wget fetcher.  Then you can do something like githubasset:// 
> in SRC_URI and the fetcher can do the right thing instead of jumping 
> through hoops and guessing.

Thanks for your comments. Yes, I was in two minds about it as well. I was also
contemplating a new bbclass for this instead of a new fetcher for github
private repositories.

I'll write a new fetcher and see how that will work out...


Thanks,
Andre'





* [master][PATCH v2] fetch2/githubprivate: new fetcher for private github repositories
From: André Draszik @ 2019-12-20 10:08 UTC
  To: bitbake-devel

The wget / http fetcher doesn't support fetching assets
attached to releases on private GitHub repositories, i.e.
release artefacts like
    https://github.com/<user>/<project>/releases/download/v1.0.0/asset1.txt

These are special in that HTTP basic auth is not possible on the
URL shown in the GitHub UI. Instead, the asset must be downloaded
via the GitHub API (which does support HTTP basic auth), where the
URL is different.

Implement a new fetcher that:
    * uses the GitHub API to determine the asset URL
    * re-uses the existing wget fetcher to download this URL
      instead
    * supports checkstatus() (bitbake -c checkuri)
    * supports latest_versionstring() (devtool latest-version)
    * supports GitHub.com and GitHub Enterprise for the above
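
The URL handling of the new fetcher can be sketched roughly as
follows. This is a simplified, hypothetical Python helper
(resolve_githubprivate is not part of the patch, and its parameter
parsing is a stand-in for BitBake's own): it mirrors what urldata_init()
and _get_gh_asset_uri() appear to do, i.e. default to https, accept only
http/https via the protocol parameter, swap the githubprivate:// scheme
for the real protocol, and always pick a download file name.

```python
import os
from urllib.parse import urlsplit

PREFIX = "githubprivate://"


def resolve_githubprivate(entry):
    """Return (real URL, local file name) for a githubprivate:// entry."""
    url, *params = entry.split(";")
    parm = dict(p.split("=", 1) for p in params if p)
    proto = parm.get("protocol", "https")
    if proto not in ("http", "https"):
        raise ValueError("Invalid protocol type: " + proto)
    if not url.startswith(PREFIX):
        raise ValueError("not a githubprivate:// URL: " + url)
    real_url = proto + "://" + url[len(PREFIX):]
    # The API asset URL ends in a numeric ID, so always save under the
    # original file name unless downloadfilename was given explicitly.
    filename = parm.get("downloadfilename",
                        os.path.basename(urlsplit(real_url).path))
    return real_url, filename
```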

Implementation notes:
To be able to access the GitHub API, wget's opportunistic
authentication (--auth-no-challenge) needs to be enabled, so that
credentials are sent with the first request instead of only after
a 401 challenge. Then the API needs to be queried for the real URL
of the file to be downloaded, and finally the header
'Accept: application/octet-stream' must be sent explicitly.

Note that there is a slight difference in the location of the
REST API endpoints between GitHub.com and GitHub Enterprise.

    https://developer.github.com/v3/repos/releases/
    https://developer.github.com/enterprise/2.19/v3/enterprise-admin/

Some notes:
* --auth-no-challenge is added unconditionally because we know
  username / password will definitely be needed, and they are
  likely specified in ~/.netrc, rather than in the recipe (but
  username / password via recipe is still supported)
* the release information returned looks something like:
  [
      {
          ...
          "name": <name of the release>,
          "assets": [
              {
                  ...
                  "browser_download_url": "https://github.com/<user>/<project>/releases/download/v1.0.0/asset1.txt",
                  "url": "https://api.github.com/repos/<user>/<project>/releases/assets/16146291",
                  ...
              },
              ...
          ],
          ...
      },
      ...
  ]
  The API "url" ends in a numeric asset ID rather than the file name,
  hence we need to pass -O to wget to save the download under the
  original file name
* to determine the latest available version, we can simply query
  the API for the version (name) that the SRC_URI entry is
  attached to, and then figure out if there is a more recent
  version available, rather than doing lots of matches using
  regexes
* this has been tested with github.com and GitHub Enterprise on
  private repositories, with and without PREMIRRORS
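
A rough sketch of the latest-version idea, assuming the release JSON
has the shape shown in the notes with a "name" per release. The helpers
are hypothetical, and version_key() is a deliberately crude stand-in for
BitBake's own version comparison; it does, however, avoid plain string
comparison, which would rank "v1.2.0" above "v1.10.0".

```python
import re


def version_key(name):
    """Crude numeric sort key, e.g. 'v1.10.0' -> (1, 10, 0)."""
    return tuple(int(n) for n in re.findall(r"\d+", name))


def current_and_latest(releases, browser_url):
    """Return (name of the release carrying browser_url,
    name of the highest-versioned release), or (None, None)."""
    current = None
    for release in releases:
        urls = [a.get("browser_download_url") for a in release.get("assets", [])]
        if browser_url in urls:
            current = release["name"]
            break
    if current is None:
        return None, None
    latest = max((r["name"] for r in releases), key=version_key)
    return current, latest
```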

Signed-off-by: André Draszik <git@andred.net>
---
 bitbake/lib/bb/fetch2/__init__.py      |   6 +-
 bitbake/lib/bb/fetch2/githubprivate.py | 174 +++++++++++++++++++++++++
 2 files changed, 178 insertions(+), 2 deletions(-)
 create mode 100644 bitbake/lib/bb/fetch2/githubprivate.py

diff --git a/bitbake/lib/bb/fetch2/__init__.py b/bitbake/lib/bb/fetch2/__init__.py
index 07de6c2693..5c533cf78e 100644
--- a/bitbake/lib/bb/fetch2/__init__.py
+++ b/bitbake/lib/bb/fetch2/__init__.py
@@ -1238,13 +1238,13 @@ class FetchData(object):
             self.sha256_name = "sha256sum"
         if self.md5_name in self.parm:
             self.md5_expected = self.parm[self.md5_name]
-        elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3"]:
+        elif self.type not in ["http", "https", "ftp", "ftps", "githubprivate", "sftp", "s3"]:
             self.md5_expected = None
         else:
             self.md5_expected = d.getVarFlag("SRC_URI", self.md5_name)
         if self.sha256_name in self.parm:
             self.sha256_expected = self.parm[self.sha256_name]
-        elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3"]:
+        elif self.type not in ["http", "https", "ftp", "ftps", "githubprivate", "sftp", "s3"]:
             self.sha256_expected = None
         else:
             self.sha256_expected = d.getVarFlag("SRC_URI", self.sha256_name)
@@ -1853,6 +1853,7 @@ from . import osc
 from . import repo
 from . import clearcase
 from . import npm
+from . import githubprivate
 
 methods.append(local.Local())
 methods.append(wget.Wget())
@@ -1871,3 +1872,4 @@ methods.append(osc.Osc())
 methods.append(repo.Repo())
 methods.append(clearcase.ClearCase())
 methods.append(npm.Npm())
+methods.append(githubprivate.Githubprivate())
diff --git a/bitbake/lib/bb/fetch2/githubprivate.py b/bitbake/lib/bb/fetch2/githubprivate.py
new file mode 100644
index 0000000000..5a007c4e69
--- /dev/null
+++ b/bitbake/lib/bb/fetch2/githubprivate.py
@@ -0,0 +1,174 @@
+#
+# SPDX-License-Identifier: GPL-2.0-only
+#
+"""
+Bitbake "Fetch" implementation for assets attached to private
+repositories on GitHub or GitHub Enterprise.
+"""
+
+import os
+import json
+import tempfile
+import bb
+from   bb.fetch2.wget import Wget
+from   bb.fetch2 import FetchError
+from   bb.fetch2 import logger
+from   bb.fetch2 import uri_replace
+
+class Githubprivate(Wget):
+    """Class to fetch an asset from a private repository on GitHub
+       (or GitHub Enterprise)."""
+
+    def supports(self, ud, d):
+        return ud.type in ['githubprivate']
+
+    def urldata_init(self, ud, d):
+        ud.proto = 'https'
+        if 'protocol' in ud.parm:
+            ud.proto = ud.parm['protocol']
+        if not ud.proto in ('http', 'https'):
+            raise bb.fetch2.ParameterError("Invalid protocol type", ud.url)
+
+        if not 'downloadfilename' in ud.parm:
+            # The asset filename determined using the GitHub API will
+            # not match the filename of the release artefact (as in
+            # SRC_URI). Hence we need to unconditionally instruct
+            # wget to download using -O. This can be achieved by
+            # unconditionally setting 'downloadfilename' here.
+            ud.parm['downloadfilename'] = os.path.basename(ud.path)
+        super(Githubprivate, self).urldata_init(ud, d)
+        # To be able to access the GitHub API, opportunistic authentication
+        # needs to be enabled. Also username / password will definitely be
+        # needed, and they are likely specified in ~/.netrc, rather than in
+        # the recipe itself.
+        self.basecmd += " --auth-no-challenge"
+
+    def _get_gh_releases_info(self, uri, ud, d):
+        fetchcmd = self.basecmd
+        if ud.user and ud.pswd:
+            fetchcmd += " --user=%s --password=%s" % (ud.user, ud.pswd)
+
+        # Github private repositories support basic-auth via the API
+        # endpoints only. Using those, the download URL will be
+        # different, and we need to download using application/octet-stream.
+        # The API endpoint mapping is different for github.com and
+        # GitHub Enterprise:
+        #     github.com -> api.github.com
+        #     github.example.com -> github.example.com/api/v3/
+        # The Accept header is used in any case to fix the API version to
+        # the supported level (version 3).
+        #
+        # To get the download URL when using the API, all the releases
+        # are listed via
+        #     https://api.github.com/repos/<user>/<project>/releases
+        # which returns a JSON message describing all releases and all
+        # their attached artefacts. We can easily search that for
+        # the artefact that we're trying to download, and use
+        # the replacement URL from that response.
+        assetinfo_cmd = fetchcmd + " --header='Accept: application/vnd.github.v3+json'"
+        api_replacements = ['githubprivate://github.com/.* TYPE://api.github.com/repos/REPORELEASES',
+                            'githubprivate://.*/.* TYPE://HOST/api/v3/repos/REPORELEASES']
+        replacements = {}
+        replacements["TYPE"] = ud.proto
+        replacements["HOST"] = ud.host
+        # github release artifacts are of the form
+        #     https://github.com/<user>/<project>/releases/download/v1.0.0/asset1.txt
+        # drop everything after .../releases and point to api.github.com
+        replacements["REPORELEASES"] = ud.path.rsplit('/', maxsplit=3)[0]
+        for api_replacement in api_replacements:
+            (find, replace) = api_replacement.split()
+            rel_api_uri = uri_replace(ud, find, replace, replacements, d)
+            if rel_api_uri is None:
+                continue
+            # uri_replace() keeps the params, and the actual filename.
+            # drop both - we only want
+            #     https://api.github.com/repos/<user>/<project>/releases
+            # from the example above
+            rel_api_uri = rel_api_uri.split(';')[0].rsplit('/', maxsplit=1)[0]
+            with tempfile.TemporaryDirectory(prefix="wget-github-release-") as workdir, \
+                    tempfile.NamedTemporaryFile(mode="w+", dir=workdir, prefix="wget-release-") as f:
+                runcmd = assetinfo_cmd + " -O " + f.name + " '" + rel_api_uri + "'"
+                logger.debug(2, "For url %s trying to retrieve asset info from %s" % (uri, runcmd))
+                try:
+                    self._runwget(ud, d, runcmd, True)
+                except FetchError as e:
+                    # Accessing a (PRE)MIRROR using the github API
+                    # obviously doesn't work, just ignore
+                    continue
+                if os.path.getsize(f.name) == 0:
+                    # the fetch resulted in a zero size file, ignore
+                    logger.debug(2, "Could not retrieve asset info from %s" % rel_api_uri)
+                    continue
+                return json.load(f)
+
+        return []
+
+    def _get_gh_asset_uri(self, uri, ud, d):
+        uri = uri.replace("githubprivate://", ud.proto + "://", 1)
+        gh_asset_uri = None
+        releases = self._get_gh_releases_info(uri, ud, d)
+        # As per https://developer.github.com/v3/repos/releases/#list-releases-for-a-repository
+        # Each release will have a list of assets, where the 'browser_download_url'
+        # is what we intended to download, but we need to get it via the 'url',
+        # which points to the github api and supports username/password
+        for release in releases:
+            for asset in release['assets']:
+                logger.debug(2, "Comparing asset id %u URL %s" \
+                                % (asset['id'], asset['browser_download_url']))
+                if asset['browser_download_url'] == uri:
+                    gh_asset_uri = asset['url']
+                    logger.debug(2, "For URI %s using GitHub asset %s" % (uri, gh_asset_uri))
+                    break
+            if gh_asset_uri:
+                break
+
+        if not gh_asset_uri:
+            raise FetchError("Could not determine the GitHub asset URI for URI %s" % uri, uri)
+
+        return gh_asset_uri
+
+    def download(self, ud, d):
+        """Fetch urls"""
+        orig_uri = ud.url.split(";")[0]
+        gh_asset_uri = self._get_gh_asset_uri(orig_uri, ud, d)
+        ud.url = ud.url.replace(orig_uri, gh_asset_uri, 1)
+        # To be able to download the actual asset, we need to force
+        # the mime-type. Otherwise we'll get the asset info json.
+        self.basecmd += " --header='Accept: application/octet-stream'"
+        return super(Githubprivate, self).download(ud, d)
+
+    def latest_versionstring(self, ud, d):
+        """
+        Manipulate the URL and try to obtain the latest package version
+        using GitHub API.
+        """
+        # We first get the release (name) that corresponds to the URL ...
+        uri = ud.url.split(";")[0].replace("githubprivate://", ud.proto + "://", 1)
+        releases = self._get_gh_releases_info(uri, ud, d)
+        current_version = '0'
+        bb.debug(3, "Getting current version info for URL %s" % uri)
+        # find the release that the SRC_URI asset is attached to
+        for release in releases:
+            for asset in release['assets']:
+                if asset['browser_download_url'] == uri:
+                    current_version = release['name']
+                    break
+            if current_version != '0':
+                break
+        if current_version != '0':
+            bb.debug(3, "Current version info is %s" % current_version)
+
+        # ... and then try to find a newer release (name).
+        for release in releases:
+            this_version = ['', release['name'], '']
+            if self._vercmp(['', current_version, ''], this_version) < 0:
+                current_version = this_version[1]
+
+        return (current_version, '')
+
+    def checkstatus(self, fetch, urldata, d):
+        """Check if urls are accessible"""
+        orig_uri = urldata.url.split(";")[0]
+        gh_asset_uri = self._get_gh_asset_uri(orig_uri, urldata, d)
+        urldata.url = urldata.url.replace(orig_uri, gh_asset_uri, 1)
+        return super(Githubprivate, self).checkstatus(fetch, urldata, d)
-- 
2.23.0.rc1
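The endpoint mapping described in the comments of `_get_gh_releases_info()` can be tried out without BitBake. The sketch below is a standalone approximation using only `urllib.parse`, not the patch's `uri_replace()`-based code; the function name `releases_api_url` is hypothetical, and it assumes, like the patch, that the host `github.com` means GitHub.com and that any other host is GitHub Enterprise.

```python
from urllib.parse import urlsplit, urlunsplit


def releases_api_url(download_url):
    """Map a release asset download URL to the "list releases" API URL.

    Expects URLs of the form
        <scheme>://<host>/<user>/<project>/releases/download/<tag>/<asset>
    """
    parts = urlsplit(download_url)
    # Drop "/download/<tag>/<asset>" and keep "/<user>/<project>/releases",
    # the same split the patch applies to ud.path.
    repo_releases = parts.path.rsplit('/', maxsplit=3)[0]
    if parts.netloc == 'github.com':
        # GitHub.com serves its REST API from a separate host.
        host, prefix = 'api.github.com', '/repos'
    else:
        # GitHub Enterprise serves it below /api/v3 on the same host.
        host, prefix = parts.netloc, '/api/v3/repos'
    return urlunsplit((parts.scheme, host, prefix + repo_releases, '', ''))


print(releases_api_url(
    "https://github.com/user/project/releases/download/v1.0.0/asset1.txt"))
# prints https://api.github.com/repos/user/project/releases
```

For an Enterprise host, `releases_api_url("https://github.example.com/user/project/releases/download/v1.0.0/asset1.txt")` would yield `https://github.example.com/api/v3/repos/user/project/releases`.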




* Re: [master][PATCH v2] fetch2/githubprivate: new fetcher for private github repositories
  2019-12-20 10:08 ` [master][PATCH v2] fetch2/githubprivate: new fetcher for " André Draszik
@ 2019-12-23 15:38   ` Ross Burton
  2020-01-14 15:09   ` André Draszik
  1 sibling, 0 replies; 8+ messages in thread
From: Ross Burton @ 2019-12-23 15:38 UTC (permalink / raw)
  To: bitbake-devel

On 20/12/2019 10:08, André Draszik wrote:
> -        elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3"]:
> +        elif self.type not in ["http", "https", "ftp", "ftps", "githubprivate", "sftp", "s3"]:

I hate these lines.  I wonder if the fetcher class itself can have a 
md5_expected boolean field that can be used instead.

> +class Githubprivate(Wget):

It's too near Christmas to review this properly but that looks *a lot* 
neater!

Ross
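Ross's suggestion of a per-fetcher boolean, instead of the hard-coded list of URL types, could look roughly like the sketch below. All class names, the `checksum_expected` attribute and `expected_checksum()` are hypothetical stand-ins rather than BitBake API, and the SRC_URI varflags are modelled as a plain dict instead of BitBake's datastore.

```python
class Fetcher:
    # Hypothetical flag: whether SRC_URI checksums are expected for this fetcher.
    checksum_expected = False


class HttpFetcher(Fetcher):
    checksum_expected = True


class GithubPrivateFetcher(HttpFetcher):
    # Inherits checksum_expected = True, so no type list needs editing.
    pass


class GitFetcher(Fetcher):
    pass


def expected_checksum(method, parm, srcuri_flags, name):
    """Mirror the md5/sha256 selection in FetchData.__init__ using the flag."""
    if name in parm:
        # An explicit ;sha256sum=... URL parameter always wins.
        return parm[name]
    if not method.checksum_expected:
        return None
    # Otherwise fall back to the SRC_URI[<name>] varflag, if any.
    return srcuri_flags.get(name)
```

With such a flag, a new `Wget` subclass would pick up checksum handling through inheritance alone.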



* Re: [master][PATCH v2] fetch2/githubprivate: new fetcher for private github repositories
  2019-12-20 10:08 ` [master][PATCH v2] fetch2/githubprivate: new fetcher for " André Draszik
  2019-12-23 15:38   ` Ross Burton
@ 2020-01-14 15:09   ` André Draszik
  2020-01-27 11:26     ` André Draszik
  1 sibling, 1 reply; 8+ messages in thread
From: André Draszik @ 2020-01-14 15:09 UTC (permalink / raw)
  To: bitbake-devel

ping

On Fri, 2019-12-20 at 10:08 +0000, André Draszik wrote:
> The wget / http fetcher doesn't support fetching assets
> attached to releases on private GitHub repositories, i.e.
> release artefacts like
>     https://github.com/<user>/<project>/releases/download/v1.0.0/asset1.txt
> 
> Those are special, in that HTTP basic auth is not used / possible
> on the URL as seen in the GitHub UI, but instead the GitHub API
> must be used for downloading (which does support HTTP basic auth)
> where the URL will be different.
> 
> Implement a new fetcher that:
>     * uses the GitHub API to determine the asset URL
>     * re-uses the existing wget fetcher to download this URL
>       instead
>     * supports checkstatus() (bitbake -c checkuri)
>     * supports latest_versionstring() (devtool latest-version)
>     * supports GitHub.com and GitHub Enterprise for the above
> 
> Implementation notes:
> To be able to access the GitHub API, opportunistic authentication
> (auth-no-challenge) needs to be enabled. Then the API needs to
> be queried for the real URL of the file to be downloaded, and
> finally application/octet-stream must be specified explicitly.
> 
> Note that there is a slight difference in the location of the
> REST API endpoints between GitHub.com and GitHub Enterprise.
> 
>     https://developer.github.com/v3/repos/releases/
>     https://developer.github.com/enterprise/2.19/v3/enterprise-admin/
> 
> Some notes:
> * --auth-no-challenge is added unconditionally because we know
>   username / password will definitely be needed, and they are
>   likely specified in ~/.netrc, rather than in the recipe (but
>   username / password via recipe is still supported)
> * the release information returned looks something like:
> [
>     {
>         ...
>         "name": <name of the release>
>         "assets": [
>             {
>                 ...
>                 "browser_download_url": "https://github.com/<user>/<project>/releases/download/v1.0.0/asset1.txt",
>                 "url": "https://api.github.com/repos/<user>/<project>/releases/assets/16146291",
>                 ...
>             },
>             ...
>         ],
>         ...
>     },
>     ...
> ]
>   hence we need to pass -O to wget to explicitly download using
>   the original name
> * to determine the latest available version, we can simply query
>   the API for the version (name) that the SRC_URI entry is
>   attached to, and then figure out if there is a more recent
>   version available, rather than doing lots of matches using
>   regexes
> * this has been tested with github.com and GitHub Enterprise on
>   private repositories, with and without PREMIRRORS
> 
> Signed-off-by: André Draszik <git@andred.net>
> ---
>  bitbake/lib/bb/fetch2/__init__.py      |   6 +-
>  bitbake/lib/bb/fetch2/githubprivate.py | 174 +++++++++++++++++++++++++
>  2 files changed, 178 insertions(+), 2 deletions(-)
>  create mode 100644 bitbake/lib/bb/fetch2/githubprivate.py
> 
> diff --git a/bitbake/lib/bb/fetch2/__init__.py b/bitbake/lib/bb/fetch2/__init__.py
> index 07de6c2693..5c533cf78e 100644
> --- a/bitbake/lib/bb/fetch2/__init__.py
> +++ b/bitbake/lib/bb/fetch2/__init__.py
> @@ -1238,13 +1238,13 @@ class FetchData(object):
>              self.sha256_name = "sha256sum"
>          if self.md5_name in self.parm:
>              self.md5_expected = self.parm[self.md5_name]
> -        elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3"]:
> +        elif self.type not in ["http", "https", "ftp", "ftps", "githubprivate", "sftp", "s3"]:
>              self.md5_expected = None
>          else:
>              self.md5_expected = d.getVarFlag("SRC_URI", self.md5_name)
>          if self.sha256_name in self.parm:
>              self.sha256_expected = self.parm[self.sha256_name]
> -        elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3"]:
> +        elif self.type not in ["http", "https", "ftp", "ftps", "githubprivate", "sftp", "s3"]:
>              self.sha256_expected = None
>          else:
>              self.sha256_expected = d.getVarFlag("SRC_URI", self.sha256_name)
> @@ -1853,6 +1853,7 @@ from . import osc
>  from . import repo
>  from . import clearcase
>  from . import npm
> +from . import githubprivate
>  
>  methods.append(local.Local())
>  methods.append(wget.Wget())
> @@ -1871,3 +1872,4 @@ methods.append(osc.Osc())
>  methods.append(repo.Repo())
>  methods.append(clearcase.ClearCase())
>  methods.append(npm.Npm())
> +methods.append(githubprivate.Githubprivate())
> diff --git a/bitbake/lib/bb/fetch2/githubprivate.py b/bitbake/lib/bb/fetch2/githubprivate.py
> new file mode 100644
> index 0000000000..5a007c4e69
> --- /dev/null
> +++ b/bitbake/lib/bb/fetch2/githubprivate.py
> @@ -0,0 +1,174 @@
> +#
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +"""
> +Bitbake "Fetch" implementation for assets attached to private
> +repositories on GitHub or GitHub Enterprise.
> +"""
> +
> +import os
> +import json
> +import tempfile
> +import bb
> +from   bb.fetch2.wget import Wget
> +from   bb.fetch2 import FetchError
> +from   bb.fetch2 import logger
> +from   bb.fetch2 import uri_replace
> +
> +class Githubprivate(Wget):
> +    """Class to fetch an asset from a private repository on GitHub
> +       (or GitHub Enterprise)."""
> +
> +    def supports(self, ud, d):
> +        return ud.type in ['githubprivate']
> +
> +    def urldata_init(self, ud, d):
> +        ud.proto = 'https'
> +        if 'protocol' in ud.parm:
> +            ud.proto = ud.parm['protocol']
> +        if not ud.proto in ('http', 'https'):
> +            raise bb.fetch2.ParameterError("Invalid protocol type", ud.url)
> +
> +        if not 'downloadfilename' in ud.parm:
> +            # The asset filename determined using the GitHub API will
> +            # not match the filename of the release artefact (as in
> +            # SRC_URI). Hence we need to unconditionally instruct
> +            # wget to download using -O. This can be achieved by
> +            # unconditionally setting 'downloadfilename' here.
> +            ud.parm['downloadfilename'] = os.path.basename(ud.path)
> +        super(Githubprivate, self).urldata_init(ud, d)
> +        # To be able to access the GitHub API, opportunistic authentication
> +        # needs to be enabled. Also username / password will definitely be
> +        # needed, and they are likely specified in ~/.netrc, rather than in
> +        # the recipe itself.
> +        self.basecmd += " --auth-no-challenge"
> +
> +    def _get_gh_releases_info(self, uri, ud, d):
> +        fetchcmd = self.basecmd
> +        if ud.user and ud.pswd:
> +            fetchcmd += " --user=%s --password=%s" % (ud.user, ud.pswd)
> +
> +        # Github private repositories support basic-auth via the API
> +        # endpoints only. Using those, the download URL will be
> +        # different, and we need to download using application/octet-stream.
> +        # The API endpoint mapping is different for github.com and
> +        # GitHub Enterprise:
> +        #     github.com -> api.github.com
> +        #     github.example.com -> github.example.com/api/v3/
> +        # The Accept header is used in any case to fix the API version to
> +        # the supported level (version 3).
> +        #
> +        # To get the download URL when using the API, all the releases
> +        # are listed via
> +        #     https://api.github.com/repos/<user>/<project>/releases
> +        # which returns a JSON message describing all releases and all
> +        # their attached artefacts. We can easily search that for
> +        # the artefact that we're trying to download, and use
> +        # the replacement URL from that response.
> +        assetinfo_cmd = fetchcmd + " --header='Accept: application/vnd.github.v3+json'"
> +        api_replacements = ['githubprivate://github.com/.* TYPE://api.github.com/repos/REPORELEASES',
> +                            'githubprivate://.*/.* TYPE://HOST/api/v3/repos/REPORELEASES']
> +        replacements = {}
> +        replacements["TYPE"] = ud.proto
> +        replacements["HOST"] = ud.host
> +        # github release artifacts are of the form
> +        #     https://github.com/<user>/<project>/releases/download/v1.0.0/asset1.txt
> +        # drop everything after .../releases and point to api.github.com
> +        replacements["REPORELEASES"] = ud.path.rsplit('/', maxsplit=3)[0]
> +        for api_replacement in api_replacements:
> +            (find, replace) = api_replacement.split()
> +            rel_api_uri = uri_replace(ud, find, replace, replacements, d)
> +            if rel_api_uri is None:
> +                continue
> +            # uri_replace() keeps the params, and the actual filename.
> +            # drop both - we only want
> +            #     https://api.github.com/repos/<user>/<project>/releases
> +            # from the example above
> +            rel_api_uri = rel_api_uri.split(';')[0].rsplit('/', maxsplit=1)[0]
> +            with tempfile.TemporaryDirectory(prefix="wget-github-release-") as workdir, \
> +                    tempfile.NamedTemporaryFile(mode="w+", dir=workdir, prefix="wget-release-") as f:
> +                runcmd = assetinfo_cmd + " -O " + f.name + " '" + rel_api_uri + "'"
> +                logger.debug(2, "For url %s trying to retrieve asset info from %s" % (uri, runcmd))
> +                try:
> +                    self._runwget(ud, d, runcmd, True)
> +                except FetchError as e:
> +                    # Accessing a (PRE)MIRROR using the github API
> +                    # obviously doesn't work, just ignore
> +                    continue
> +                if os.path.getsize(f.name) == 0:
> +                    # the fetch resulted in a zero size file, ignore
> +                    logger.debug(2, "Could not retrieve asset info from %s" % rel_api_uri)
> +                    continue
> +                return json.load(f)
> +
> +        return []
> +
> +    def _get_gh_asset_uri(self, uri, ud, d):
> +        uri = uri.replace("githubprivate://", ud.proto + "://", 1)
> +        gh_asset_uri = None
> +        releases = self._get_gh_releases_info(uri, ud, d)
> +        # As per https://developer.github.com/v3/repos/releases/#list-releases-for-a-repository
> +        # Each release will have a list of assets, where the 'browser_download_url'
> +        # is what we intended to download, but we need to get it via the 'url',
> +        # which points to the github api and supports username/password
> +        for release in releases:
> +            for asset in release['assets']:
> +                logger.debug(2, "Comparing asset id %u URL %s" \
> +                                % (asset['id'], asset['browser_download_url']))
> +                if asset['browser_download_url'] == uri:
> +                    gh_asset_uri = asset['url']
> +                    logger.debug(2, "For URI %s using GitHub asset %s" % (uri, gh_asset_uri))
> +                    break
> +            if gh_asset_uri:
> +                break
> +
> +        if not gh_asset_uri:
> +            raise FetchError("Could not determine the GitHub asset URI for URI %s" % uri, uri)
> +
> +        return gh_asset_uri
> +
> +    def download(self, ud, d):
> +        """Fetch urls"""
> +        orig_uri = ud.url.split(";")[0]
> +        gh_asset_uri = self._get_gh_asset_uri(orig_uri, ud, d)
> +        ud.url = ud.url.replace(orig_uri, gh_asset_uri, 1)
> +        # To be able to download the actual asset, we need to force
> +        # the mime-type. Otherwise we'll get the asset info json.
> +        self.basecmd += " --header='Accept: application/octet-stream'"
> +        return super(Githubprivate, self).download(ud, d)
> +
> +    def latest_versionstring(self, ud, d):
> +        """
> +        Manipulate the URL and try to obtain the latest package version
> +        using GitHub API.
> +        """
> +        # We first get the release (name) that corresponds to the URL ...
> +        uri = ud.url.split(";")[0].replace("githubprivate://", ud.proto + "://", 1)
> +        releases = self._get_gh_releases_info(uri, ud, d)
> +        current_version = '0'
> +        bb.debug(3, "Getting current version info for URL %s" % uri)
> +        # find the release that the SRC_URI asset is attached to
> +        for release in releases:
> +            for asset in release['assets']:
> +                if asset['browser_download_url'] == uri:
> +                    current_version = release['name']
> +                    break
> +            if current_version != '0':
> +                break
> +        if current_version != '0':
> +            bb.debug(3, "Current version info is %s" % current_version)
> +
> +        # ... and then try to find a newer release (name).
> +        for release in releases:
> +            this_version = ['', release['name'], '']
> +            if self._vercmp(['', current_version, ''], this_version) < 0:
> +                current_version = this_version[1]
> +
> +        return (current_version, '')
> +
> +    def checkstatus(self, fetch, urldata, d):
> +        """Check if urls are accessible"""
> +        orig_uri = urldata.url.split(";")[0]
> +        gh_asset_uri = self._get_gh_asset_uri(orig_uri, urldata, d)
> +        urldata.url = urldata.url.replace(orig_uri, gh_asset_uri, 1)
> +        return super(Githubprivate, self).checkstatus(fetch, urldata, d)
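The asset lookup performed by `_get_gh_asset_uri()` amounts to a nested search of the release list returned by the API. The standalone sketch below uses a trimmed sample modelled on the JSON excerpt in the commit message (`user`/`project` are placeholders); unlike the patch, which raises `FetchError`, the hypothetical `find_asset_api_url()` simply returns `None` when nothing matches.

```python
import json

# Trimmed sample modelled on the JSON excerpt in the commit message.
SAMPLE_RELEASES = json.loads("""
[
  {
    "name": "v1.0.0",
    "assets": [
      {
        "id": 16146291,
        "browser_download_url": "https://github.com/user/project/releases/download/v1.0.0/asset1.txt",
        "url": "https://api.github.com/repos/user/project/releases/assets/16146291"
      }
    ]
  }
]
""")


def find_asset_api_url(releases, download_url):
    """Return the API 'url' of the asset whose 'browser_download_url' matches."""
    for release in releases:
        for asset in release.get('assets', []):
            if asset.get('browser_download_url') == download_url:
                # This URL accepts basic auth and, when requested with
                # "Accept: application/octet-stream", serves the file itself.
                return asset['url']
    return None


print(find_asset_api_url(
    SAMPLE_RELEASES,
    "https://github.com/user/project/releases/download/v1.0.0/asset1.txt"))
# prints https://api.github.com/repos/user/project/releases/assets/16146291
```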



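The two-step approach taken by `latest_versionstring()` (first find the release that owns the SRC_URI asset, then look for a newer release name) can be sketched standalone as follows. The patch compares names with BitBake's `self._vercmp()`; this sketch swaps in a deliberately simple, hypothetical `version_key()` that only compares runs of digits, so it is an assumption and not equivalent to BitBake's version comparison.

```python
import re


def version_key(name):
    """Crude stand-in for BitBake's version comparison: digit runs as ints."""
    return [int(n) for n in re.findall(r'\d+', name)]


def latest_release_name(releases, download_url):
    """Return the newest release name, starting from the one owning download_url."""
    # Step 1: find the release the SRC_URI asset is attached to ('0' if none).
    best = '0'
    for release in releases:
        assets = release.get('assets', [])
        if any(a.get('browser_download_url') == download_url for a in assets):
            best = release['name']
            break
    # Step 2: keep any release whose name compares as newer.
    for release in releases:
        if version_key(release['name']) > version_key(best):
            best = release['name']
    return best
```

With names such as `v1.2.0` and `v1.10.0`, comparing digit runs as integers ranks `v1.10.0` higher, which a plain string comparison would get wrong.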

* Re: [master][PATCH v2] fetch2/githubprivate: new fetcher for private github repositories
  2020-01-14 15:09   ` André Draszik
@ 2020-01-27 11:26     ` André Draszik
  0 siblings, 0 replies; 8+ messages in thread
From: André Draszik @ 2020-01-27 11:26 UTC (permalink / raw)
  To: bitbake-devel

ping

On Tue, 2020-01-14 at 15:09 +0000, André Draszik wrote:
> ping
> 
> On Fri, 2019-12-20 at 10:08 +0000, André Draszik wrote:
> > The wget / http fetcher doesn't support fetching assets
> > attached to releases on private GitHub repositories, i.e.
> > release artefacts like
> >     https://github.com/<user>/<project>/releases/download/v1.0.0/asset1.txt
> > 
> > Those are special, in that HTTP basic auth is not used / possible
> > on the URL as seen in the GitHub UI, but instead the GitHub API
> > must be used for downloading (which does support HTTP basic auth)
> > where the URL will be different.
> > 
> > Implement a new fetcher that:
> >     * uses the GitHub API to determine the asset URL
> >     * re-uses the existing wget fetcher to download this URL
> >       instead
> >     * supports checkstatus() (bitbake -c checkuri)
> >     * supports latest_versionstring() (devtool latest-version)
> >     * supports GitHub.com and GitHub Enterprise for the above
> > 
> > Implementation notes:
> > To be able to access the GitHub API, opportunistic authentication
> > (auth-no-challenge) needs to be enabled. Then the API needs to
> > be queried for the real URL of the file to be downloaded, and
> > finally application/octet-stream must be specified explicitly.
> > 
> > Note that there is a slight difference in the location of the
> > REST API endpoints between GitHub.com and GitHub Enterprise.
> > 
> >     https://developer.github.com/v3/repos/releases/
> >     https://developer.github.com/enterprise/2.19/v3/enterprise-admin/
> > 
> > Some notes:
> > * --auth-no-challenge is added unconditionally because we know
> >   username / password will definitely be needed, and they are
> >   likely specified in ~/.netrc, rather than in the recipe (but
> >   username / password via recipe is still supported)
> > * the release information returned looks something like:
> > [
> >     {
> >         ...
> >         "name": <name of the release>
> >         "assets": [
> >             {
> >                 ...
> >                 "browser_download_url": "https://github.com/<user>/<project>/releases/download/v1.0.0/asset1.txt",
> >                 "url": "https://api.github.com/repos/<user>/<project>/releases/assets/16146291",
> >                 ...
> >             },
> >             ...
> >         ],
> >         ...
> >     },
> >     ...
> > ]
> >   hence we need to pass -O to wget to explicitly download using
> >   the original name
> > * to determine the latest available version, we can simply query
> >   the API for the version (name) that the SRC_URI entry is
> >   attached to, and then figure out if there is a more recent
> >   version available, rather than doing lots of matches using
> >   regexes
> > * this has been tested with github.com and GitHub Enterprise on
> >   private repositories, with and without PREMIRRORS
> > 
> > Signed-off-by: André Draszik <git@andred.net>
> > ---
> >  bitbake/lib/bb/fetch2/__init__.py      |   6 +-
> >  bitbake/lib/bb/fetch2/githubprivate.py | 174 +++++++++++++++++++++++++
> >  2 files changed, 178 insertions(+), 2 deletions(-)
> >  create mode 100644 bitbake/lib/bb/fetch2/githubprivate.py
> > 
> > diff --git a/bitbake/lib/bb/fetch2/__init__.py b/bitbake/lib/bb/fetch2/__init__.py
> > index 07de6c2693..5c533cf78e 100644
> > --- a/bitbake/lib/bb/fetch2/__init__.py
> > +++ b/bitbake/lib/bb/fetch2/__init__.py
> > @@ -1238,13 +1238,13 @@ class FetchData(object):
> >              self.sha256_name = "sha256sum"
> >          if self.md5_name in self.parm:
> >              self.md5_expected = self.parm[self.md5_name]
> > -        elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3"]:
> > +        elif self.type not in ["http", "https", "ftp", "ftps", "githubprivate", "sftp", "s3"]:
> >              self.md5_expected = None
> >          else:
> >              self.md5_expected = d.getVarFlag("SRC_URI", self.md5_name)
> >          if self.sha256_name in self.parm:
> >              self.sha256_expected = self.parm[self.sha256_name]
> > -        elif self.type not in ["http", "https", "ftp", "ftps", "sftp", "s3"]:
> > +        elif self.type not in ["http", "https", "ftp", "ftps", "githubprivate", "sftp", "s3"]:
> >              self.sha256_expected = None
> >          else:
> >              self.sha256_expected = d.getVarFlag("SRC_URI", self.sha256_name)
> > @@ -1853,6 +1853,7 @@ from . import osc
> >  from . import repo
> >  from . import clearcase
> >  from . import npm
> > +from . import githubprivate
> >  
> >  methods.append(local.Local())
> >  methods.append(wget.Wget())
> > @@ -1871,3 +1872,4 @@ methods.append(osc.Osc())
> >  methods.append(repo.Repo())
> >  methods.append(clearcase.ClearCase())
> >  methods.append(npm.Npm())
> > +methods.append(githubprivate.Githubprivate())
> > diff --git a/bitbake/lib/bb/fetch2/githubprivate.py b/bitbake/lib/bb/fetch2/githubprivate.py
> > new file mode 100644
> > index 0000000000..5a007c4e69
> > --- /dev/null
> > +++ b/bitbake/lib/bb/fetch2/githubprivate.py
> > @@ -0,0 +1,174 @@
> > +#
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +#
> > +"""
> > +Bitbake "Fetch" implementation for assets attached to private
> > +repositories on GitHub or GitHub Enterprise.
> > +"""
> > +
> > +import os
> > +import json
> > +import tempfile
> > +import bb
> > +from   bb.fetch2.wget import Wget
> > +from   bb.fetch2 import FetchError
> > +from   bb.fetch2 import logger
> > +from   bb.fetch2 import uri_replace
> > +
> > +class Githubprivate(Wget):
> > +    """Class to fetch an asset from a private repository on GitHub
> > +       (or GitHub Enterprise)."""
> > +
> > +    def supports(self, ud, d):
> > +        return ud.type in ['githubprivate']
> > +
> > +    def urldata_init(self, ud, d):
> > +        ud.proto = 'https'
> > +        if 'protocol' in ud.parm:
> > +            ud.proto = ud.parm['protocol']
> > +        if not ud.proto in ('http', 'https'):
> > +            raise bb.fetch2.ParameterError("Invalid protocol type", ud.url)
> > +
> > +        if not 'downloadfilename' in ud.parm:
> > +            # The asset filename determined using the GitHub API will
> > +            # not match the filename of the release artefact (as in
> > +            # SRC_URI). Hence we need to unconditionally instruct
> > +            # wget to download using -O. This can be achieved by
> > +            # unconditionally setting 'downloadfilename' here.
> > +            ud.parm['downloadfilename'] = os.path.basename(ud.path)
> > +        super(Githubprivate, self).urldata_init(ud, d)
> > +        # To be able to access the GitHub API, opportunistic authentication
> > +        # needs to be enabled. Also username / password will definitely be
> > +        # needed, and they are likely specified in ~/.netrc, rather than in
> > +        # the recipe itself.
> > +        self.basecmd += " --auth-no-challenge"
> > +
> > +    def _get_gh_releases_info(self, uri, ud, d):
> > +        fetchcmd = self.basecmd
> > +        if ud.user and ud.pswd:
> > +            fetchcmd += " --user=%s --password=%s" % (ud.user, ud.pswd)
> > +
> > +        # Github private repositories support basic-auth via the API
> > +        # endpoints only. Using those, the download URL will be
> > +        # different, and we need to download using application/octet-stream.
> > +        # The API endpoint mapping is different for github.com and
> > +        # GitHub Enterprise:
> > +        #     github.com -> api.github.com
> > +        #     github.example.com -> github.example.com/api/v3/
> > +        # The Accept header is used in any case to fix the API version to
> > +        # the supported level (version 3).
> > +        #
> > +        # To get the download URL when using the API, all the releases
> > +        # are listed via
> > +        #     https://api.github.com/repos/<user>/<project>/releases
> > +        # which returns a JSON message describing all releases and all
> > +        # their attached artefacts. We can easily search that for
> > +        # the artefact that we're trying to download, and use
> > +        # the replacement URL from that response.
> > +        assetinfo_cmd = fetchcmd + " --header='Accept: application/vnd.github.v3+json'"
> > +        api_replacements = ['githubprivate://github.com/.* TYPE://api.github.com/repos/REPORELEASES',
> > +                            'githubprivate://.*/.* TYPE://HOST/api/v3/repos/REPORELEASES']
> > +        replacements = {}
> > +        replacements["TYPE"] = ud.proto
> > +        replacements["HOST"] = ud.host
> > +        # github release artifacts are of the form
> > +        #     https://github.com/<user>/<project>/releases/download/v1.0.0/asset1.txt
> > +        # drop everything after .../releases and point to api.github.com
> > +        replacements["REPORELEASES"] = ud.path.rsplit('/', maxsplit=3)[0]
> > +        for api_replacement in api_replacements:
> > +            (find, replace) = api_replacement.split()
> > +            rel_api_uri = uri_replace(ud, find, replace, replacements, d)
> > +            if rel_api_uri is None:
> > +                continue
> > +            # uri_replace() keeps the params, and the actual filename.
> > +            # drop both - we only want
> > +            #     https://api.github.com/repos/<user>/<project>/releases
> > +            # from the example above
> > +            rel_api_uri = rel_api_uri.split(';')[0].rsplit('/', maxsplit=1)[0]
> > +            with tempfile.TemporaryDirectory(prefix="wget-github-release-") as workdir, \
> > +                    tempfile.NamedTemporaryFile(mode="w+", dir=workdir, prefix="wget-release-") as f:
> > +                runcmd = assetinfo_cmd + " -O " + f.name + " '" + rel_api_uri + "'"
> > +                logger.debug(2, "For url %s trying to retrieve asset info from %s" % (uri, runcmd))
> > +                try:
> > +                    self._runwget(ud, d, runcmd, True)
> > +                except FetchError as e:
> > +                    # Accessing a (PRE)MIRROR using the github API
> > +                    # obviously doesn't work, just ignore
> > +                    continue
> > +                if os.path.getsize(f.name) == 0:
> > +                    # the fetch resulted in a zero size file, ignore
> > +                    logger.debug(2, "Could not retrieve asset info from %s" % rel_api_uri)
> > +                    continue
> > +                return json.load(f)
> > +
> > +        return []
> > +
> > +    def _get_gh_asset_uri(self, uri, ud, d):
> > +        uri = uri.replace("githubprivate://", ud.proto + "://", 1)
> > +        gh_asset_uri = None
> > +        releases = self._get_gh_releases_info(uri, ud, d)
> > +        # As per https://developer.github.com/v3/repos/releases/#list-releases-for-a-repository
> > +        # Each release will have a list of assets, where the 'browser_download_url'
> > +        # is what we intended to download, but we need to get it via the 'url',
> > +        # which points to the github api and supports username/password
> > +        for release in releases:
> > +            for asset in release['assets']:
> > +                logger.debug(2, "Comparing asset id %u URL %s" \
> > +                                % (asset['id'], asset['browser_download_url']))
> > +                if asset['browser_download_url'] == uri:
> > +                    gh_asset_uri = asset['url']
> > +                    logger.debug(2, "For URI %s using GitHub asset %s" % (uri, gh_asset_uri))
> > +                    break
> > +            if gh_asset_uri:
> > +                break
> > +
> > +        if not gh_asset_uri:
> > +            raise FetchError("Could not determine the GitHub asset URI for URI %s" % uri, uri)
> > +
> > +        return gh_asset_uri
> > +
> > +    def download(self, ud, d):
> > +        """Fetch urls"""
> > +        orig_uri = ud.url.split(";")[0]
> > +        gh_asset_uri = self._get_gh_asset_uri(orig_uri, ud, d)
> > +        ud.url = ud.url.replace(orig_uri, gh_asset_uri, 1)
> > +        # To be able to download the actual asset, we need to force
> > +        # the mime-type. Otherwise we'll get the asset info json.
> > +        self.basecmd += " --header='Accept: application/octet-stream'"
> > +        return super(Githubprivate, self).download(ud, d)
> > +
> > +    def latest_versionstring(self, ud, d):
> > +        """
> > +        Manipulate the URL and try to obtain the latest package version
> > +        using GitHub API.
> > +        """
> > +        # We first get the release (name) that corresponds to the URL ...
> > +        uri = ud.url.split(";")[0].replace("githubprivate://", ud.proto + "://", 1)
> > +        releases = self._get_gh_releases_info(uri, ud, d)
> > +        current_version = '0'
> > +        bb.debug(3, "Getting current version info for URL %s" % uri)
> > +        for release in releases:
> > +            for asset in release['assets']:
> > +                if asset['browser_download_url'] == uri:
> > +                    current_version = release['name']
> > +                    break
> > +            if current_version != '0':
> > +                break
> > +        if current_version != '0':
> > +            bb.debug(3, "Current version info is %s" % current_version)
> > +
> > +        # ... and then try to find a newer release (name).
> > +        for release in releases:
> > +            this_version = ['', release['name'], '']
> > +            if self._vercmp(['', current_version, ''], this_version) < 0:
> > +                current_version = this_version[1]
> > +
> > +        return (current_version, '')
> > +
> > +    def checkstatus(self, fetch, urldata, d):
> > +        """Check if urls are accessible"""
> > +        orig_uri = urldata.url.split(";")[0]
> > +        gh_asset_uri = self._get_gh_asset_uri(orig_uri, urldata, d)
> > +        urldata.url = urldata.url.replace(orig_uri, gh_asset_uri, 1)
> > +        return super(Githubprivate, self).checkstatus(fetch, urldata, d)

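The asset lookup in `_get_gh_asset_uri` can be tried in isolation. The following is a minimal sketch that assumes the release list has already been decoded from the JSON returned by the GitHub releases API. The helper name `find_asset_api_url` and the sample data (user, project, asset id) are hypothetical. Only the `assets`, `browser_download_url` and `url` keys come from the patch.

```python
from typing import Any, Dict, List, Optional


def find_asset_api_url(releases: List[Dict[str, Any]], download_url: str) -> Optional[str]:
    """Return the API 'url' of the asset whose 'browser_download_url' matches.

    Hypothetical helper mirroring the loop in _get_gh_asset_uri: the browser
    URL is what the recipe names, the API URL is what accepts credentials.
    """
    for release in releases:
        for asset in release.get("assets", []):
            if asset.get("browser_download_url") == download_url:
                return asset.get("url")
    return None  # the patch raises FetchError in this case


# Hypothetical, abbreviated release data in the shape the patch relies on.
releases = [
    {
        "name": "v1.0.0",
        "assets": [
            {
                "id": 1,
                "url": "https://api.github.com/repos/user/project/releases/assets/1",
                "browser_download_url": "https://github.com/user/project/releases/download/v1.0.0/asset1.txt",
            }
        ],
    }
]

print(find_asset_api_url(releases, "https://github.com/user/project/releases/download/v1.0.0/asset1.txt"))
```

Returning from inside the nested loop avoids the flag-and-double-`break` pattern used in the patch. A `None` result corresponds to the case where the fetcher gives up with a `FetchError`.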

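In `download`, the fetcher swaps in the API URL, relies on `--auth-no-challenge` to send credentials up front, and forces `Accept: application/octet-stream` so the file itself is returned instead of the asset JSON. For readers who want to reproduce this outside bitbake, here is a rough equivalent using Python's standard `urllib.request` in place of wget. This is a swapped-in technique, not the patch's code path. The helper `build_asset_request`, the username/token pair and the URL are hypothetical, and the request is only constructed, never sent.

```python
import base64
import urllib.request


def build_asset_request(api_url: str, username: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request for a release asset via its API URL.

    Credentials are attached up front, similar to wget's --auth-no-challenge,
    and the Accept header asks for the binary asset rather than its JSON info.
    """
    req = urllib.request.Request(api_url)
    req.add_header("Accept", "application/octet-stream")
    credentials = base64.b64encode(f"{username}:{token}".encode("utf-8")).decode("ascii")
    # Unredirected headers are not copied onto a request that follows a redirect.
    req.add_unredirected_header("Authorization", "Basic " + credentials)
    return req


req = build_asset_request("https://api.github.com/repos/user/project/releases/assets/1", "user", "token")
print(req.get_header("Accept"))
```

Storing `Authorization` as an unredirected header means that, if the API answers with a redirect, urllib does not forward the credentials to the redirect target.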

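`latest_versionstring` first finds the release that contains the requested asset, then looks for a release whose name compares as newer. Bitbake's `_vercmp` is not reproduced here. As an assumption, the sketch below substitutes a naive comparison that extracts the integer components of names such as `v1.10.0`. It still shows why a version-aware comparison matters: plain string comparison would rank `v1.9.2` above `v1.10.0`. The names `version_key` and `latest_release_name` and the sample data are hypothetical.

```python
import re
from typing import Any, Dict, List, Tuple


def version_key(name: str) -> Tuple[int, ...]:
    """Naive stand-in for bitbake's _vercmp (an assumption, not bitbake code).

    Extracts the numeric components of a release name, so 'v1.10.0'
    becomes (1, 10, 0) and compares correctly against 'v1.9.2'.
    """
    return tuple(int(part) for part in re.findall(r"\d+", name))


def latest_release_name(releases: List[Dict[str, Any]], download_url: str) -> str:
    """Mirror latest_versionstring: find the current release, then any newer one."""
    current = "0"  # same sentinel the patch uses when no release matches
    for release in releases:
        if any(asset.get("browser_download_url") == download_url
               for asset in release.get("assets", [])):
            current = release["name"]
            break

    latest = current
    for release in releases:
        if version_key(release["name"]) > version_key(latest):
            latest = release["name"]
    return latest


url = "https://github.com/user/project/releases/download/v1.0.0/asset1.txt"
releases = [
    {"name": "v1.9.2", "assets": []},
    {"name": "v1.10.0", "assets": []},
    {"name": "v1.0.0", "assets": [{"browser_download_url": url}]},
]
print(latest_release_name(releases, url))  # prints "v1.10.0"
```

Real release names are not guaranteed to be purely numeric, which is why the fetcher delegates to bitbake's own version comparison rather than something this simple.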

Thread overview: 8+ messages
2019-11-14 14:21 [master][PATCH] fetch2/wget: support releases from private github repositories André Draszik
2019-11-21 20:27 ` André Draszik
2019-11-22 17:01 ` Ross Burton
2019-11-26  9:40   ` André Draszik
2019-12-20 10:08 ` [master][PATCH v2] fetch2/githubprivate: new fetcher for " André Draszik
2019-12-23 15:38   ` Ross Burton
2020-01-14 15:09   ` André Draszik
2020-01-27 11:26     ` André Draszik