* [Buildroot] [PATCH v3 0/3] Use aiohttp in pkg-stats
@ 2020-08-08 18:08 Thomas Petazzoni
  2020-08-08 18:08 ` [Buildroot] [PATCH v3 1/3] support/scripts/pkg-stats: use aiohttp for latest version retrieval Thomas Petazzoni
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Thomas Petazzoni @ 2020-08-08 18:08 UTC
  To: buildroot

Hello,

I started investigating why pkg-stats was sometimes getting stuck on
the server that runs it daily to populate autobuild.buildroot.org/stats/
and send the autobuilder e-mails. The subprocesses started by
"multiprocessing" to retrieve the latest upstream version from
release-monitoring.org were stuck holding a lock. I have not reached a
definitive conclusion, but some preliminary research showed that
multiprocessing can be tricky and can cause issues with locks.

When I discussed this with Titouan, he suggested using aiohttp instead
of multiprocessing. And indeed, it makes a lot of sense to use this
popular asynchronous HTTP library.

This patch series switches the latest version retrieval and the
upstream URL checking to aiohttp, and as a bonus adds some logging to
show the progress of the retrieval, as it can be quite long.
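
The core idea of the series is to replace a pool of 64 worker
processes with coroutines sharing one event loop in a single thread.
A minimal, standard-library-only sketch of that pattern follows:
fake_fetch() is a hypothetical stand-in for an aiohttp request, and an
asyncio.Semaphore plays the role of the per-host connection limit
(limit_per_host=5) that the patches set on aiohttp.TCPConnector.

```python
import asyncio


async def fake_fetch(name):
    # Hypothetical stand-in for one HTTP request made through an
    # aiohttp.ClientSession; we only simulate network latency here.
    await asyncio.sleep(0.01)
    return name.upper()


async def fetch_all(names, limit=5):
    # The semaphore caps how many requests are in flight at once, playing
    # the role of aiohttp.TCPConnector(limit_per_host=5) in the patches.
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def one(name):
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_fetch(name)
            finally:
                in_flight -= 1

    # All coroutines run concurrently in a single thread; gather() returns
    # their results in input order.
    results = await asyncio.gather(*(one(n) for n in names))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(fetch_all(["curlpp", "cmocka", "snappy"]))
    print(results, "peak concurrency:", peak)
```

With 20 names and limit=5, at most five simulated requests are in
flight at any time, and no worker processes or inter-process locks are
involved.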

Changes since v2:

 - Use python3 in the shebang

 - Use asyncio.TimeoutError in the exception handling

 - Slightly rework how packages with "no valid infra" are handled in
   the "latest version" check, but we keep a loop to handle such
   packages before the main loop, as we want the real count of
   packages that the main loop will handle.

 - Use asyncio.get_event_loop() + loop.run_until_complete() instead of
   asyncio.run().
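
The v3 changelog switches from asyncio.run() to an explicit event loop
with run_until_complete(). The series does not say why; a plausible
reason (our assumption) is that asyncio.run() only appeared in Python
3.7. A minimal sketch of the explicit-loop pattern follows. It uses
asyncio.new_event_loop() rather than get_event_loop(), because calling
get_event_loop() when no current event loop exists is deprecated in
recent Python releases. count_packages() is a hypothetical placeholder.

```python
import asyncio


async def count_packages(names):
    # Hypothetical placeholder for a coroutine such as
    # check_package_latest_version().
    await asyncio.sleep(0)
    return len(names)


def run_compat(coro):
    """Run `coro` to completion without relying on asyncio.run().

    asyncio.run() only exists on Python 3.7 and later; creating a loop and
    calling run_until_complete() also works on earlier Python 3 releases.
    """
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        # Release the loop's resources once the coroutine has finished.
        loop.close()


if __name__ == "__main__":
    print(run_compat(count_packages(["curlpp", "cmocka"])))  # prints 2
```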

Changes since v1:

 - Pass trust_env=True when creating the aiohttp.ClientSession() so
   that HTTP proxy environment variables are taken into
   account. Suggested by Matt Weber.

 - Fix bogus indentation that was breaking the logic of the latest
   version retrieval.
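
With trust_env=True, the aiohttp session takes proxy settings from the
environment (for example HTTPS_PROXY) instead of ignoring them, which
matters on machines that can only reach release-monitoring.org through
a proxy. To check what a given environment exposes, the standard
library's urllib.request.getproxies_environment() reads the same
*_proxy variable family; treating the two lookups as equivalent is an
assumption of this sketch, not something the patch states.

```python
import os
import urllib.request


def env_proxies():
    # Returns a dict such as {"https": "http://proxy:3128"} built from the
    # *_proxy environment variables (http_proxy, HTTPS_PROXY, ...).
    # Assumption: these are the variables aiohttp honours with trust_env=True.
    return urllib.request.getproxies_environment()


if __name__ == "__main__":
    os.environ["https_proxy"] = "http://proxy.example.com:3128"
    print(env_proxies().get("https"))  # prints http://proxy.example.com:3128
```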

Thanks,

Thomas


Thomas Petazzoni (3):
  support/scripts/pkg-stats: use aiohttp for latest version retrieval
  support/scripts/pkg-stats: use aiohttp for upstream URL checking
  support/scripts/pkg-stats: show progress of upstream URL and latest
    version

 support/scripts/pkg-stats | 213 ++++++++++++++++++++++----------------
 1 file changed, 126 insertions(+), 87 deletions(-)

-- 
2.26.2


* [Buildroot] [PATCH v3 1/3] support/scripts/pkg-stats: use aiohttp for latest version retrieval
  2020-08-08 18:08 [Buildroot] [PATCH v3 0/3] Use aiohttp in pkg-stats Thomas Petazzoni
@ 2020-08-08 18:08 ` Thomas Petazzoni
  2020-08-28 15:52   ` Peter Korsgaard
  2020-08-08 18:08 ` [Buildroot] [PATCH v3 2/3] support/scripts/pkg-stats: use aiohttp for upstream URL checking Thomas Petazzoni
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Thomas Petazzoni @ 2020-08-08 18:08 UTC
  To: buildroot

This commit reworks the code that retrieves the latest upstream
version of each package from release-monitoring.org using the aiohttp
module. This makes the implementation much more elegant, and avoids
the problematic multiprocessing Pool which is causing issues in some
situations.

Since we're now using some async functionality, the script is Python
3.x only, so the shebang is changed to make this clear.

Suggested-by: Titouan Christophe <titouan.christophe@railnova.eu>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 support/scripts/pkg-stats | 148 +++++++++++++++++++++-----------------
 1 file changed, 81 insertions(+), 67 deletions(-)

diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
index ec4d538758..3423c44815 100755
--- a/support/scripts/pkg-stats
+++ b/support/scripts/pkg-stats
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
 
 # Copyright (C) 2009 by Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
 #
@@ -16,7 +16,9 @@
 # along with this program; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 
+import aiohttp
 import argparse
+import asyncio
 import datetime
 import fnmatch
 import os
@@ -26,13 +28,10 @@ import subprocess
 import requests  # URL checking
 import json
 import ijson
-import certifi
 import distutils.version
 import time
 import gzip
 import sys
-from urllib3 import HTTPSConnectionPool
-from urllib3.exceptions import HTTPError
 from multiprocessing import Pool
 
 sys.path.append('utils/')
@@ -54,10 +53,6 @@ CVE_AFFECTS = 1
 CVE_DOESNT_AFFECT = 2
 CVE_UNKNOWN = 3
 
-# Used to make multiple requests to the same host. It is global
-# because it's used by sub-processes.
-http_pool = None
-
 
 class Defconfig:
     def __init__(self, name, path):
@@ -526,54 +521,88 @@ def check_package_urls(packages):
     pool.terminate()
 
 
-def release_monitoring_get_latest_version_by_distro(pool, name):
-    try:
-        req = pool.request('GET', "/api/project/Buildroot/%s" % name)
-    except HTTPError:
-        return (RM_API_STATUS_ERROR, None, None)
-
-    if req.status != 200:
-        return (RM_API_STATUS_NOT_FOUND, None, None)
+def check_package_latest_version_set_status(pkg, status, version, identifier):
+    pkg.latest_version = {
+        "status": status,
+        "version": version,
+        "id": identifier,
+    }
 
-    data = json.loads(req.data)
+    if pkg.latest_version['status'] == RM_API_STATUS_ERROR:
+        pkg.status['version'] = ('warning', "Release Monitoring API error")
+    elif pkg.latest_version['status'] == RM_API_STATUS_NOT_FOUND:
+        pkg.status['version'] = ('warning', "Package not found on Release Monitoring")
 
-    if 'version' in data:
-        return (RM_API_STATUS_FOUND_BY_DISTRO, data['version'], data['id'])
+    if pkg.latest_version['version'] is None:
+        pkg.status['version'] = ('warning', "No upstream version available on Release Monitoring")
+    elif pkg.latest_version['version'] != pkg.current_version:
+        pkg.status['version'] = ('error', "The newer version {} is available upstream".format(pkg.latest_version['version']))
     else:
-        return (RM_API_STATUS_FOUND_BY_DISTRO, None, data['id'])
+        pkg.status['version'] = ('ok', 'up-to-date')
 
 
-def release_monitoring_get_latest_version_by_guess(pool, name):
+async def check_package_get_latest_version_by_distro(session, pkg, retry=True):
+    url = "https://release-monitoring.org/api/project/Buildroot/%s" % pkg.name
     try:
-        req = pool.request('GET', "/api/projects/?pattern=%s" % name)
-    except HTTPError:
-        return (RM_API_STATUS_ERROR, None, None)
+        async with session.get(url) as resp:
+            if resp.status != 200:
+                return False
 
-    if req.status != 200:
-        return (RM_API_STATUS_NOT_FOUND, None, None)
+            data = await resp.json()
+            version = data['version'] if 'version' in data else None
+            check_package_latest_version_set_status(pkg,
+                                                    RM_API_STATUS_FOUND_BY_DISTRO,
+                                                    version,
+                                                    data['id'])
+            return True
+
+    except (aiohttp.ClientError, asyncio.TimeoutError):
+        if retry:
+            return await check_package_get_latest_version_by_distro(session, pkg, retry=False)
+        else:
+            return False
 
-    data = json.loads(req.data)
 
-    projects = data['projects']
-    projects.sort(key=lambda x: x['id'])
+async def check_package_get_latest_version_by_guess(session, pkg, retry=True):
+    url = "https://release-monitoring.org/api/projects/?pattern=%s" % pkg.name
+    try:
+        async with session.get(url) as resp:
+            if resp.status != 200:
+                return False
+
+            data = await resp.json()
+            # filter projects that have the right name and a version defined
+            projects = [p for p in data['projects'] if p['name'] == pkg.name and 'version' in p]
+            projects.sort(key=lambda x: x['id'])
+
+            if len(projects) > 0:
+                check_package_latest_version_set_status(pkg,
+                                                        RM_API_STATUS_FOUND_BY_PATTERN,
+                                                        projects[0]['version'],
+                                                        projects[0]['id'])
+                return True
+
+    except (aiohttp.ClientError, asyncio.TimeoutError):
+        if retry:
+            return await check_package_get_latest_version_by_guess(session, pkg, retry=False)
+        else:
+            return False
 
-    for p in projects:
-        if p['name'] == name and 'version' in p:
-            return (RM_API_STATUS_FOUND_BY_PATTERN, p['version'], p['id'])
 
-    return (RM_API_STATUS_NOT_FOUND, None, None)
+async def check_package_latest_version_get(session, pkg):
 
+    if await check_package_get_latest_version_by_distro(session, pkg):
+        return
 
-def check_package_latest_version_worker(name):
-    """Wrapper to try both by name then by guess"""
-    print(name)
-    res = release_monitoring_get_latest_version_by_distro(http_pool, name)
-    if res[0] == RM_API_STATUS_NOT_FOUND:
-        res = release_monitoring_get_latest_version_by_guess(http_pool, name)
-    return res
+    if await check_package_get_latest_version_by_guess(session, pkg):
+        return
 
+    check_package_latest_version_set_status(pkg,
+                                            RM_API_STATUS_NOT_FOUND,
+                                            None, None)
 
-def check_package_latest_version(packages):
+
+async def check_package_latest_version(packages):
     """
     Fills in the .latest_version field of all Package objects
 
@@ -587,33 +616,17 @@ def check_package_latest_version(packages):
     - id: string containing the id of the project corresponding to this
       package, as known by release-monitoring.org
     """
-    global http_pool
-    http_pool = HTTPSConnectionPool('release-monitoring.org', port=443,
-                                    cert_reqs='CERT_REQUIRED', ca_certs=certifi.where(),
-                                    timeout=30)
-    worker_pool = Pool(processes=64)
-    results = worker_pool.map(check_package_latest_version_worker, (pkg.name for pkg in packages))
-    for pkg, r in zip(packages, results):
-        pkg.latest_version = dict(zip(['status', 'version', 'id'], r))
-
-        if not pkg.has_valid_infra:
-            pkg.status['version'] = ("na", "no valid package infra")
-            continue
-
-        if pkg.latest_version['status'] == RM_API_STATUS_ERROR:
-            pkg.status['version'] = ('warning', "Release Monitoring API error")
-        elif pkg.latest_version['status'] == RM_API_STATUS_NOT_FOUND:
-            pkg.status['version'] = ('warning', "Package not found on Release Monitoring")
 
-        if pkg.latest_version['version'] is None:
-            pkg.status['version'] = ('warning', "No upstream version available on Release Monitoring")
-        elif pkg.latest_version['version'] != pkg.current_version:
-            pkg.status['version'] = ('error', "The newer version {} is available upstream".format(pkg.latest_version['version']))
-        else:
-            pkg.status['version'] = ('ok', 'up-to-date')
+    for pkg in [p for p in packages if not p.has_valid_infra]:
+        pkg.status['version'] = ("na", "no valid package infra")
 
-    worker_pool.terminate()
-    del http_pool
+    tasks = []
+    connector = aiohttp.TCPConnector(limit_per_host=5)
+    async with aiohttp.ClientSession(connector=connector, trust_env=True) as sess:
+        packages = [p for p in packages if p.has_valid_infra]
+        for pkg in packages:
+            tasks.append(check_package_latest_version_get(sess, pkg))
+        await asyncio.wait(tasks)
 
 
 def check_package_cves(nvd_path, packages):
@@ -1057,7 +1070,8 @@ def __main__():
     print("Checking URL status")
     check_package_urls(packages)
     print("Getting latest versions ...")
-    check_package_latest_version(packages)
+    loop = asyncio.get_event_loop()
+    loop.run_until_complete(check_package_latest_version(packages))
     if args.nvd_path:
         print("Checking packages CVEs")
         check_package_cves(args.nvd_path, {p.name: p for p in packages})
-- 
2.26.2
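
The lookup strategy of this patch (query the Buildroot distro mapping,
fall back to a name-pattern search, retry each query once on a timeout,
and only then mark the package as not found) can be sketched without
network access. FlakyLookup is a hypothetical stand-in for one
release-monitoring.org query that times out on its first call; None
plays the role of the patch's False return value.

```python
import asyncio


class FlakyLookup:
    """Hypothetical stand-in for one release-monitoring.org query.

    The first call raises asyncio.TimeoutError; later calls return `result`
    (a version string, or None meaning the lookup did not succeed).
    """

    def __init__(self, result):
        self.result = result
        self.calls = 0

    async def __call__(self, name):
        self.calls += 1
        await asyncio.sleep(0)
        if self.calls == 1:
            raise asyncio.TimeoutError()
        return self.result


async def with_one_retry(lookup, name, retry=True):
    # Same shape as the patch: on a timeout, retry exactly once, then give up.
    try:
        return await lookup(name)
    except asyncio.TimeoutError:
        if retry:
            return await with_one_retry(lookup, name, retry=False)
        return None


async def latest_version(by_distro, by_guess, name):
    # Try the distro mapping first, then the name-pattern search, and only
    # report "not found" if neither produced a version.
    for lookup in (by_distro, by_guess):
        version = await with_one_retry(lookup, name)
        if version is not None:
            return version
    return "not found"


if __name__ == "__main__":
    distro, guess = FlakyLookup(None), FlakyLookup("1.2.3")
    print(asyncio.run(latest_version(distro, guess, "snappy")))  # prints 1.2.3
```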


* [Buildroot] [PATCH v3 2/3] support/scripts/pkg-stats: use aiohttp for upstream URL checking
  2020-08-08 18:08 [Buildroot] [PATCH v3 0/3] Use aiohttp in pkg-stats Thomas Petazzoni
  2020-08-08 18:08 ` [Buildroot] [PATCH v3 1/3] support/scripts/pkg-stats: use aiohttp for latest version retrieval Thomas Petazzoni
@ 2020-08-08 18:08 ` Thomas Petazzoni
  2020-08-28 15:52   ` Peter Korsgaard
  2020-08-08 18:08 ` [Buildroot] [PATCH v3 3/3] support/scripts/pkg-stats: show progress of upstream URL and latest version Thomas Petazzoni
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Thomas Petazzoni @ 2020-08-08 18:08 UTC
  To: buildroot

This commit reworks the code that checks whether the upstream URL of
each package (specified by its Config.in file) is valid, so that it
uses the aiohttp module. This makes the implementation much more
elegant, and avoids the problematic multiprocessing Pool which is
causing issues in some situations.

Suggested-by: Titouan Christophe <titouan.christophe@railnova.eu>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 support/scripts/pkg-stats | 46 +++++++++++++++++++++------------------
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
index 3423c44815..70e7fa7a0c 100755
--- a/support/scripts/pkg-stats
+++ b/support/scripts/pkg-stats
@@ -25,14 +25,13 @@ import os
 from collections import defaultdict
 import re
 import subprocess
-import requests  # URL checking
+import requests  # NVD database download
 import json
 import ijson
 import distutils.version
 import time
 import gzip
 import sys
-from multiprocessing import Pool
 
 sys.path.append('utils/')
 from getdeveloperlib import parse_developers  # noqa: E402
@@ -499,26 +498,30 @@ def package_init_make_info():
             Package.all_ignored_cves[pkgvar] = value.split()
 
 
-def check_url_status_worker(url, url_status):
-    if url_status[0] == 'ok':
-        try:
-            url_status_code = requests.head(url, timeout=30).status_code
-            if url_status_code >= 400:
-                return ("error", "invalid {}".format(url_status_code))
-        except requests.exceptions.RequestException:
-            return ("error", "invalid (err)")
-        return ("ok", "valid")
-    return url_status
+async def check_url_status(session, pkg, retry=True):
+    try:
+        async with session.get(pkg.url) as resp:
+            if resp.status >= 400:
+                pkg.status['url'] = ("error", "invalid {}".format(resp.status))
+                return
+    except (aiohttp.ClientError, asyncio.TimeoutError):
+        if retry:
+            return await check_url_status(session, pkg, retry=False)
+        else:
+            pkg.status['url'] = ("error", "invalid (err)")
+            return
 
+    pkg.status['url'] = ("ok", "valid")
 
-def check_package_urls(packages):
-    pool = Pool(processes=64)
-    for pkg in packages:
-        pkg.url_worker = pool.apply_async(check_url_status_worker, (pkg.url, pkg.status['url']))
-    for pkg in packages:
-        pkg.status['url'] = pkg.url_worker.get(timeout=3600)
-        del pkg.url_worker
-    pool.terminate()
+
+async def check_package_urls(packages):
+    tasks = []
+    connector = aiohttp.TCPConnector(limit_per_host=5)
+    async with aiohttp.ClientSession(connector=connector, trust_env=True) as sess:
+        packages = [p for p in packages if p.status['url'][0] == 'ok']
+        for pkg in packages:
+            tasks.append(check_url_status(sess, pkg))
+        await asyncio.wait(tasks)
 
 
 def check_package_latest_version_set_status(pkg, status, version, identifier):
@@ -1068,7 +1071,8 @@ def __main__():
         pkg.set_url()
         pkg.set_developers(developers)
     print("Checking URL status")
-    check_package_urls(packages)
+    loop = asyncio.get_event_loop()
+    loop.run_until_complete(check_package_urls(packages))
     print("Getting latest versions ...")
     loop = asyncio.get_event_loop()
     loop.run_until_complete(check_package_latest_version(packages))
-- 
2.26.2
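
This patch collects bare coroutines and passes them to asyncio.wait().
The Python 3 releases of the time accepted that, but passing coroutines
to asyncio.wait() was deprecated in Python 3.8 and is rejected from
Python 3.11 on, and asyncio.wait() raises ValueError when given an
empty collection (for instance if no package had a URL to check). A
minimal sketch of the same URL classification built on asyncio.gather()
instead; fake_get() and the FAKE_STATUS table are hypothetical
stand-ins for aiohttp's session.get(), and ConnectionError stands in
for aiohttp.ClientError.

```python
import asyncio

# Hypothetical URL -> HTTP status table standing in for real servers.
FAKE_STATUS = {
    "https://ok.example/": 200,
    "https://gone.example/": 404,
}


async def fake_get(url):
    # Stand-in for `session.get(url)`; unknown hosts simulate a network error.
    await asyncio.sleep(0)
    if url not in FAKE_STATUS:
        raise ConnectionError("unreachable: " + url)
    return FAKE_STATUS[url]


async def check_url(url, retry=True):
    # Same classification as the patch: >= 400 is an error, a failure that
    # persists after one retry is "invalid (err)", anything else is valid.
    try:
        status = await fake_get(url)
    except (ConnectionError, asyncio.TimeoutError):
        if retry:
            return await check_url(url, retry=False)
        return ("error", "invalid (err)")
    if status >= 400:
        return ("error", "invalid {}".format(status))
    return ("ok", "valid")


async def check_urls(urls):
    # gather() takes coroutines directly, keeps input order and simply
    # returns [] for an empty list.
    return await asyncio.gather(*(check_url(u) for u in urls))


if __name__ == "__main__":
    print(asyncio.run(check_urls(["https://ok.example/", "https://gone.example/"])))
```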


* [Buildroot] [PATCH v3 3/3] support/scripts/pkg-stats: show progress of upstream URL and latest version
  2020-08-08 18:08 [Buildroot] [PATCH v3 0/3] Use aiohttp in pkg-stats Thomas Petazzoni
  2020-08-08 18:08 ` [Buildroot] [PATCH v3 1/3] support/scripts/pkg-stats: use aiohttp for latest version retrieval Thomas Petazzoni
  2020-08-08 18:08 ` [Buildroot] [PATCH v3 2/3] support/scripts/pkg-stats: use aiohttp for upstream URL checking Thomas Petazzoni
@ 2020-08-08 18:08 ` Thomas Petazzoni
  2020-08-28 15:52   ` Peter Korsgaard
  2020-08-11 20:33 ` [Buildroot] [PATCH v3 0/3] Use aiohttp in pkg-stats Thomas Petazzoni
  2020-08-28 15:52 ` Peter Korsgaard
  4 siblings, 1 reply; 9+ messages in thread
From: Thomas Petazzoni @ 2020-08-08 18:08 UTC
  To: buildroot

This commit slightly improves the output of pkg-stats by showing the
progress of the upstream URL checks and latest version retrieval on a
per-package basis:

Checking URL status
[0001/0062] curlpp
[0002/0062] cmocka
[0003/0062] snappy
[0004/0062] nload
[...]
[0060/0062] librtas
[0061/0062] libsilk
[0062/0062] jhead
Getting latest versions ...
[0001/0064] libglob
[0002/0064] perl-http-daemon
[0003/0064] shadowsocks-libev
[...]
[0061/0064] lua-flu
[0062/0064] python-aiohttp-security
[0063/0064] ljlinenoise
[0064/0064] matchbox-lib

Note that the above sample was run on 64 packages. Only 62 packages
appear for the URL status check, because packages that do not have any
URL in their Config.in file, or don't have any Config.in file at all,
are not checked and therefore not counted.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 support/scripts/pkg-stats | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
index 70e7fa7a0c..303af2f588 100755
--- a/support/scripts/pkg-stats
+++ b/support/scripts/pkg-stats
@@ -498,20 +498,31 @@ def package_init_make_info():
             Package.all_ignored_cves[pkgvar] = value.split()
 
 
-async def check_url_status(session, pkg, retry=True):
+check_url_count = 0
+
+
+async def check_url_status(session, pkg, npkgs, retry=True):
+    global check_url_count
+
     try:
         async with session.get(pkg.url) as resp:
             if resp.status >= 400:
                 pkg.status['url'] = ("error", "invalid {}".format(resp.status))
+                check_url_count += 1
+                print("[%04d/%04d] %s" % (check_url_count, npkgs, pkg.name))
                 return
     except (aiohttp.ClientError, asyncio.TimeoutError):
         if retry:
-            return await check_url_status(session, pkg, retry=False)
+            return await check_url_status(session, pkg, npkgs, retry=False)
         else:
             pkg.status['url'] = ("error", "invalid (err)")
+            check_url_count += 1
+            print("[%04d/%04d] %s" % (check_url_count, npkgs, pkg.name))
             return
 
     pkg.status['url'] = ("ok", "valid")
+    check_url_count += 1
+    print("[%04d/%04d] %s" % (check_url_count, npkgs, pkg.name))
 
 
 async def check_package_urls(packages):
@@ -520,7 +531,7 @@ async def check_package_urls(packages):
     async with aiohttp.ClientSession(connector=connector, trust_env=True) as sess:
         packages = [p for p in packages if p.status['url'][0] == 'ok']
         for pkg in packages:
-            tasks.append(check_url_status(sess, pkg))
+            tasks.append(check_url_status(sess, pkg, len(packages)))
         await asyncio.wait(tasks)
 
 
@@ -592,17 +603,27 @@ async def check_package_get_latest_version_by_guess(session, pkg, retry=True):
             return False
 
 
-async def check_package_latest_version_get(session, pkg):
+check_latest_count = 0
+
+
+async def check_package_latest_version_get(session, pkg, npkgs):
+    global check_latest_count
 
     if await check_package_get_latest_version_by_distro(session, pkg):
+        check_latest_count += 1
+        print("[%04d/%04d] %s" % (check_latest_count, npkgs, pkg.name))
         return
 
     if await check_package_get_latest_version_by_guess(session, pkg):
+        check_latest_count += 1
+        print("[%04d/%04d] %s" % (check_latest_count, npkgs, pkg.name))
         return
 
     check_package_latest_version_set_status(pkg,
                                             RM_API_STATUS_NOT_FOUND,
                                             None, None)
+    check_latest_count += 1
+    print("[%04d/%04d] %s" % (check_latest_count, npkgs, pkg.name))
 
 
 async def check_package_latest_version(packages):
@@ -628,7 +649,7 @@ async def check_package_latest_version(packages):
     async with aiohttp.ClientSession(connector=connector, trust_env=True) as sess:
         packages = [p for p in packages if p.has_valid_infra]
         for pkg in packages:
-            tasks.append(check_package_latest_version_get(sess, pkg))
+            tasks.append(check_package_latest_version_get(sess, pkg, len(packages)))
         await asyncio.wait(tasks)
 
 
-- 
2.26.2
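
The progress lines can also be produced without module-level globals.
A minimal sketch, assuming a hypothetical Progress helper (not part of
pkg-stats) that holds one counter per phase and uses the patch's
"[%04d/%04d] %s" format. Because coroutines on a single event loop only
switch at await points, incrementing the counter needs no lock. The
counter is bumped when a check completes, so the numbering follows
completion order, which is why package names in the sample output above
appear in no particular order.

```python
import asyncio


class Progress:
    """Hypothetical per-phase progress printer, an alternative to the
    check_url_count / check_latest_count globals used in the patch."""

    def __init__(self, total):
        self.total = total
        self.done = 0

    def step(self, name):
        # Only one coroutine runs at a time on the event loop, and there is
        # no await between the increment and the print, so no lock is needed.
        self.done += 1
        line = "[%04d/%04d] %s" % (self.done, self.total, name)
        print(line)
        return line


async def check_one(progress, name):
    await asyncio.sleep(0)  # stand-in for the real HTTP request
    return progress.step(name)


async def check_all(names):
    progress = Progress(len(names))
    return await asyncio.gather(*(check_one(progress, n) for n in names))


if __name__ == "__main__":
    asyncio.run(check_all(["curlpp", "cmocka", "snappy"]))
```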


* [Buildroot] [PATCH v3 0/3] Use aiohttp in pkg-stats
  2020-08-08 18:08 [Buildroot] [PATCH v3 0/3] Use aiohttp in pkg-stats Thomas Petazzoni
                   ` (2 preceding siblings ...)
  2020-08-08 18:08 ` [Buildroot] [PATCH v3 3/3] support/scripts/pkg-stats: show progress of upstream URL and latest version Thomas Petazzoni
@ 2020-08-11 20:33 ` Thomas Petazzoni
  2020-08-28 15:52 ` Peter Korsgaard
  4 siblings, 0 replies; 9+ messages in thread
From: Thomas Petazzoni @ 2020-08-11 20:33 UTC
  To: buildroot

On Sat,  8 Aug 2020 20:08:22 +0200
Thomas Petazzoni <thomas.petazzoni@bootlin.com> wrote:

> Thomas Petazzoni (3):
>   support/scripts/pkg-stats: use aiohttp for latest version retrieval
>   support/scripts/pkg-stats: use aiohttp for upstream URL checking
>   support/scripts/pkg-stats: show progress of upstream URL and latest
>     version

Since this has gone through 3 iterations, and we really need
pkg-stats to keep delivering results reliably, I have applied the
series to both master and next.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


* [Buildroot] [PATCH v3 0/3] Use aiohttp in pkg-stats
  2020-08-08 18:08 [Buildroot] [PATCH v3 0/3] Use aiohttp in pkg-stats Thomas Petazzoni
                   ` (3 preceding siblings ...)
  2020-08-11 20:33 ` [Buildroot] [PATCH v3 0/3] Use aiohttp in pkg-stats Thomas Petazzoni
@ 2020-08-28 15:52 ` Peter Korsgaard
  4 siblings, 0 replies; 9+ messages in thread
From: Peter Korsgaard @ 2020-08-28 15:52 UTC
  To: buildroot

>>>>> "Thomas" == Thomas Petazzoni <thomas.petazzoni@bootlin.com> writes:

 > Hello,
 > I started investigating why pkg-stats was sometimes getting stuck on
 > the server that runs it daily to populate autobuild.buildroot.org/stats/
 > and send the autobuilder e-mails. The subprocesses started by
 > "multiprocessing" to retrieve the latest upstream version from
 > release-monitoring.org were stuck holding a lock. I have not reached a
 > definitive conclusion, but some preliminary research showed that
 > multiprocessing can be tricky and can cause issues with locks.

 > When I discussed this with Titouan, he suggested using aiohttp instead
 > of multiprocessing. And indeed, it makes a lot of sense to use this
 > popular asynchronous HTTP library.

 > This patch series switches the latest version retrieval and the
 > upstream URL checking to aiohttp, and as a bonus adds some logging to
 > show the progress of the retrieval, as it can be quite long.

 > Changes since v2:

 >  - Use python3 in the shebang

 >  - Use asyncio.TimeoutError in the exception handling

 >  - Slightly rework how packages with "no valid infra" are handled in
 >    the "latest version" check, but we keep a loop to handle such
 >    packages before the main loop, as we want the real count of
 >    packages that the main loop will handle.

 >  - Use asyncio.get_event_loop() + loop.run_until_complete() instead of
 >    asyncio.run().

Committed to 2020.02.x and 2020.05.x, thanks.

-- 
Bye, Peter Korsgaard


* [Buildroot] [PATCH v3 1/3] support/scripts/pkg-stats: use aiohttp for latest version retrieval
  2020-08-08 18:08 ` [Buildroot] [PATCH v3 1/3] support/scripts/pkg-stats: use aiohttp for latest version retrieval Thomas Petazzoni
@ 2020-08-28 15:52   ` Peter Korsgaard
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Korsgaard @ 2020-08-28 15:52 UTC
  To: buildroot

>>>>> "Thomas" == Thomas Petazzoni <thomas.petazzoni@bootlin.com> writes:

 > This commit reworks the code that retrieves the latest upstream
 > version of each package from release-monitoring.org using the aiohttp
 > module. This makes the implementation much more elegant, and avoids
 > the problematic multiprocessing Pool which is causing issues in some
 > situations.

 > Since we're now using some async functionality, the script is Python
 > 3.x only, so the shebang is changed to make this clear.

 > Suggested-by: Titouan Christophe <titouan.christophe@railnova.eu>
 > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>

Committed to 2020.02.x and 2020.05.x, thanks.

-- 
Bye, Peter Korsgaard


* [Buildroot] [PATCH v3 2/3] support/scripts/pkg-stats: use aiohttp for upstream URL checking
  2020-08-08 18:08 ` [Buildroot] [PATCH v3 2/3] support/scripts/pkg-stats: use aiohttp for upstream URL checking Thomas Petazzoni
@ 2020-08-28 15:52   ` Peter Korsgaard
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Korsgaard @ 2020-08-28 15:52 UTC
  To: buildroot

>>>>> "Thomas" == Thomas Petazzoni <thomas.petazzoni@bootlin.com> writes:

 > This commit reworks the code that checks whether the upstream URL of
 > each package (specified by its Config.in file) is valid, so that it
 > uses the aiohttp module. This makes the implementation much more
 > elegant, and avoids the problematic multiprocessing Pool which is
 > causing issues in some situations.

 > Suggested-by: Titouan Christophe <titouan.christophe@railnova.eu>
 > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>

Committed to 2020.02.x and 2020.05.x, thanks.

-- 
Bye, Peter Korsgaard


* [Buildroot] [PATCH v3 3/3] support/scripts/pkg-stats: show progress of upstream URL and latest version
  2020-08-08 18:08 ` [Buildroot] [PATCH v3 3/3] support/scripts/pkg-stats: show progress of upstream URL and latest version Thomas Petazzoni
@ 2020-08-28 15:52   ` Peter Korsgaard
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Korsgaard @ 2020-08-28 15:52 UTC
  To: buildroot

>>>>> "Thomas" == Thomas Petazzoni <thomas.petazzoni@bootlin.com> writes:

 > This commit slightly improves the output of pkg-stats by showing the
 > progress of the upstream URL checks and latest version retrieval on a
 > per-package basis:

 > Checking URL status
 > [0001/0062] curlpp
 > [0002/0062] cmocka
 > [0003/0062] snappy
 > [0004/0062] nload
 > [...]
 > [0060/0062] librtas
 > [0061/0062] libsilk
 > [0062/0062] jhead
 > Getting latest versions ...
 > [0001/0064] libglob
 > [0002/0064] perl-http-daemon
 > [0003/0064] shadowsocks-libev
 > [...]
 > [0061/0064] lua-flu
 > [0062/0064] python-aiohttp-security
 > [0063/0064] ljlinenoise
 > [0064/0064] matchbox-lib

 > Note that the above sample was run on 64 packages. Only 62 packages
 > appear for the URL status check, because packages that do not have any
 > URL in their Config.in file, or don't have any Config.in file at all,
 > are not checked and therefore not counted.

 > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>

Committed to 2020.02.x and 2020.05.x, thanks.

-- 
Bye, Peter Korsgaard
