* [Buildroot] [PATCH 1/4] support/script/pkg-stats: allow disabling CPE matching
@ 2022-04-02 14:15 Thomas Petazzoni via buildroot
  2022-04-02 14:15 ` [Buildroot] [PATCH 2/4] support/scripts/pkg-stats: allow disabling package warnings retrieval Thomas Petazzoni via buildroot
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Thomas Petazzoni via buildroot @ 2022-04-02 14:15 UTC (permalink / raw)
  To: buildroot; +Cc: Thomas Petazzoni

This is useful when debugging/developing the pkg-stats script.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 support/scripts/pkg-stats | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
index 8cc64a54d1..ef9482ed95 100755
--- a/support/scripts/pkg-stats
+++ b/support/scripts/pkg-stats
@@ -1125,7 +1125,7 @@ def parse_args():
     parser.add_argument('--nvd-path', dest='nvd_path',
                         help='Path to the local NVD database', type=resolvepath)
     parser.add_argument('--disable', type=list_str,
-                        help='Features to disable, comma-separated (cve, upstream, url)',
+                        help='Features to disable, comma-separated (cve, upstream, url, cpe)',
                         default=[])
     args = parser.parse_args()
     if not args.html and not args.json:
@@ -1184,6 +1184,8 @@ def __main__():
     if "cve" not in args.disable and args.nvd_path:
         print("Checking packages CVEs")
         check_package_cves(args.nvd_path, packages)
+    if "cpe" not in args.disable and args.nvd_path:
+        print("Checking packages CPEs")
         check_package_cpes(args.nvd_path, packages)
     print("Calculate stats")
     stats = calculate_stats(packages)
-- 
2.35.1
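
For readers unfamiliar with the option, here is a minimal sketch of how a comma-separated --disable list can gate the NVD-based checks. It assumes that list_str simply splits its argument on commas; the real helper is defined elsewhere in pkg-stats and is not part of this patch. enabled_checks() is a hypothetical function used only for illustration.

```python
import argparse


def list_str(values):
    # Assumed behaviour: turn "cve,cpe" into ["cve", "cpe"].
    return [v.strip() for v in values.split(",") if v.strip()]


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--nvd-path", dest="nvd_path")
    parser.add_argument("--disable", type=list_str, default=[],
                        help="Features to disable, comma-separated")
    return parser.parse_args(argv)


def enabled_checks(args):
    # Mirrors the gating in __main__(): both checks also need --nvd-path.
    checks = []
    if "cve" not in args.disable and args.nvd_path:
        checks.append("cve")
    if "cpe" not in args.disable and args.nvd_path:
        checks.append("cpe")
    return checks


args = parse_args(["--nvd-path", "/tmp/nvd", "--disable", "cpe,url"])
print(enabled_checks(args))
```

With "cpe" in the list, only the CVE check stays enabled; without --nvd-path, neither check runs regardless of --disable.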

_______________________________________________
buildroot mailing list
buildroot@buildroot.org
https://lists.buildroot.org/mailman/listinfo/buildroot

* [Buildroot] [PATCH 2/4] support/scripts/pkg-stats: allow disabling package warnings retrieval
  2022-04-02 14:15 [Buildroot] [PATCH 1/4] support/script/pkg-stats: allow disabling CPE matching Thomas Petazzoni via buildroot
@ 2022-04-02 14:15 ` Thomas Petazzoni via buildroot
  2022-04-04 12:40   ` Peter Korsgaard
  2022-04-02 14:15 ` [Buildroot] [PATCH 3/4] support/scripts/pkg-stats: add a timeout on HTTP requests for upstream URLs Thomas Petazzoni via buildroot
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Thomas Petazzoni via buildroot @ 2022-04-02 14:15 UTC (permalink / raw)
  To: buildroot; +Cc: Thomas Petazzoni

This is useful when debugging/developing the pkg-stats script.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 support/scripts/pkg-stats | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
index ef9482ed95..329b30a5ee 100755
--- a/support/scripts/pkg-stats
+++ b/support/scripts/pkg-stats
@@ -1125,7 +1125,7 @@ def parse_args():
     parser.add_argument('--nvd-path', dest='nvd_path',
                         help='Path to the local NVD database', type=resolvepath)
     parser.add_argument('--disable', type=list_str,
-                        help='Features to disable, comma-separated (cve, upstream, url, cpe)',
+                        help='Features to disable, comma-separated (cve, upstream, url, cpe, warnings)',
                         default=[])
     args = parser.parse_args()
     if not args.html and not args.json:
@@ -1167,7 +1167,8 @@ def __main__():
         pkg.set_license()
         pkg.set_hash_info()
         pkg.set_patch_count()
-        pkg.set_check_package_warnings()
+        if "warnings" not in args.disable:
+            pkg.set_check_package_warnings()
         pkg.set_current_version()
         pkg.set_cpeid()
         pkg.set_url()
-- 
2.35.1

* [Buildroot] [PATCH 3/4] support/scripts/pkg-stats: add a timeout on HTTP requests for upstream URLs
  2022-04-02 14:15 [Buildroot] [PATCH 1/4] support/script/pkg-stats: allow disabling CPE matching Thomas Petazzoni via buildroot
  2022-04-02 14:15 ` [Buildroot] [PATCH 2/4] support/scripts/pkg-stats: allow disabling package warnings retrieval Thomas Petazzoni via buildroot
@ 2022-04-02 14:15 ` Thomas Petazzoni via buildroot
  2022-04-04 12:40   ` Peter Korsgaard
  2022-04-02 14:15 ` [Buildroot] [PATCH 4/4] support/scripts/pkg-stats: reimplement CPE parsing in pkg-stats Thomas Petazzoni via buildroot
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Thomas Petazzoni via buildroot @ 2022-04-02 14:15 UTC (permalink / raw)
  To: buildroot; +Cc: Thomas Petazzoni

Some upstream sites are very slow to respond, and the default timeout
of 300 seconds of the aiohttp.ClientSession() is too long. Let's
reduce it to 15 seconds.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 support/scripts/pkg-stats | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
index 329b30a5ee..ae1a9aa5e4 100755
--- a/support/scripts/pkg-stats
+++ b/support/scripts/pkg-stats
@@ -451,7 +451,8 @@ async def check_url_status(session, pkg, npkgs, retry=True):
 async def check_package_urls(packages):
     tasks = []
     connector = aiohttp.TCPConnector(limit_per_host=5)
-    async with aiohttp.ClientSession(connector=connector, trust_env=True) as sess:
+    async with aiohttp.ClientSession(connector=connector, trust_env=True,
+                                     timeout=aiohttp.ClientTimeout(total=15)) as sess:
         packages = [p for p in packages if p.status['url'][0] == 'ok']
         for pkg in packages:
             tasks.append(asyncio.ensure_future(check_url_status(sess, pkg, len(packages))))
-- 
2.35.1
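
To illustrate the effect of a total time budget on one slow upstream, here is a sketch that uses only the standard library. It deliberately swaps aiohttp.ClientTimeout for asyncio.wait_for, so it shows the idea rather than the code path changed by this patch. fake_fetch() and check_with_timeout() are hypothetical names, and the delays are scaled down so the example runs quickly.

```python
import asyncio


async def fake_fetch(delay):
    # Hypothetical stand-in for an HTTP request that takes `delay` seconds.
    await asyncio.sleep(delay)
    return "ok"


async def check_with_timeout(delay, total):
    # Give up once `total` seconds have elapsed, like a total request timeout.
    try:
        return await asyncio.wait_for(fake_fetch(delay), timeout=total)
    except asyncio.TimeoutError:
        return "timeout"


async def main():
    # Run a fast and a slow "request" concurrently under the same budget.
    return await asyncio.gather(
        check_with_timeout(0.01, total=0.2),
        check_with_timeout(1.0, total=0.2),
    )


results = asyncio.run(main())
print(results)
```

The slow request is abandoned after its budget expires instead of holding up the whole run, which is the behaviour the shorter timeout is meant to provide.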

* [Buildroot] [PATCH 4/4] support/scripts/pkg-stats: reimplement CPE parsing in pkg-stats
  2022-04-02 14:15 [Buildroot] [PATCH 1/4] support/script/pkg-stats: allow disabling CPE matching Thomas Petazzoni via buildroot
  2022-04-02 14:15 ` [Buildroot] [PATCH 2/4] support/scripts/pkg-stats: allow disabling package warnings retrieval Thomas Petazzoni via buildroot
  2022-04-02 14:15 ` [Buildroot] [PATCH 3/4] support/scripts/pkg-stats: add a timeout on HTTP requests for upstream URLs Thomas Petazzoni via buildroot
@ 2022-04-02 14:15 ` Thomas Petazzoni via buildroot
  2022-04-02 14:17   ` Thomas Petazzoni via buildroot
                     ` (2 more replies)
  2022-04-02 14:42 ` [Buildroot] [PATCH 1/4] support/script/pkg-stats: allow disabling CPE matching Yann E. MORIN
  2022-04-04 12:40 ` Peter Korsgaard
  4 siblings, 3 replies; 12+ messages in thread
From: Thomas Petazzoni via buildroot @ 2022-04-02 14:15 UTC (permalink / raw)
  To: buildroot; +Cc: Thomas Petazzoni

pkg-stats currently uses the services from support/scripts/cpedb.py to
match the CPE identifiers of packages with the official CPE database.

Unfortunately, the cpedb.py code uses regular ElementTree parsing,
which involves loading the full XML tree into memory. This causes the
pkg-stats process to consume a huge amount of memory:

thomas   1310458 85.2 21.4 3708952 3450164 pts/5 R+   16:04   0:33  |   |   \_ python3 ./support/scripts/pkg-stats

So, 3.7 GB of VSZ and 3.4 GB of RSS are used by the pkg-stats
process. This is causing the OOM killer to kick-in on machines with
relatively low memory.

This commit reimplements the XML parsing needed to do the CPE matching
directly in pkg-stats, using the XMLParser functionality of
ElementTree, also called "streaming parsing". Thanks to this, we never
load the entire XML tree in RAM, but only stream it through the
parser, and construct a very simple list of all CPE identifiers. The
max memory consumption of pkg-stats is now:

thomas   1317511 74.2  0.9 381104 152224 pts/5   R+   16:08   0:17  |   |   \_ python3 ./support/scripts/pkg-stats

So, 381 MB of VSZ and 152 MB of RSS, which is obviously much better.

Now, one will probably wonder why this isn't directly changed in
cpedb.py. The reason is simple: cpedb.py is also used by
support/scripts/missing-cpe, which (for now) heavily relies on having
in memory the ElementTree objects, to re-generate a snippet of XML
that allows us to submit to NIST new CPE entries.

So, future work could include one of those two options:

 (1) Re-integrate cpedb.py into missing-cpe directly, and live with
     two different ways of processing the CPE database.

 (2) Rewrite the missing-cpe logic to also be compatible with a
     streaming parsing, which would allow this logic to be again
     shared between pkg-stats and missing-cpe.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
---
 support/scripts/pkg-stats | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
index ae1a9aa5e4..cc163ebb1a 100755
--- a/support/scripts/pkg-stats
+++ b/support/scripts/pkg-stats
@@ -27,12 +27,14 @@ import re
 import subprocess
 import json
 import sys
+import time
+import gzip
+import xml.etree.ElementTree
 
 brpath = os.path.normpath(os.path.join(os.path.dirname(__file__), "..", ".."))
 
 sys.path.append(os.path.join(brpath, "utils"))
 from getdeveloperlib import parse_developers  # noqa: E402
-from cpedb import CPEDB  # noqa: E402
 
 INFRA_RE = re.compile(r"\$\(eval \$\(([a-z-]*)-package\)\)")
 URL_RE = re.compile(r"\s*https?://\S*\s*$")
@@ -42,6 +44,7 @@ RM_API_STATUS_FOUND_BY_DISTRO = 2
 RM_API_STATUS_FOUND_BY_PATTERN = 3
 RM_API_STATUS_NOT_FOUND = 4
 
+CPEDB_URL = "https://static.nvd.nist.gov/feeds/xml/cpe/dictionary/official-cpe-dictionary_v2.3.xml.gz"
 
 class Defconfig:
     def __init__(self, name, path):
@@ -624,12 +627,40 @@ def check_package_cves(nvd_path, packages):
 
 
 def check_package_cpes(nvd_path, packages):
-    cpedb = CPEDB(nvd_path)
-    cpedb.get_xml_dict()
+    class CpeXmlParser:
+        cpes = []
+        def start(self, tag, attrib):
+            if tag == "{http://scap.nist.gov/schema/cpe-extension/2.3}cpe23-item":
+                self.cpes.append(attrib['name'])
+        def close(self):
+            return self.cpes
+
+
+    print("CPE: Setting up NIST dictionary")
+    if not os.path.exists(os.path.join(nvd_path, "cpe")):
+        os.makedirs(os.path.join(nvd_path, "cpe"))
+
+    cpe_dict_local = os.path.join(nvd_path, "cpe", os.path.basename(CPEDB_URL))
+    if not os.path.exists(cpe_dict_local) or os.stat(cpe_dict_local).st_mtime < time.time() - 86400:
+        print("CPE: Fetching xml manifest from [" + CPEDB_URL + "]")
+        cpe_dict = requests.get(CPEDB_URL)
+        open(cpe_dict_local, "wb").write(cpe_dict.content)
+
+    print("CPE: Unzipping xml manifest...")
+    nist_cpe_file = gzip.GzipFile(fileobj=open(cpe_dict_local, 'rb'))
+
+    parser = xml.etree.ElementTree.XMLParser(target=CpeXmlParser())
+    while True:
+        c = nist_cpe_file.read(1024*1024)
+        if not c:
+            break
+        parser.feed(c)
+    cpes = parser.close()
+
     for p in packages:
         if not p.cpeid:
             continue
-        if cpedb.find(p.cpeid):
+        if p.cpeid in cpes:
             p.status['cpe'] = ("ok", "verified CPE identifier")
         else:
             p.status['cpe'] = ("error", "CPE version unknown in CPE database")
-- 
2.35.1
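
The streaming approach described above can be tried on a small in-memory sample. This is a sketch under stated assumptions, not the code that was applied: CpeCollector, load_cpes() and the two-entry SAMPLE dictionary are hypothetical, and the names are collected in a set rather than a list so that the later membership test does not scan every entry.

```python
import gzip
import io
import xml.etree.ElementTree as ET

CPE23_ITEM = "{http://scap.nist.gov/schema/cpe-extension/2.3}cpe23-item"


class CpeCollector:
    """Parser target that keeps only the CPE names, not the XML tree."""

    def __init__(self):
        self.cpes = set()  # per-instance, unlike a class-level list

    def start(self, tag, attrib):
        if tag == CPE23_ITEM:
            self.cpes.add(attrib["name"])

    def close(self):
        return self.cpes


def load_cpes(fileobj, chunk_size=1024 * 1024):
    # Feed the decompressed data chunk by chunk; no tree is ever built.
    parser = ET.XMLParser(target=CpeCollector())
    with gzip.GzipFile(fileobj=fileobj) as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            parser.feed(chunk)
    return parser.close()


# Made-up two-entry dictionary, only to exercise the parser.
SAMPLE = b"""<?xml version="1.0"?>
<cpe-list xmlns="http://cpe.mitre.org/dictionary/2.0"
          xmlns:cpe-23="http://scap.nist.gov/schema/cpe-extension/2.3">
  <cpe-item name="cpe:/a:foo:bar:1.0">
    <cpe-23:cpe23-item name="cpe:2.3:a:foo:bar:1.0:*:*:*:*:*:*:*"/>
  </cpe-item>
  <cpe-item name="cpe:/a:foo:baz:2.1">
    <cpe-23:cpe23-item name="cpe:2.3:a:foo:baz:2.1:*:*:*:*:*:*:*"/>
  </cpe-item>
</cpe-list>
"""

cpes = load_cpes(io.BytesIO(gzip.compress(SAMPLE)), chunk_size=64)
print("cpe:2.3:a:foo:bar:1.0:*:*:*:*:*:*:*" in cpes)
```

The small chunk_size in the usage line only forces several feed() calls on the tiny sample; for the real dictionary a chunk of about 1 MiB, as in the patch, is a reasonable choice.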

* Re: [Buildroot] [PATCH 4/4] support/scripts/pkg-stats: reimplement CPE parsing in pkg-stats
  2022-04-02 14:15 ` [Buildroot] [PATCH 4/4] support/scripts/pkg-stats: reimplement CPE parsing in pkg-stats Thomas Petazzoni via buildroot
@ 2022-04-02 14:17   ` Thomas Petazzoni via buildroot
  2022-04-02 17:20   ` Yann E. MORIN
  2022-04-04 12:40   ` Peter Korsgaard
  2 siblings, 0 replies; 12+ messages in thread
From: Thomas Petazzoni via buildroot @ 2022-04-02 14:17 UTC (permalink / raw)
  To: buildroot

Hello,

On Sat,  2 Apr 2022 16:15:30 +0200
Thomas Petazzoni via buildroot <buildroot@buildroot.org> wrote:

> This commit reimplements the XML parsing needed to do the CPE matching
> directly in pkg-stats, using the XMLParser functionality of
> ElementTree, also called "streaming parsing". Thanks to this, we never
> load the entire XML tree in RAM, but only stream it through the
> parser, and construct a very simple list of all CPE identifiers. The
> max memory consumption of pkg-stats is now:

[...]

I forgot to mention that the JSON output of pkg-stats for the full
package set, before and after this commit, is exactly identical.

Thomas
-- 
Thomas Petazzoni, co-owner and CEO, Bootlin
Embedded Linux and Kernel engineering and training
https://bootlin.com

* Re: [Buildroot] [PATCH 1/4] support/script/pkg-stats: allow disabling CPE matching
  2022-04-02 14:15 [Buildroot] [PATCH 1/4] support/script/pkg-stats: allow disabling CPE matching Thomas Petazzoni via buildroot
                   ` (2 preceding siblings ...)
  2022-04-02 14:15 ` [Buildroot] [PATCH 4/4] support/scripts/pkg-stats: reimplement CPE parsing in pkg-stats Thomas Petazzoni via buildroot
@ 2022-04-02 14:42 ` Yann E. MORIN
  2022-04-04 12:40 ` Peter Korsgaard
  4 siblings, 0 replies; 12+ messages in thread
From: Yann E. MORIN @ 2022-04-02 14:42 UTC (permalink / raw)
  To: Thomas Petazzoni; +Cc: buildroot

Thomas, All,

On 2022-04-02 16:15 +0200, Thomas Petazzoni via buildroot spake thusly:
> This is useful when debugging/developing the pkg-stats script.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>

Patches 1-3 applied to master, thanks.

I'll resume looking at the 4th patch later.

Regards,
Yann E. MORIN.

> ---
>  support/scripts/pkg-stats | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
> index 8cc64a54d1..ef9482ed95 100755
> --- a/support/scripts/pkg-stats
> +++ b/support/scripts/pkg-stats
> @@ -1125,7 +1125,7 @@ def parse_args():
>      parser.add_argument('--nvd-path', dest='nvd_path',
>                          help='Path to the local NVD database', type=resolvepath)
>      parser.add_argument('--disable', type=list_str,
> -                        help='Features to disable, comma-separated (cve, upstream, url)',
> +                        help='Features to disable, comma-separated (cve, upstream, url, cpe)',
>                          default=[])
>      args = parser.parse_args()
>      if not args.html and not args.json:
> @@ -1184,6 +1184,8 @@ def __main__():
>      if "cve" not in args.disable and args.nvd_path:
>          print("Checking packages CVEs")
>          check_package_cves(args.nvd_path, packages)
> +    if "cpe" not in args.disable and args.nvd_path:
> +        print("Checking packages CPEs")
>          check_package_cpes(args.nvd_path, packages)
>      print("Calculate stats")
>      stats = calculate_stats(packages)
> -- 
> 2.35.1

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 561 099 427 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

* Re: [Buildroot] [PATCH 4/4] support/scripts/pkg-stats: reimplement CPE parsing in pkg-stats
  2022-04-02 14:15 ` [Buildroot] [PATCH 4/4] support/scripts/pkg-stats: reimplement CPE parsing in pkg-stats Thomas Petazzoni via buildroot
  2022-04-02 14:17   ` Thomas Petazzoni via buildroot
@ 2022-04-02 17:20   ` Yann E. MORIN
  2022-04-03  8:05     ` Thomas Petazzoni via buildroot
  2022-04-04 12:40   ` Peter Korsgaard
  2 siblings, 1 reply; 12+ messages in thread
From: Yann E. MORIN @ 2022-04-02 17:20 UTC (permalink / raw)
  To: Thomas Petazzoni; +Cc: buildroot

Thomas, All,

On 2022-04-02 16:15 +0200, Thomas Petazzoni via buildroot spake thusly:
> pkg-stats currently uses the services from support/scripts/cpedb.py to
> match the CPE identifiers of packages with the official CPE database.
> 
> Unfortunately, the cpedb.py code uses regular ElementTree parsing,
> which involves loading the full XML tree into memory. This causes the
> pkg-stats process to consume a huge amount of memory:
> 
> thomas   1310458 85.2 21.4 3708952 3450164 pts/5 R+   16:04   0:33  |   |   \_ python3 ./support/scripts/pkg-stats
> 
> So, 3.7 GB of VSZ and 3.4 GB of RSS are used by the pkg-stats
> process. This is causing the OOM killer to kick-in on machines with
> relatively low memory.
> 
> This commit reimplements the XML parsing needed to do the CPE matching
> directly in pkg-stats, using the XMLParser functionality of
> ElementTree, also called "streaming parsing". Thanks to this, we never
> load the entire XML tree in RAM, but only stream it through the
> parser, and construct a very simple list of all CPE identifiers. The
> max memory consumption of pkg-stats is now:
> 
> thomas   1317511 74.2  0.9 381104 152224 pts/5   R+   16:08   0:17  |   |   \_ python3 ./support/scripts/pkg-stats
> 
> So, 381 MB of VSZ and 152 MB of RSS, which is obviously much better.
> 
> Now, one will probably wonder why this isn't directly changed in
> cpedb.py. The reason is simple: cpedb.py is also used by
> support/scripts/missing-cpe, which (for now) heavily relies on having
> in memory the ElementTree objects, to re-generate a snippet of XML
> that allows us to submit to NIST new CPE entries.
> 
> So, future work could include one of those two options:
> 
>  (1) Re-integrate cpedb.py into missing-cpe directly, and live with
>      two different ways of processing the CPE database.
> 
>  (2) Rewrite the missing-cpe logic to also be compatible with a
>      streaming parsing, which would allow this logic to be again
>      shared between pkg-stats and missing-cpe.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> ---
>  support/scripts/pkg-stats | 39 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
> index ae1a9aa5e4..cc163ebb1a 100755
> --- a/support/scripts/pkg-stats
> +++ b/support/scripts/pkg-stats
> @@ -27,12 +27,14 @@ import re
>  import subprocess
>  import json
>  import sys
> +import time
> +import gzip
> +import xml.etree.ElementTree

You for to import requests, which is used later on.

I also fixed a bunch of flake8 issues:

    support/scripts/pkg-stats:49:1: E302 expected 2 blank lines, found 1
    support/scripts/pkg-stats:632:9: E306 expected 1 blank line before a nested definition, found 0
    support/scripts/pkg-stats:635:9: E306 expected 1 blank line before a nested definition, found 0
    support/scripts/pkg-stats:639:5: E303 too many blank lines (2)
    1     E302 expected 2 blank lines, found 1
    1     E303 too many blank lines (2)
    2     E306 expected 1 blank line before a nested definition, found 0

>  brpath = os.path.normpath(os.path.join(os.path.dirname(__file__), "..", ".."))
>  
>  sys.path.append(os.path.join(brpath, "utils"))
>  from getdeveloperlib import parse_developers  # noqa: E402
> -from cpedb import CPEDB  # noqa: E402
>  
>  INFRA_RE = re.compile(r"\$\(eval \$\(([a-z-]*)-package\)\)")
>  URL_RE = re.compile(r"\s*https?://\S*\s*$")
> @@ -42,6 +44,7 @@ RM_API_STATUS_FOUND_BY_DISTRO = 2
>  RM_API_STATUS_FOUND_BY_PATTERN = 3
>  RM_API_STATUS_NOT_FOUND = 4
>  
> +CPEDB_URL = "https://static.nvd.nist.gov/feeds/xml/cpe/dictionary/official-cpe-dictionary_v2.3.xml.gz"

Instead of duplicating it here, I changed that to import it from cpedb.

Applied to master with all of the above fixed, thanks.

Regards,
Yann E. MORIN.

>  class Defconfig:
>      def __init__(self, name, path):
> @@ -624,12 +627,40 @@ def check_package_cves(nvd_path, packages):
>  
>  
>  def check_package_cpes(nvd_path, packages):
> -    cpedb = CPEDB(nvd_path)
> -    cpedb.get_xml_dict()
> +    class CpeXmlParser:
> +        cpes = []
> +        def start(self, tag, attrib):
> +            if tag == "{http://scap.nist.gov/schema/cpe-extension/2.3}cpe23-item":
> +                self.cpes.append(attrib['name'])
> +        def close(self):
> +            return self.cpes
> +
> +
> +    print("CPE: Setting up NIST dictionary")
> +    if not os.path.exists(os.path.join(nvd_path, "cpe")):
> +        os.makedirs(os.path.join(nvd_path, "cpe"))
> +
> +    cpe_dict_local = os.path.join(nvd_path, "cpe", os.path.basename(CPEDB_URL))
> +    if not os.path.exists(cpe_dict_local) or os.stat(cpe_dict_local).st_mtime < time.time() - 86400:
> +        print("CPE: Fetching xml manifest from [" + CPEDB_URL + "]")
> +        cpe_dict = requests.get(CPEDB_URL)
> +        open(cpe_dict_local, "wb").write(cpe_dict.content)
> +
> +    print("CPE: Unzipping xml manifest...")
> +    nist_cpe_file = gzip.GzipFile(fileobj=open(cpe_dict_local, 'rb'))
> +
> +    parser = xml.etree.ElementTree.XMLParser(target=CpeXmlParser())
> +    while True:
> +        c = nist_cpe_file.read(1024*1024)
> +        if not c:
> +            break
> +        parser.feed(c)
> +    cpes = parser.close()
> +
>      for p in packages:
>          if not p.cpeid:
>              continue
> -        if cpedb.find(p.cpeid):
> +        if p.cpeid in cpes:
>              p.status['cpe'] = ("ok", "verified CPE identifier")
>          else:
>              p.status['cpe'] = ("error", "CPE version unknown in CPE database")
> -- 
> 2.35.1

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 561 099 427 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

* Re: [Buildroot] [PATCH 4/4] support/scripts/pkg-stats: reimplement CPE parsing in pkg-stats
  2022-04-02 17:20   ` Yann E. MORIN
@ 2022-04-03  8:05     ` Thomas Petazzoni via buildroot
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Petazzoni via buildroot @ 2022-04-03  8:05 UTC (permalink / raw)
  To: Yann E. MORIN; +Cc: buildroot

Hello Yann,

On Sat, 2 Apr 2022 19:20:12 +0200
"Yann E. MORIN" <yann.morin.1998@free.fr> wrote:

> > diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
> > index ae1a9aa5e4..cc163ebb1a 100755
> > --- a/support/scripts/pkg-stats
> > +++ b/support/scripts/pkg-stats
> > @@ -27,12 +27,14 @@ import re
> >  import subprocess
> >  import json
> >  import sys
> > +import time
> > +import gzip
> > +import xml.etree.ElementTree  
> 
> You for to import requests, which is used later on.

I suppose s/for/forgot/ ? But then how it could have worked for me? Huh.

> I also fixed a bunch of flake8 issues:

Ah, gah, forgot once again to run flake8, sorry about that.


> > +CPEDB_URL = "https://static.nvd.nist.gov/feeds/xml/cpe/dictionary/official-cpe-dictionary_v2.3.xml.gz"  
> 
> Instead of duplicating it here, I changed that to import it from cpedb.

ACK.

> Applied to master with all of the above fixed, thanks.

Many thanks. Peter, could you backport those patches, or at least PATCH
3/4 and 4/4 to all stable branches still in activity?

Indeed, thanks to this fix, the pkg-stats run of this morning on the
master branch worked fine, but it failed miserably on the 2021.02.x
branch.

Thanks!

Thomas
-- 
Thomas Petazzoni, co-owner and CEO, Bootlin
Embedded Linux and Kernel engineering and training
https://bootlin.com

* Re: [Buildroot] [PATCH 1/4] support/script/pkg-stats: allow disabling CPE matching
  2022-04-02 14:15 [Buildroot] [PATCH 1/4] support/script/pkg-stats: allow disabling CPE matching Thomas Petazzoni via buildroot
                   ` (3 preceding siblings ...)
  2022-04-02 14:42 ` [Buildroot] [PATCH 1/4] support/script/pkg-stats: allow disabling CPE matching Yann E. MORIN
@ 2022-04-04 12:40 ` Peter Korsgaard
  4 siblings, 0 replies; 12+ messages in thread
From: Peter Korsgaard @ 2022-04-04 12:40 UTC (permalink / raw)
  To: Thomas Petazzoni via buildroot; +Cc: Thomas Petazzoni

>>>>> "Thomas" == Thomas Petazzoni via buildroot <buildroot@buildroot.org> writes:

 > This is useful when debugging/developing the pkg-stats script.
 > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>

Committed to 2021.02.x and 2022.02.x, thanks.

-- 
Bye, Peter Korsgaard

* Re: [Buildroot] [PATCH 3/4] support/scripts/pkg-stats: add a timeout on HTTP requests for upstream URLs
  2022-04-02 14:15 ` [Buildroot] [PATCH 3/4] support/scripts/pkg-stats: add a timeout on HTTP requests for upstream URLs Thomas Petazzoni via buildroot
@ 2022-04-04 12:40   ` Peter Korsgaard
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Korsgaard @ 2022-04-04 12:40 UTC (permalink / raw)
  To: Thomas Petazzoni via buildroot; +Cc: Thomas Petazzoni

>>>>> "Thomas" == Thomas Petazzoni via buildroot <buildroot@buildroot.org> writes:

 > Some upstream sites are very slow to respond, and the default timeout
 > of 300 seconds of the aiohttp.ClientSession() is too long. Let's
 > reduce it to 15 seconds.

 > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>

Committed to 2021.02.x and 2022.02.x, thanks.

-- 
Bye, Peter Korsgaard

* Re: [Buildroot] [PATCH 2/4] support/scripts/pkg-stats: allow disabling package warnings retrieval
  2022-04-02 14:15 ` [Buildroot] [PATCH 2/4] support/scripts/pkg-stats: allow disabling package warnings retrieval Thomas Petazzoni via buildroot
@ 2022-04-04 12:40   ` Peter Korsgaard
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Korsgaard @ 2022-04-04 12:40 UTC (permalink / raw)
  To: Thomas Petazzoni via buildroot; +Cc: Thomas Petazzoni

>>>>> "Thomas" == Thomas Petazzoni via buildroot <buildroot@buildroot.org> writes:

 > This is useful when debugging/developing the pkg-stats script.
 > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>

Committed to 2021.02.x and 2022.02.x, thanks.

-- 
Bye, Peter Korsgaard

* Re: [Buildroot] [PATCH 4/4] support/scripts/pkg-stats: reimplement CPE parsing in pkg-stats
  2022-04-02 14:15 ` [Buildroot] [PATCH 4/4] support/scripts/pkg-stats: reimplement CPE parsing in pkg-stats Thomas Petazzoni via buildroot
  2022-04-02 14:17   ` Thomas Petazzoni via buildroot
  2022-04-02 17:20   ` Yann E. MORIN
@ 2022-04-04 12:40   ` Peter Korsgaard
  2 siblings, 0 replies; 12+ messages in thread
From: Peter Korsgaard @ 2022-04-04 12:40 UTC (permalink / raw)
  To: Thomas Petazzoni via buildroot; +Cc: Thomas Petazzoni

>>>>> "Thomas" == Thomas Petazzoni via buildroot <buildroot@buildroot.org> writes:

 > pkg-stats currently uses the services from support/scripts/cpedb.py to
 > match the CPE identifiers of packages with the official CPE database.

 > Unfortunately, the cpedb.py code uses regular ElementTree parsing,
 > which involves loading the full XML tree into memory. This causes the
 > pkg-stats process to consume a huge amount of memory:

 > thomas   1310458 85.2 21.4 3708952 3450164 pts/5 R+   16:04   0:33  |   |   \_ python3 ./support/scripts/pkg-stats

 > So, 3.7 GB of VSZ and 3.4 GB of RSS are used by the pkg-stats
 > process. This is causing the OOM killer to kick-in on machines with
 > relatively low memory.

 > This commit reimplements the XML parsing needed to do the CPE matching
 > directly in pkg-stats, using the XMLParser functionality of
 > ElementTree, also called "streaming parsing". Thanks to this, we never
 > load the entire XML tree in RAM, but only stream it through the
 > parser, and construct a very simple list of all CPE identifiers. The
 > max memory consumption of pkg-stats is now:

 > thomas   1317511 74.2  0.9 381104 152224 pts/5   R+   16:08   0:17  |   |   \_ python3 ./support/scripts/pkg-stats

 > So, 381 MB of VSZ and 152 MB of RSS, which is obviously much better.

 > Now, one will probably wonder why this isn't directly changed in
 > cpedb.py. The reason is simple: cpedb.py is also used by
 > support/scripts/missing-cpe, which (for now) heavily relies on having
 > in memory the ElementTree objects, to re-generate a snippet of XML
 > that allows us to submit to NIST new CPE entries.

 > So, future work could include one of those two options:

 >  (1) Re-integrate cpedb.py into missing-cpe directly, and live with
 >      two different ways of processing the CPE database.

 >  (2) Rewrite the missing-cpe logic to also be compatible with a
 >      streaming parsing, which would allow this logic to be again
 >      shared between pkg-stats and missing-cpe.

 > Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>

Committed to 2021.02.x and 2022.02.x, thanks.

-- 
Bye, Peter Korsgaard
