All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ricardo Martincoski <ricardo.martincoski@gmail.com>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting
Date: Fri, 18 May 2018 00:07:27 -0300	[thread overview]
Message-ID: <5afe436faeb5c_409b3f9b640c8a8410015a@ultri5.mail> (raw)
In-Reply-To: bb3dfdb4-6479-41c0-1ca0-0bd68225a7b7@mind.be

Hello,

On Wed, May 16, 2018 at 08:32 PM, Arnout Vandecappelle wrote:
> On 16-05-18 05:43, Ricardo Martincoski wrote:

[snip]
>> @@ -569,4 +569,5 @@ class CPE:
>>              cpe_file = gzip.GzipFile(fileobj=StringIO(compressed_cpe_file.read())).read()
>> -            print("CPE: Converting xml manifest to dict...")
>> -            self.all_cpes = xmltodict.parse(cpe_file)
>> +            print("CPE: Converting xml manifest to list...")
>> +            tree = ET.fromstring(cpe_file)
>> +            self.all_cpes = [i.get('name') for i in tree.iter('{http://scap.nist.gov/schema/cpe-extension/2.3}cpe23-item')]
> 
>  So after this you get basically the same as after comparison patch 1, right? So
> the xmltodict takes 4 minutes? Or am I missing something?

No. I missed something important and jumped to wrong conclusions.

After adding some simple instrumentation code to display relative timestamps,
the main difference in performance is *not* related to the xml parser used but
it is related to the code used for find() and find_partial().

I didn't performed further testings but it seems related to the use of
startswith as you said.

patch 1:
0:00:00.001015 CPE: Fetching xml manifest...
0:00:03.924777 CPE: Unzipping xml manifest...
0:00:11.672462 CPE: Converting xml manifest to list...
0:00:11.672504 before xmltodict.parse
0:00:36.343417 before append
0:00:36.462400 list created
0:00:36.738042 Build package list ...
0:00:36.875742 Getting package make info ...
0:00:58.543116 Getting package details ...
0:01:00.016925 BR Infra Not building CPE for pkg: [UBOOT]
0:01:07.714094 BR Infra Not building CPE for pkg: [IMX_USB_LOADER]
...
0:08:00.615649 BR Infra Not building CPE for pkg: [INTLTOOL]
0:08:01.243667 BR Infra Not building CPE for pkg: [DOXYGEN]
0:08:02.035463 Calculate stats
0:08:02.042401 Write HTML

patch 2:
0:00:00.000889 CPE: Fetching xml manifest...
0:00:03.640856 CPE: Unzipping xml manifest...
0:00:14.569496 CPE: Converting xml manifest to list...
0:00:14.569541 before ET.fromstring
0:00:21.325842 before list comprehension
0:00:21.609946 list created
0:00:21.612443 Build package list ...
0:00:21.754223 Getting package make info ...
0:00:43.111196 Getting package details ...
0:00:43.828047 BR Infra Not building CPE for pkg: [UBOOT]
0:00:47.125995 BR Infra Not building CPE for pkg: [IMX_USB_LOADER]
...
0:03:46.279893 BR Infra Not building CPE for pkg: [INTLTOOL]
0:03:46.571266 BR Infra Not building CPE for pkg: [DOXYGEN]
0:03:46.892839 Calculate stats
0:03:46.895765 Write HTML

>  Oh, actually, the [... for ... iter...] is also more efficient than
> for...: append() so that could be an effect here as well. But this part of the
> code is only O(#cpe packages) so it shouldn't have that much impact.
> 
>>          except urllib2.HTTPError:package
>> @@ -580,5 +581,5 @@ class CPE:
>>          print("CPE: Searching for partial [%s]" % cpe_str)
>> -        for cpe in self.all_cpes['cpe-list']['cpe-item']:
>> -            if cpe_str in cpe['cpe-23:cpe23-item']['@name']:
>> -                return cpe['cpe-23:cpe23-item']['@name']
>> +        for cpe in self.all_cpes:
>> +            if cpe.startswith(cpe_str):
> 
>  Originally it was 'in' instead of startswith(). Obviously startswith() will be
> more efficient. And also more correct, I guess, or does the partial match not
[snip]

Regards,
Ricardo

  parent reply	other threads:[~2018-05-18  3:07 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-10 18:58 [Buildroot] [PATCH v4 0/5] CPE ID Support Matt Weber
2018-05-10 18:58 ` [Buildroot] [PATCH v4 1/5] cpe-info: new make target Matt Weber
2018-05-10 18:58 ` [Buildroot] [PATCH v4 2/5] cpe-info: id prefix/suffix Matt Weber
2018-05-10 18:58 ` [Buildroot] [PATCH v4 3/5] cpe-info: only report target pkgs Matt Weber
2018-05-10 18:58 ` [Buildroot] [PATCH v4 4/5] cpe-info: update manual for new pkg vars Matt Weber
2018-05-10 18:58 ` [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting Matt Weber
2018-05-16  3:43   ` Ricardo Martincoski
2018-05-16 23:32     ` Arnout Vandecappelle
2018-05-17  1:42       ` Matthew Weber
2018-05-18  3:16         ` Ricardo Martincoski
2018-05-18  3:21           ` Matthew Weber
2018-05-18  3:44             ` Ricardo Martincoski
2018-05-18 13:07               ` Matthew Weber
2018-05-18  3:07       ` Ricardo Martincoski [this message]
2018-05-18  3:18         ` Matthew Weber
2018-05-16 23:34   ` Arnout Vandecappelle
2018-05-17  1:42     ` Matthew Weber

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5afe436faeb5c_409b3f9b640c8a8410015a@ultri5.mail \
    --to=ricardo.martincoski@gmail.com \
    --cc=buildroot@busybox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.