All of lore.kernel.org
 help / color / mirror / Atom feed
* [Buildroot] [PATCH v4 0/5] CPE ID Support
@ 2018-05-10 18:58 Matt Weber
  2018-05-10 18:58 ` [Buildroot] [PATCH v4 1/5] cpe-info: new make target Matt Weber
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Matt Weber @ 2018-05-10 18:58 UTC (permalink / raw)
  To: buildroot

This series begins adding CPE identifier support to Buildroot. The
intent is to establish and maintain a baseline of CPE IDs, one for each
package. Each of these IDs ties back to a NIST database entry for the
respective piece of software, which is linked to specific vunderabilities.

Within Buildroot, a CPE report can be generated (like legal-info) that
captures a target build's list of CPE IDs. This report can then be
checked for validity using the pkgstat script or another third party tool.

The pkgstats script has been extended to provide CPE ID checking of
matching/requires update/new as part of its html output.

As part of testing this series, the following branch contains a series
of fixups required to make these specific packages match the database.
(I can submit these to the mailing list but there are ~70 of them)

https://github.com/rc-matthew-l-weber/buildroot/tree/cpe-info-github
Commit 14c3ee6 to 567732d

A follow-on patchset will be submitted adding support for pkgstat generation
of CPE updates in XML and Buildroot manual updates for guidance on submission
of those XML database updates to the NIST organization. (We'd like to get
feedback on this series first to save us effort on the update XML stuff)

Matt Weber (5):
  cpe-info: new make target
  cpe-info: id prefix/suffix
  cpe-info: only report target pkgs
  cpe-info: update manual for new pkg vars
  support/scripts/pkgstats: add CPE reporting

 Makefile                                |  17 ++-
 docs/manual/adding-packages-generic.txt | 117 ++++++++++++--------
 package/Makefile.in                     |   4 +
 package/pkg-generic.mk                  |  21 ++++
 package/pkg-utils.mk                    |   8 ++
 support/scripts/pkg-stats               | 188 +++++++++++++++++++++++++++++---
 6 files changed, 293 insertions(+), 62 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 1/5] cpe-info: new make target
  2018-05-10 18:58 [Buildroot] [PATCH v4 0/5] CPE ID Support Matt Weber
@ 2018-05-10 18:58 ` Matt Weber
  2018-05-10 18:58 ` [Buildroot] [PATCH v4 2/5] cpe-info: id prefix/suffix Matt Weber
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Matt Weber @ 2018-05-10 18:58 UTC (permalink / raw)
  To: buildroot

Similar to make legal-info, produce a csv delimited file containing
all selected packages CPE identification.

Have the pkg infra define CPE_ID_* defaults using the package name
for the vendor and name as most CPE IDs seem to align with that
assumption. Also use the pkg version as the CPE ID's version field.

Signed-off-by: Matt Weber <matthew.weber@rockwellcollins.com>
---
Changes
v2
[Thomas P
 - Moved comment on conditionals back to this patchset where
   the conditional is created vs later

v3
[Thomas P
 - Merged infra define CPE_ID_*  into this patch
 - Report all packages vs restricting to just allowing based on if
   the VENDOR was set (v2). This now represents Thomas P's original
   idea to report everything.  At first I felt I should restrict
   the reporting to those CPE IDs we had made sure were correct.
   Turns out we should have actually let the script handle fixing
   the CPEs and just make a complete design of this up front.

[Matt
 - Moved to using the _project on all vendors instead of just name
---
 Makefile               | 17 ++++++++++++++++-
 package/pkg-generic.mk | 13 +++++++++++++
 package/pkg-utils.mk   |  8 ++++++++
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index c024c65..71632bb 100644
--- a/Makefile
+++ b/Makefile
@@ -146,7 +146,7 @@ nobuild_targets := source %-source \
 	clean distclean help show-targets graph-depends \
 	%-graph-depends %-show-depends %-show-version \
 	graph-build graph-size list-defconfigs \
-	savedefconfig printvars
+	savedefconfig printvars cpe-info %-cpe-info
 ifeq ($(MAKECMDGOALS),)
 BR_BUILDING = y
 else ifneq ($(filter-out $(nobuild_targets),$(MAKECMDGOALS)),)
@@ -233,6 +233,7 @@ LEGAL_MANIFEST_CSV_TARGET = $(LEGAL_INFO_DIR)/manifest.csv
 LEGAL_MANIFEST_CSV_HOST = $(LEGAL_INFO_DIR)/host-manifest.csv
 LEGAL_WARNINGS = $(LEGAL_INFO_DIR)/.warnings
 LEGAL_REPORT = $(LEGAL_INFO_DIR)/README
+CPE_MANIFEST_CSV = $(BASE_DIR)/cpe-manifest.csv
 
 BR2_CONFIG = $(CONFIG_DIR)/.config
 
@@ -802,6 +803,19 @@ legal-info: dirs legal-info-clean legal-info-prepare $(foreach p,$(PACKAGES),$(p
 		mv .legal-info.sha256 legal-info.sha256)
 	@echo "Legal info produced in $(LEGAL_INFO_DIR)"
 
+.PHONY: cpe-info-clean
+cpe-info-clean:
+	@rm -f $(CPE_MANIFEST_CSV)
+
+.PHONY: cpe-info-prepare
+cpe-info-prepare:
+	@$(call MESSAGE,"Gathering CPE info")
+	@$(call cpe-manifest,CPE ID,CVE PATCHED,PACKAGE,VERSION,SOURCE SITE)
+
+.PHONY: cpe-info
+cpe-info: cpe-info-clean cpe-info-prepare $(foreach p,$(PACKAGES),$(p)-cpe-info)
+	@echo "CPE info produced in $(CPE_MANIFEST_CSV)"
+
 .PHONY: show-targets
 show-targets:
 	@echo $(sort $(PACKAGES)) $(sort $(TARGETS_ROOTFS))
@@ -1070,6 +1084,7 @@ help:
 	@echo '  source                 - download all sources needed for offline-build'
 	@echo '  external-deps          - list external packages used'
 	@echo '  legal-info             - generate info about license compliance'
+	@echo '  cpe-info               - generate info about security CPE identification'
 	@echo '  printvars              - dump all the internal variables'
 	@echo
 	@echo '  make V=0|1             - 0 => quiet build (default), 1 => verbose build'
diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 8a3b5f9..67ac436 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -861,6 +861,18 @@ else
 $(2)_KCONFIG_VAR = BR2_PACKAGE_$(2)
 endif
 
+$(2)_CPE_ID_VENDOR ?= $$($(2)_NAME)_project
+$(2)_CPE_ID_NAME ?= $$($(2)_NAME)
+$(2)_CPE_ID_VERSION ?= $$($(2)_VERSION)
+$(2)_CPE_ID ?= $$($(2)_CPE_ID_VENDOR):$$($(2)_CPE_ID_NAME):$$($(2)_CPE_ID_VERSION)
+
+$(1)-cpe-info: PKG=$(2)
+$(1)-cpe-info:
+ifneq ($$(call qstrip,$$($(2)_SOURCE)),)
+	@$$(call MESSAGE,"Collecting cpe info")
+	$(Q)$$(call cpe-manifest,$$($(2)_CPE_ID),$$($(2)_CVE_PATCHED),$$($(2)_RAWNAME),$$($(2)_VERSION),$$($(2)_ACTUAL_SOURCE_SITE))
+endif # ifneq ($$(call qstrip,$$($(2)_SOURCE)),)
+
 # legal-info: declare dependencies and set values used later for the manifest
 ifneq ($$($(2)_LICENSE_FILES),)
 $(2)_MANIFEST_LICENSE_FILES = $$($(2)_LICENSE_FILES)
@@ -1002,6 +1014,7 @@ DL_TOOLS_DEPENDENCIES += $$(call extractor-dependency,$$($(2)_SOURCE))
 	$(1)-clean-for-reconfigure \
 	$(1)-clean-for-reinstall \
 	$(1)-configure \
+	$(1)-cpe-info \
 	$(1)-depends \
 	$(1)-dirclean \
 	$(1)-external-deps \
diff --git a/package/pkg-utils.mk b/package/pkg-utils.mk
index c3acc22..11a2457 100644
--- a/package/pkg-utils.mk
+++ b/package/pkg-utils.mk
@@ -95,3 +95,11 @@ define legal-license-file # pkgname, pkgname-pkgver, pkgdir, filename, file-full
 	} && \
 	cp $(5) $(LICENSE_FILES_DIR_$(6))/$(2)/$(4)
 endef
+
+#
+# cpe-info helper functions
+#
+
+define cpe-manifest # cpe, cve patched, pkg name, version, url
+	echo '"$(1)","$(2)","$(3)","$(4)","$(5)"' >>$(CPE_MANIFEST_CSV)
+endef
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 2/5] cpe-info: id prefix/suffix
  2018-05-10 18:58 [Buildroot] [PATCH v4 0/5] CPE ID Support Matt Weber
  2018-05-10 18:58 ` [Buildroot] [PATCH v4 1/5] cpe-info: new make target Matt Weber
@ 2018-05-10 18:58 ` Matt Weber
  2018-05-10 18:58 ` [Buildroot] [PATCH v4 3/5] cpe-info: only report target pkgs Matt Weber
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Matt Weber @ 2018-05-10 18:58 UTC (permalink / raw)
  To: buildroot

There are two types of software cpe prefixes, one for applications and
one for operating systems. Note: There is a third type for hardware.

This patchset determines which should be used and stores that
information with the package for later use when assembling the CPE
report.

There is also a suffix which we just default to wildcards at this
point.

Refs:
   https://nvlpubs.nist.gov/nistpubs/Legacy/IR/nistir7695.pdf
   https://cpe.mitre.org/specification/

Signed-off-by: Matthew Weber <matthew.weber@rockwellcollins.com>
---
Changes
v1 -> v2
[Thomas P
 - Change to using a filter on pkg name value vs ifelse

v3
[Arnout
 - Moved CPE prefix and suffix defines to package/Makefile.in
---
 package/Makefile.in    | 4 ++++
 package/pkg-generic.mk | 8 +++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/package/Makefile.in b/package/Makefile.in
index 4325f7b..ae69c4e 100644
--- a/package/Makefile.in
+++ b/package/Makefile.in
@@ -395,6 +395,10 @@ TARGET_CONFIGURE_ARGS = \
 
 ################################################################################
 
+CPE_PREFIX_OS = cpe:2.3:o
+CPE_PREFIX_APP = cpe:2.3:a
+CPE_SUFFIX = *:*:*:*:*:*:*
+
 ifeq ($(BR2_SYSTEM_ENABLE_NLS),y)
 NLS_OPTS = --enable-nls
 TARGET_NLS_DEPENDENCIES = host-gettext
diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index 67ac436..b56fefa 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -866,11 +866,17 @@ $(2)_CPE_ID_NAME ?= $$($(2)_NAME)
 $(2)_CPE_ID_VERSION ?= $$($(2)_VERSION)
 $(2)_CPE_ID ?= $$($(2)_CPE_ID_VENDOR):$$($(2)_CPE_ID_NAME):$$($(2)_CPE_ID_VERSION)
 
+ifneq ($(filter linux linux-headers,$(1)),)
+$(2)_CPE_PREFIX = $(CPE_PREFIX_OS)
+else
+$(2)_CPE_PREFIX = $(CPE_PREFIX_APP)
+endif
+
 $(1)-cpe-info: PKG=$(2)
 $(1)-cpe-info:
 ifneq ($$(call qstrip,$$($(2)_SOURCE)),)
 	@$$(call MESSAGE,"Collecting cpe info")
-	$(Q)$$(call cpe-manifest,$$($(2)_CPE_ID),$$($(2)_CVE_PATCHED),$$($(2)_RAWNAME),$$($(2)_VERSION),$$($(2)_ACTUAL_SOURCE_SITE))
+	$(Q)$$(call cpe-manifest,$$($(2)_CPE_PREFIX):$$($(2)_CPE_ID):$(CPE_SUFFIX),$$($(2)_CVE_PATCHED),$$($(2)_RAWNAME),$$($(2)_VERSION),$$($(2)_ACTUAL_SOURCE_SITE))
 endif # ifneq ($$(call qstrip,$$($(2)_SOURCE)),)
 
 # legal-info: declare dependencies and set values used later for the manifest
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 3/5] cpe-info: only report target pkgs
  2018-05-10 18:58 [Buildroot] [PATCH v4 0/5] CPE ID Support Matt Weber
  2018-05-10 18:58 ` [Buildroot] [PATCH v4 1/5] cpe-info: new make target Matt Weber
  2018-05-10 18:58 ` [Buildroot] [PATCH v4 2/5] cpe-info: id prefix/suffix Matt Weber
@ 2018-05-10 18:58 ` Matt Weber
  2018-05-10 18:58 ` [Buildroot] [PATCH v4 4/5] cpe-info: update manual for new pkg vars Matt Weber
  2018-05-10 18:58 ` [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting Matt Weber
  4 siblings, 0 replies; 17+ messages in thread
From: Matt Weber @ 2018-05-10 18:58 UTC (permalink / raw)
  To: buildroot

The reporting of host packages causes some duplication and complicates
what is really in the targets configuration. For the purpose of the
first version of this patchset, its assumed that host packages aren't
relevant for the configuration and we only report the target's
contents.

Signed-off-by: Matthew Weber <matthew.weber@rockwellcollins.com>
---
Changes
v1 -> v2
[Thomas P
 - select if target vs selecting not host

v3
 - Fixed host build error because cpe-info wasn't defined
---
 package/pkg-generic.mk | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/package/pkg-generic.mk b/package/pkg-generic.mk
index b56fefa..5d77c83 100644
--- a/package/pkg-generic.mk
+++ b/package/pkg-generic.mk
@@ -874,10 +874,12 @@ endif
 
 $(1)-cpe-info: PKG=$(2)
 $(1)-cpe-info:
+ifeq ($$($(2)_TYPE),target)
 ifneq ($$(call qstrip,$$($(2)_SOURCE)),)
 	@$$(call MESSAGE,"Collecting cpe info")
 	$(Q)$$(call cpe-manifest,$$($(2)_CPE_PREFIX):$$($(2)_CPE_ID):$(CPE_SUFFIX),$$($(2)_CVE_PATCHED),$$($(2)_RAWNAME),$$($(2)_VERSION),$$($(2)_ACTUAL_SOURCE_SITE))
 endif # ifneq ($$(call qstrip,$$($(2)_SOURCE)),)
+endif # ifeq ($$($(2)_TYPE),target)
 
 # legal-info: declare dependencies and set values used later for the manifest
 ifneq ($$($(2)_LICENSE_FILES),)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 4/5] cpe-info: update manual for new pkg vars
  2018-05-10 18:58 [Buildroot] [PATCH v4 0/5] CPE ID Support Matt Weber
                   ` (2 preceding siblings ...)
  2018-05-10 18:58 ` [Buildroot] [PATCH v4 3/5] cpe-info: only report target pkgs Matt Weber
@ 2018-05-10 18:58 ` Matt Weber
  2018-05-10 18:58 ` [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting Matt Weber
  4 siblings, 0 replies; 17+ messages in thread
From: Matt Weber @ 2018-05-10 18:58 UTC (permalink / raw)
  To: buildroot

Provide guidance on setting up the *_CPE_* and *_CVE_* variables.

Signed-off-by: Matt Weber <matthew.weber@rockwellcollins.com>
---
Changes
v2
[Thomas P
 - Reworded LIBFOO_CVE_PATCHED description

[Matt W
 - Added definition for new preset variables to auto-gen the CPE ID
 - Added example LIBFOO_CPE_ID_VENDOR to LIBFOO

v3
 - Updated to make *_CPE_VENDOR optional
 - Changed wording around _CPE_ID as there is only one defined now
---
 docs/manual/adding-packages-generic.txt | 117 ++++++++++++++++++++------------
 1 file changed, 74 insertions(+), 43 deletions(-)

diff --git a/docs/manual/adding-packages-generic.txt b/docs/manual/adding-packages-generic.txt
index 7e1f246..45f0279 100644
--- a/docs/manual/adding-packages-generic.txt
+++ b/docs/manual/adding-packages-generic.txt
@@ -24,57 +24,59 @@ system is based on hand-written Makefiles or shell scripts.
 09: LIBFOO_SITE = http://www.foosoftware.org/download
 10: LIBFOO_LICENSE = GPL-3.0+
 11: LIBFOO_LICENSE_FILES = COPYING
-12: LIBFOO_INSTALL_STAGING = YES
-13: LIBFOO_CONFIG_SCRIPTS = libfoo-config
-14: LIBFOO_DEPENDENCIES = host-libaaa libbbb
-15:
-16: define LIBFOO_BUILD_CMDS
-17:	$(MAKE) $(TARGET_CONFIGURE_OPTS) -C $(@D) all
-18: endef
-19:
-20: define LIBFOO_INSTALL_STAGING_CMDS
-21:	$(INSTALL) -D -m 0755 $(@D)/libfoo.a $(STAGING_DIR)/usr/lib/libfoo.a
-22:	$(INSTALL) -D -m 0644 $(@D)/foo.h $(STAGING_DIR)/usr/include/foo.h
-23:	$(INSTALL) -D -m 0755 $(@D)/libfoo.so* $(STAGING_DIR)/usr/lib
-24: endef
-25:
-26: define LIBFOO_INSTALL_TARGET_CMDS
-27:	$(INSTALL) -D -m 0755 $(@D)/libfoo.so* $(TARGET_DIR)/usr/lib
-28:	$(INSTALL) -d -m 0755 $(TARGET_DIR)/etc/foo.d
-29: endef
-30:
-31: define LIBFOO_USERS
-32:	foo -1 libfoo -1 * - - - LibFoo daemon
-33: endef
-34:
-35: define LIBFOO_DEVICES
-36:	/dev/foo  c  666  0  0	42  0  -  -  -
-37: endef
-38:
-39: define LIBFOO_PERMISSIONS
-40:	/bin/foo  f  4755  foo  libfoo	 -  -  -  -  -
-41: endef
-42:
-43: $(eval $(generic-package))
+12: LIBFOO_CPE_ID_VENDOR = foosoftware
+13: LIBFOO_INSTALL_STAGING = YES
+14: LIBFOO_CONFIG_SCRIPTS = libfoo-config
+15: LIBFOO_DEPENDENCIES = host-libaaa libbbb
+16:
+17: define LIBFOO_BUILD_CMDS
+18:	$(MAKE) $(TARGET_CONFIGURE_OPTS) -C $(@D) all
+19: endef
+20:
+21: define LIBFOO_INSTALL_STAGING_CMDS
+22:	$(INSTALL) -D -m 0755 $(@D)/libfoo.a $(STAGING_DIR)/usr/lib/libfoo.a
+23:	$(INSTALL) -D -m 0644 $(@D)/foo.h $(STAGING_DIR)/usr/include/foo.h
+24:	$(INSTALL) -D -m 0755 $(@D)/libfoo.so* $(STAGING_DIR)/usr/lib
+25: endef
+26:
+27: define LIBFOO_INSTALL_TARGET_CMDS
+28:	$(INSTALL) -D -m 0755 $(@D)/libfoo.so* $(TARGET_DIR)/usr/lib
+29:	$(INSTALL) -d -m 0755 $(TARGET_DIR)/etc/foo.d
+30: endef
+31:
+32: define LIBFOO_USERS
+33:	foo -1 libfoo -1 * - - - LibFoo daemon
+34: endef
+35:
+36: define LIBFOO_DEVICES
+37:	/dev/foo  c  666  0  0	42  0  -  -  -
+38: endef
+39:
+40: define LIBFOO_PERMISSIONS
+41:	/bin/foo  f  4755  foo  libfoo	 -  -  -  -  -
+42: endef
+43:
+44: $(eval $(generic-package))
 --------------------------------
 
-The Makefile begins on line 7 to 11 with metadata information: the
+The Makefile begins on line 7 to 12 with metadata information: the
 version of the package (+LIBFOO_VERSION+), the name of the
 tarball containing the package (+LIBFOO_SOURCE+) (xz-ed tarball recommended)
 the Internet location at which the tarball can be downloaded from
-(+LIBFOO_SITE+), the license (+LIBFOO_LICENSE+) and file with the
-license text (+LIBFOO_LICENSE_FILES+). All variables must start with
+(+LIBFOO_SITE+), the license (+LIBFOO_LICENSE+), the file with the
+license text (+LIBFOO_LICENSE_FILES+) and the vendor for vunerability
+analysis (+LIBFOO_CPE_ID_VENDOR+). All variables must start with
 the same prefix, +LIBFOO_+ in this case. This prefix is always the
 uppercased version of the package name (see below to understand where
 the package name is defined).
 
-On line 12, we specify that this package wants to install something to
+On line 13, we specify that this package wants to install something to
 the staging space. This is often needed for libraries, since they must
 install header files and other development files in the staging space.
 This will ensure that the commands listed in the
 +LIBFOO_INSTALL_STAGING_CMDS+ variable will be executed.
 
-On line 13, we specify that there is some fixing to be done to some
+On line 14, we specify that there is some fixing to be done to some
 of the 'libfoo-config' files that were installed during
 +LIBFOO_INSTALL_STAGING_CMDS+ phase.
 These *-config files are executable shell script files that are
@@ -122,14 +124,14 @@ IMAGEMAGICK_CONFIG_SCRIPTS = \
 --------------------------------
 ================================
 
-On line 14, we specify the list of dependencies this package relies
+On line 15, we specify the list of dependencies this package relies
 on. These dependencies are listed in terms of lower-case package names,
 which can be packages for the target (without the +host-+
 prefix) or packages for the host (with the +host-+) prefix).
 Buildroot will ensure that all these packages are built and installed
 'before' the current package starts its configuration.
 
-The rest of the Makefile, lines 16..29, defines what should be done
+The rest of the Makefile, lines 17..29, defines what should be done
 at the different steps of the package configuration, compilation and
 installation.
 +LIBFOO_BUILD_CMDS+ tells what steps should be performed to
@@ -142,16 +144,16 @@ All these steps rely on the +$(@D)+ variable, which
 contains the directory where the source code of the package has been
 extracted.
 
-On lines 31..43, we define a user that is used by this package (e.g.
+On lines 32..44, we define a user that is used by this package (e.g.
 to run a daemon as non-root) (+LIBFOO_USERS+).
 
-On line 35..37, we define a device-node file used by this package
+On line 36..38, we define a device-node file used by this package
 (+LIBFOO_DEVICES+).
 
-On line 39..41, we define the permissions to set to specific files
+On line 40..42, we define the permissions to set to specific files
 installed by this package (+LIBFOO_PERMISSIONS+).
 
-Finally, on line 43, we call the +generic-package+ function, which
+Finally, on line 44, we call the +generic-package+ function, which
 generates, according to the variables defined previously, all the
 Makefile code necessary to make your package working.
 
@@ -469,6 +471,35 @@ information is (assuming the package name is +libfoo+) :
   locations, `/lib/firmware`, `/usr/lib/firmware`, `/lib/modules`,
   `/usr/lib/modules`, and `/usr/share`, which are automatically excluded.
 
+* +LIBFOO_CPE_ID_VENDOR+
+  This variable is optional. It only must be defined if the package name
+  does not match what the CPE ID uses for the vendor. By default it's set
+  to <pkg-name>_project.
+
+* +LIBFOO_CPE_ID_NAME+
+  This variable is optional. It only must be defined if the package name
+  does not match what the CPE ID uses for the name. By default it's set
+  to <pkg-name>.
+
+* +LIBFOO_CPE_ID_VERSION+
+  This variable is optional. By default it's set to <pkg-version>.
+
+* +LIBFOO_CPE_ID+ is optional, as the package infrastructure hangles the
+  default case of a single package's Common Product Enumeration (CPE)
+  identification string. +make cpe-info+ copies all of these into a
+  +cpe-manifest.csv+ file. To identify a package's possible CPE,
+  the National Vunerability Database can be searched at
+  https://nvd.nist.gov/products/cpe/search.
+
+* +LIBFOO_CVE_PATCHED+ is a space-separated list of the package's Common
+  Vunerability Enumeration (CVE) identification strings. This list
+  enumerates CVEs which are fixed by patches added in Buildroot. This
+  allows the CPE reporting to provide additional detail on CVEs which
+  have been fixed, even if Buildroot is not yet using an updated upstream
+  release including the fix. This variable is optional. If it is not
+  defined, the +CVE PATCHED+ field will appear empty in the manifest
+  file for this package.
+
 The recommended way to define these variables is to use the following
 syntax:
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting
  2018-05-10 18:58 [Buildroot] [PATCH v4 0/5] CPE ID Support Matt Weber
                   ` (3 preceding siblings ...)
  2018-05-10 18:58 ` [Buildroot] [PATCH v4 4/5] cpe-info: update manual for new pkg vars Matt Weber
@ 2018-05-10 18:58 ` Matt Weber
  2018-05-16  3:43   ` Ricardo Martincoski
  2018-05-16 23:34   ` Arnout Vandecappelle
  4 siblings, 2 replies; 17+ messages in thread
From: Matt Weber @ 2018-05-10 18:58 UTC (permalink / raw)
  To: buildroot

Pkg status now includes CPE as an item reported in the html
output (stat summary and for each pkg)

Option added to allow analysis of a specific Buildroot target's
'make cpe-info' reports accuracy against CPE database.

Signed-off-by: Matthew Weber <matthew.weber@rockwellcollins.com>
---
Changes
v3 -> v4
 - Collapsed patch 5 and 6 together into this single patch

[Eric
 - added except handling around file io
 - fixed condition where buildroot isn't generating a CPE
   string as part of the infra and output that is the case.
   (eventually these probably could be fixed but there aren't
   many at this point)

[Ricardo
 - fixed patch naming and resolved flake8 issues
 - added except handling to have proper exits
 - cleaned up csv file header skippin
 - condensed partial cve string split
 - updated help txt as suggested
 - reworked output file requirement.  Removed -o as required but
   added check if provided when -c isn't used

v3
 - New patch
---
 support/scripts/pkg-stats | 188 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 170 insertions(+), 18 deletions(-)

diff --git a/support/scripts/pkg-stats b/support/scripts/pkg-stats
index 43f7e8d..c28d397 100755
--- a/support/scripts/pkg-stats
+++ b/support/scripts/pkg-stats
@@ -24,14 +24,22 @@ from collections import defaultdict
 import re
 import subprocess
 import sys
+import urllib2
+import xmltodict
+import gzip
+from StringIO import StringIO
+import csv
 
 INFRA_RE = re.compile("\$\(eval \$\(([a-z-]*)-package\)\)")
 
+CPE_XML_URL = "https://static.nvd.nist.gov/feeds/xml/cpe/dictionary/official-cpe-dictionary_v2.3.xml.gz"
+
 
 class Package:
     all_licenses = list()
     all_license_files = list()
     all_versions = dict()
+    all_cpe_id = dict()
 
     def __init__(self, name, path):
         self.name = name
@@ -43,6 +51,8 @@ class Package:
         self.patch_count = 0
         self.warnings = 0
         self.current_version = None
+        self.cpe_id = None
+        self.has_cpe = False
 
     def pkgvar(self):
         return self.name.upper().replace("-", "_")
@@ -116,6 +126,25 @@ class Package:
                 self.warnings = int(m.group(1))
                 return
 
+    def set_cpe_info(self, cpe_dict):
+        """
+        Fills in the .has_cpe field
+        """
+        var = self.pkgvar()
+        if var in self.all_cpe_id:
+            self.cpe_id = self.all_cpe_id[var]
+        if self.cpe_id is None:
+            print("BR Infra Not building CPE for pkg: [%s]" % var)
+            return
+        result = cpe_dict.find(self.cpe_id)
+        if not result:
+            result = cpe_dict.find_partial(cpe_dict.get_cpe_no_version(self.cpe_id))
+            if result:
+                self.has_cpe = "Update"
+            # Unset case for has_cpe is assumed missing/does not exist
+        else:
+            self.has_cpe = cpe_dict.get_nvd_url(self.cpe_id)
+
     def __eq__(self, other):
         return self.path == other.path
 
@@ -254,6 +283,23 @@ def package_init_make_info():
 
         Package.all_versions[pkgvar] = value
 
+    # CPE ID
+    o = subprocess.check_output(["make", "BR2_HAVE_DOT_CONFIG=y",
+                                 "-s", "printvars", "VARS=%_CPE_ID"])
+    for l in o.splitlines():
+        # Get variable name and value
+        pkgvar, value = l.split("=")
+
+        # Strip _CPE_ID
+        pkgvar = pkgvar[:-7]
+
+        if pkgvar == "LINUX":
+            Package.all_cpe_id[pkgvar] = "cpe:2.3:o:" + value + ":*:*:*:*:*:*:*"
+        elif pkgvar == "LINUX_HEADERS":
+            Package.all_cpe_id[pkgvar] = "cpe:2.3:o:" + value + ":*:*:*:*:*:*:*"
+        else:
+            Package.all_cpe_id[pkgvar] = "cpe:2.3:a:" + value + ":*:*:*:*:*:*:*"
+
 
 def calculate_stats(packages):
     stats = defaultdict(int)
@@ -279,6 +325,12 @@ def calculate_stats(packages):
             stats["hash"] += 1
         else:
             stats["no-hash"] += 1
+        if pkg.has_cpe == "Update":
+            stats["update-cpe"] += 1
+        elif pkg.has_cpe:
+            stats["cpe"] += 1
+        else:
+            stats["no-cpe"] += 1
         stats["patches"] += pkg.patch_count
     return stats
 
@@ -422,6 +474,20 @@ def dump_html_pkg(f, pkg):
     f.write("  <td class=\"%s\">%d</td>\n" %
             (" ".join(td_class), pkg.warnings))
 
+    # CPE Valid
+    td_class = ["centered"]
+    if not pkg.has_cpe:
+        td_class.append("wrong")
+        f.write("  <td class=\"%s\">%s</td>\n" %
+                (" ".join(td_class), boolean_str(pkg.has_cpe)))
+    elif pkg.has_cpe == "Update":
+        td_class.append("wrong")
+        f.write("  <td class=\"%s\">Update</td>\n" %
+                (" ".join(td_class)))
+    else:
+        td_class.append("correct")
+        f.write("  <td class=\"%s\"><a href=\"%s\">%s</a></td>\n" %
+                (" ".join(td_class), pkg.has_cpe, boolean_str(pkg.has_cpe)))
     f.write(" </tr>\n")
 
 
@@ -437,6 +503,7 @@ def dump_html_all_pkgs(f, packages):
 <td class=\"centered\">Hash file</td>
 <td class=\"centered\">Current version</td>
 <td class=\"centered\">Warnings</td>
+<td class=\"centered\">CPE Valid</td>
 </tr>
 """)
     for pkg in sorted(packages):
@@ -463,6 +530,12 @@ def dump_html_stats(f, stats):
             stats["hash"])
     f.write(" <tr><td>Packages not having a hash file</td><td>%s</td></tr>\n" %
             stats["no-hash"])
+    f.write(" <tr><td>Packages having a registered CPE</td><td>%s</td></tr>\n" %
+            stats["cpe"])
+    f.write(" <tr><td>Packages needing CPE update</td><td>%s</td></tr>\n" %
+            stats["update-cpe"])
+    f.write(" <tr><td>Packages missing a registered CPE</td><td>%s</td></tr>\n" %
+            stats["no-cpe"])
     f.write(" <tr><td>Total number of patches</td><td>%s</td></tr>\n" %
             stats["patches"])
     f.write("</table>\n")
@@ -485,42 +558,121 @@ def dump_html(packages, stats, output):
         f.write(html_footer)
 
 
+class CPE:
+    all_cpes = dict()
+
+    def get_xml_dict(self):
+        print("CPE: Fetching xml manifest...")
+        try:
+            compressed_cpe_file = urllib2.urlopen(CPE_XML_URL)
+            print("CPE: Unzipping xml manifest...")
+            cpe_file = gzip.GzipFile(fileobj=StringIO(compressed_cpe_file.read())).read()
+            print("CPE: Converting xml manifest to dict...")
+            self.all_cpes = xmltodict.parse(cpe_file)
+        except urllib2.HTTPError:
+            print("CPE: HTTP Error: %s" % CPE_XML_URL)
+            sys.exit(1)
+        except urllib2.URLError:
+            print("CPE: URL Error: %s" % CPE_XML_URL)
+            sys.exit(1)
+
+    def find_partial(self, cpe_str):
+        print("CPE: Searching for partial [%s]" % cpe_str)
+        for cpe in self.all_cpes['cpe-list']['cpe-item']:
+            if cpe_str in cpe['cpe-23:cpe23-item']['@name']:
+                return cpe['cpe-23:cpe23-item']['@name']
+
+    def find(self, cpe_str):
+        print("CPE: Searching for [%s]" % cpe_str)
+        for cpe in self.all_cpes['cpe-list']['cpe-item']:
+            if cpe['cpe-23:cpe23-item']['@name'] == cpe_str:
+                return cpe['cpe-23:cpe23-item']['@name']
+
+    def get_cpe_no_version(self, cpe):
+        return "".join(cpe.split(":")[:5])
+
+    def get_nvd_url(self, cpe_str):
+        return "https://nvd.nist.gov/products/cpe/search/results?keyword=" + \
+                urllib2.quote(cpe_str) + \
+                "&status=FINAL&orderBy=CPEURI&namingFormat=2.3"
+
+
+def get_target_cpe_report(cpe_report_file, cpe_dict):
+    report_cpe_exact_match = ""
+    report_cpe_needing_update = ""
+    report_cpe_missing = ""
+
+    print("CPE: Checking for matches...")
+    try:
+        with open(cpe_report_file) as cpe_file:
+            cpe_list = csv.reader(cpe_file)
+            next(cpe_list)  # make cpe-info has a one line header
+            for cpe in cpe_list:
+                result = cpe_dict.find(cpe[0])
+                if not result:
+                    result = cpe_dict.find_partial(cpe_dict.get_cpe_no_version(cpe[0]))
+                    if not result:
+                        report_cpe_missing += cpe[0] + "\n"
+                    else:
+                        report_cpe_needing_update += cpe[0] + "\n"
+                else:
+                    report_cpe_exact_match += cpe[0] + "\n"
+    except (OSError, IOError) as e:
+        print("CPE: report csv file (%s): %s" % (e.errno, e.strerror))
+        sys.exit(1)
+
+    print("CPE: Found EXACT match:\n" + report_cpe_exact_match)
+    print("CPE: Found but REQUIRES UPDATE:\n" + report_cpe_needing_update)
+    print("CPE: Not found (proposing the following to be added):\n" + report_cpe_missing)
+
+
 def parse_args():
     parser = argparse.ArgumentParser()
-    parser.add_argument('-o', dest='output', action='store', required=True,
+    parser.add_argument('-o', dest='output', action='store',
                         help='HTML output file')
     parser.add_argument('-n', dest='npackages', type=int, action='store',
                         help='Number of packages')
     parser.add_argument('-p', dest='packages', action='store',
                         help='List of packages (comma separated)')
+    parser.add_argument('-c', dest='cpe_report', action='store',
+                        help='CPE Report generated by make cpe-info (csv format)')
     return parser.parse_args()
 
 
 def __main__():
     args = parse_args()
     if args.npackages and args.packages:
-        print "ERROR: -n and -p are mutually exclusive"
+        print("ERROR: -n and -p are mutually exclusive")
         sys.exit(1)
     if args.packages:
         package_list = args.packages.split(",")
     else:
         package_list = None
-    print "Build package list ..."
-    packages = get_pkglist(args.npackages, package_list)
-    print "Getting package make info ..."
-    package_init_make_info()
-    print "Getting package details ..."
-    for pkg in packages:
-        pkg.set_infra()
-        pkg.set_license()
-        pkg.set_hash_info()
-        pkg.set_patch_count()
-        pkg.set_check_package_warnings()
-        pkg.set_current_version()
-    print "Calculate stats"
-    stats = calculate_stats(packages)
-    print "Write HTML"
-    dump_html(packages, stats, args.output)
+    cpe_dict = CPE()
+    cpe_dict.get_xml_dict()
+    if args.cpe_report:
+        print("Performing Target CPE Report Analysis...")
+        get_target_cpe_report(args.cpe_report, cpe_dict)
+    elif args.output:
+        print("Build package list ...")
+        packages = get_pkglist(args.npackages, package_list)
+        print("Getting package make info ...")
+        package_init_make_info()
+        print("Getting package details ...")
+        for pkg in packages:
+            pkg.set_infra()
+            pkg.set_license()
+            pkg.set_hash_info()
+            pkg.set_patch_count()
+            pkg.set_check_package_warnings()
+            pkg.set_current_version()
+            pkg.set_cpe_info(cpe_dict)
+        print("Calculate stats")
+        stats = calculate_stats(packages)
+        print("Write HTML")
+        dump_html(packages, stats, args.output)
+    else:
+        print("Please provide the -o HTML output file arg")
 
 
 __main__()
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting
  2018-05-10 18:58 ` [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting Matt Weber
@ 2018-05-16  3:43   ` Ricardo Martincoski
  2018-05-16 23:32     ` Arnout Vandecappelle
  2018-05-16 23:34   ` Arnout Vandecappelle
  1 sibling, 1 reply; 17+ messages in thread
From: Ricardo Martincoski @ 2018-05-16  3:43 UTC (permalink / raw)
  To: buildroot

Hello,

See below:
 - an important typo to fix;
 - some considerations about verbosity;
 - a possible trivial patch to split from this one;
 - some considerations about performance (to possibly run 2x or 4x faster, iff
   we can make some assumptions about the XML).

I hope other people can give more input about the last three items above.

On Thu, May 10, 2018 at 03:58 PM, Matt Weber wrote:

> support/scripts/pkgstats: add CPE reporting

nit: support/scripts/pkg-stats: add CPE reporting

[snip]
> +    def find_partial(self, cpe_str):
> +        print("CPE: Searching for partial [%s]" % cpe_str)

This ...

> +        for cpe in self.all_cpes['cpe-list']['cpe-item']:
> +            if cpe_str in cpe['cpe-23:cpe23-item']['@name']:
> +                return cpe['cpe-23:cpe23-item']['@name']
> +
> +    def find(self, cpe_str):
> +        print("CPE: Searching for [%s]" % cpe_str)

... and this make the script very verbose.

It generates 4000+ lines to stdout when generating the html.
Should we remove it from the final version?
Should we add a -v option for debug?

It seems to me it can be removed.

> +        for cpe in self.all_cpes['cpe-list']['cpe-item']:
> +            if cpe['cpe-23:cpe23-item']['@name'] == cpe_str:
> +                return cpe['cpe-23:cpe23-item']['@name']
> +
> +    def get_cpe_no_version(self, cpe):
> +        return "".join(cpe.split(":")[:5])

typo: it should be ":".join...
otherwise no partial is found for any package.

[snip]
>  def __main__():
>      args = parse_args()
>      if args.npackages and args.packages:
> -        print "ERROR: -n and -p are mutually exclusive"
> +        print("ERROR: -n and -p are mutually exclusive")

Ideally this (and similar changes below) should be a separate patch, early in
the series. Since the new patch would be trivial it would potentially be
reviewed/applied quickly. Also it would be less changes to carry/rework if new
iterations (due to other patches) are needed.

But well... this is the only line that would not be touched by this patch, so I
am OK with this.
Maybe others disagree.
Mentioning in the commit would be good, something like "Take the opportunity to
..."

>          sys.exit(1)
>      if args.packages:
>          package_list = args.packages.split(",")
>      else:
>          package_list = None
> -    print "Build package list ..."
> -    packages = get_pkglist(args.npackages, package_list)
> -    print "Getting package make info ..."
> -    package_init_make_info()
> -    print "Getting package details ..."
> -    for pkg in packages:
> -        pkg.set_infra()
> -        pkg.set_license()
> -        pkg.set_hash_info()
> -        pkg.set_patch_count()
> -        pkg.set_check_package_warnings()
> -        pkg.set_current_version()
> -    print "Calculate stats"
> -    stats = calculate_stats(packages)
> -    print "Write HTML"
> -    dump_html(packages, stats, args.output)
> +    cpe_dict = CPE()
> +    cpe_dict.get_xml_dict()
> +    if args.cpe_report:
> +        print("Performing Target CPE Report Analysis...")
> +        get_target_cpe_report(args.cpe_report, cpe_dict)
> +    elif args.output:

This is not common in other scripts in the tree. All checks between options are
done at the start of __main__.
But having two different code paths is not common either (in the scripts in the
tree), so it seems to me it makes sense here.
Maybe others disagree.

> +        print("Build package list ...")
> +        packages = get_pkglist(args.npackages, package_list)
> +        print("Getting package make info ...")
> +        package_init_make_info()
> +        print("Getting package details ...")
> +        for pkg in packages:
> +            pkg.set_infra()
> +            pkg.set_license()
> +            pkg.set_hash_info()
> +            pkg.set_patch_count()
> +            pkg.set_check_package_warnings()
> +            pkg.set_current_version()
> +            pkg.set_cpe_info(cpe_dict)
> +        print("Calculate stats")
> +        stats = calculate_stats(packages)
> +        print("Write HTML")
> +        dump_html(packages, stats, args.output)
> +    else:
> +        print("Please provide the -o HTML output file arg")
>  
>  
>  __main__()

About the performance:
17 minutes is not incredibly good but it is not terrible for 2000+ packages.

I created 2 *poorly-tested* patches for comparison purposes only. They seem
(again *poorly-tested*) to produce the same html output as v4 when called
without -c.

series v4 + typo fixed:
17 minutes
series v4 + typo fixed + comparison patch 1:
 8 minutes
series v4 + typo fixed + comparison patch 2:
 4 minutes

I am OK with first having the script functional and later, in another patch,
improving its performance while keeping the same functionality.
Maybe others disagree.
Maybe you like any patch below and find the time to fully test that it really
covers all cases and decide to merge to this one.
Maybe you foresee any use for other fields from the XML. In that case, both
comparison patches would not fit, as they assume we will always only need to
know cpe-23:cpe23-item/@name and that there will be always one, and only one,
cpe-23:cpe23-item/@name per cpe-item. Notice: I did not read the specs for the
CPE dictionary, I looked only to this script.


comparison patch 1:
Can we assume we will only need cpe-23:cpe23-item/@name and pre-process?
----->8-----
+++ b/support/scripts/pkg-stats
@@ -561,3 +561,3 @@ def dump_html(packages, stats, output):
 class CPE:
-    all_cpes = dict()
+    all_cpes = list()
 
@@ -569,4 +569,6 @@ class CPE:
             cpe_file = gzip.GzipFile(fileobj=StringIO(compressed_cpe_file.read())).read()
-            print("CPE: Converting xml manifest to dict...")
-            self.all_cpes = xmltodict.parse(cpe_file)
+            print("CPE: Converting xml manifest to list...")
+            raw_dict = xmltodict.parse(cpe_file)
+            for cpe in raw_dict['cpe-list']['cpe-item']:
+                self.all_cpes.append(cpe['cpe-23:cpe23-item']['@name'])
         except urllib2.HTTPError:
@@ -580,5 +582,5 @@ class CPE:
         print("CPE: Searching for partial [%s]" % cpe_str)
-        for cpe in self.all_cpes['cpe-list']['cpe-item']:
-            if cpe_str in cpe['cpe-23:cpe23-item']['@name']:
-                return cpe['cpe-23:cpe23-item']['@name']
+        for cpe in self.all_cpes:
+            if cpe_str in cpe:
+                return cpe
 
@@ -586,5 +588,4 @@ class CPE:
         print("CPE: Searching for [%s]" % cpe_str)
-        for cpe in self.all_cpes['cpe-list']['cpe-item']:
-            if cpe['cpe-23:cpe23-item']['@name'] == cpe_str:
-                return cpe['cpe-23:cpe23-item']['@name']
+        if cpe_str in self.all_cpes:
+            return cpe_str
----->8-----


comparison patch 2:
Note: I usually don't parse XML using Python, so I am not sure how future-proof
is the string that includes the namespace. It seems (from google search results)
people invented few hacks to disable namespaces in ElementTree. xmltodict has
process_namespaces disabled by default.
----->8-----
+++ b/support/scripts/pkg-stats
@@ -27,3 +27,3 @@ import sys
 import urllib2
-import xmltodict
+import xml.etree.ElementTree as ET
 import gzip
@@ -561,3 +561,3 @@ def dump_html(packages, stats, output):
 class CPE:
-    all_cpes = dict()
+    all_cpes = None
 
@@ -569,4 +569,5 @@ class CPE:
             cpe_file = gzip.GzipFile(fileobj=StringIO(compressed_cpe_file.read())).read()
-            print("CPE: Converting xml manifest to dict...")
-            self.all_cpes = xmltodict.parse(cpe_file)
+            print("CPE: Converting xml manifest to list...")
+            tree = ET.fromstring(cpe_file)
+            self.all_cpes = [i.get('name') for i in tree.iter('{http://scap.nist.gov/schema/cpe-extension/2.3}cpe23-item')]
         except urllib2.HTTPError:
@@ -580,5 +581,5 @@ class CPE:
         print("CPE: Searching for partial [%s]" % cpe_str)
-        for cpe in self.all_cpes['cpe-list']['cpe-item']:
-            if cpe_str in cpe['cpe-23:cpe23-item']['@name']:
-                return cpe['cpe-23:cpe23-item']['@name']
+        for cpe in self.all_cpes:
+            if cpe.startswith(cpe_str):
+                return cpe
 
@@ -586,5 +587,5 @@ class CPE:
         print("CPE: Searching for [%s]" % cpe_str)
-        for cpe in self.all_cpes['cpe-list']['cpe-item']:
-            if cpe['cpe-23:cpe23-item']['@name'] == cpe_str:
-                return cpe['cpe-23:cpe23-item']['@name']
+        for cpe in self.all_cpes:
+            if cpe == cpe_str:
+                return cpe
----->8-----

Regards,
Ricardo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting
  2018-05-16  3:43   ` Ricardo Martincoski
@ 2018-05-16 23:32     ` Arnout Vandecappelle
  2018-05-17  1:42       ` Matthew Weber
  2018-05-18  3:07       ` Ricardo Martincoski
  0 siblings, 2 replies; 17+ messages in thread
From: Arnout Vandecappelle @ 2018-05-16 23:32 UTC (permalink / raw)
  To: buildroot

 Hi Matt, Ricardo,

 I haven't been following things very closely so I may miss something entirely,
however...

On 16-05-18 05:43, Ricardo Martincoski wrote:
> Hello,
> 
> See below:
>  - an important typo to fix;
>  - some considerations about verbosity;
>  - a possible trivial patch to split from this one;
>  - some considerations about performance (to possibly run 2x or 4x faster, iff
>    we can make some assumptions about the XML).

 I thought the performance was completely dominated by the network, but
apparently it isn't?

 Oh, now I see, the set_cpe_info is in fact O(#cpe_packages *
#buildroot_packages), so 17 minutes is just 1ms for the inner loop, which
doesn't sound entirely unreasonable.


> I hope other people can give more input about the last three items above.
> 
> On Thu, May 10, 2018 at 03:58 PM, Matt Weber wrote:
> 
>> support/scripts/pkgstats: add CPE reporting
> 
> nit: support/scripts/pkg-stats: add CPE reporting
> 
> [snip]
>> +    def find_partial(self, cpe_str):
>> +        print("CPE: Searching for partial [%s]" % cpe_str)
> 
> This ...
> 
>> +        for cpe in self.all_cpes['cpe-list']['cpe-item']:
>> +            if cpe_str in cpe['cpe-23:cpe23-item']['@name']:
>> +                return cpe['cpe-23:cpe23-item']['@name']
>> +
>> +    def find(self, cpe_str):
>> +        print("CPE: Searching for [%s]" % cpe_str)
> 
> ... and this make the script very verbose.
> 
> It generates 4000+ lines to stdout when generating the html.
> Should we remove it from the final version?
> Should we add a -v option for debug?
> 
> It seems to me it can be removed.

 Well, if it takes 17 minutes, it's nice to see some progress... But if you can
reduce it to 4 minutes the need isn't there so much.

[snip]

>> +    cpe_dict = CPE()
>> +    cpe_dict.get_xml_dict()
>> +    if args.cpe_report:
>> +        print("Performing Target CPE Report Analysis...")
>> +        get_target_cpe_report(args.cpe_report, cpe_dict)
>> +    elif args.output:
> 
> This is not common in other scripts in the tree. All checks between options are
> done at the start of __main__.
> But having two different code paths is not common either (in the scripts in the
> tree), so it seems to me it makes sense here.
> Maybe others disagree.

 The way I see it, there is almost no commonality between the script with the -c
option and the normal pkg-stats, so it makes no sense to have them in the same
script. I would split off the CPE class into a separate module that would be
imported into this script and a new cpe_report script.

> 
>> +        print("Build package list ...")
>> +        packages = get_pkglist(args.npackages, package_list)
>> +        print("Getting package make info ...")
>> +        package_init_make_info()
>> +        print("Getting package details ...")
>> +        for pkg in packages:
>> +            pkg.set_infra()
>> +            pkg.set_license()
>> +            pkg.set_hash_info()
>> +            pkg.set_patch_count()
>> +            pkg.set_check_package_warnings()
>> +            pkg.set_current_version()
>> +            pkg.set_cpe_info(cpe_dict)
>> +        print("Calculate stats")
>> +        stats = calculate_stats(packages)
>> +        print("Write HTML")
>> +        dump_html(packages, stats, args.output)
>> +    else:
>> +        print("Please provide the -o HTML output file arg")
>>  
>>  
>>  __main__()
> 
> About the performance:
> 17 minutes is not incredibly good but it is not terrible for 2000+ packages.

 IMO it is terrible...

> 
> I created 2 *poorly-tested* patches for comparison purposes only. They seem
> (again *poorly-tested*) to produce the same html output as v4 when called
> without -c.
> 
> series v4 + typo fixed:
> 17 minutes
> series v4 + typo fixed + comparison patch 1:
>  8 minutes
> series v4 + typo fixed + comparison patch 2:
>  4 minutes
> 
> I am OK with first having the script functional and later, in another patch,
> improving its performance while keeping the same functionality.
> Maybe others disagree.
> Maybe you like any patch below and find the time to fully test that it really
> covers all cases and decide to merge to this one.
> Maybe you foresee any use for other fields from the XML. In that case, both
> comparison patches would not fit, as they assume we will always only need to
> know cpe-23:cpe23-item/@name and that there will be always one, and only one,
> cpe-23:cpe23-item/@name per cpe-item. Notice: I did not read the specs for the
> CPE dictionary, I looked only to this script.
> 
> 
> comparison patch 1:
> Can we assume we will only need cpe-23:cpe23-item/@name and pre-process?

 We can easily keep both the dict and the list.

> ----->8-----
> +++ b/support/scripts/pkg-stats
> @@ -561,3 +561,3 @@ def dump_html(packages, stats, output):
>  class CPE:
> -    all_cpes = dict()
> +    all_cpes = list()

 This (both the dict() and the list()) is a bit unconventional. If two instances
of class CPE are created, they will refer to the same dict/list...  Normally
members are set in the __init__ constructor. Was this intentional?

 Personally, I'm a fan of the attrs module, which allows you to do this very
elegantly. Unfortunately, attrs is not in the standard library.

>  
> @@ -569,4 +569,6 @@ class CPE:
>              cpe_file = gzip.GzipFile(fileobj=StringIO(compressed_cpe_file.read())).read()
> -            print("CPE: Converting xml manifest to dict...")
> -            self.all_cpes = xmltodict.parse(cpe_file)
> +            print("CPE: Converting xml manifest to list...")
> +            raw_dict = xmltodict.parse(cpe_file)
> +            for cpe in raw_dict['cpe-list']['cpe-item']:
> +                self.all_cpes.append(cpe['cpe-23:cpe23-item']['@name'])
>          except urllib2.HTTPError:
> @@ -580,5 +582,5 @@ class CPE:
>          print("CPE: Searching for partial [%s]" % cpe_str)
> -        for cpe in self.all_cpes['cpe-list']['cpe-item']:
> -            if cpe_str in cpe['cpe-23:cpe23-item']['@name']:
> -                return cpe['cpe-23:cpe23-item']['@name']
> +        for cpe in self.all_cpes:
> +            if cpe_str in cpe:
> +                return cpe
>  
> @@ -586,5 +588,4 @@ class CPE:
>          print("CPE: Searching for [%s]" % cpe_str)
> -        for cpe in self.all_cpes['cpe-list']['cpe-item']:
> -            if cpe['cpe-23:cpe23-item']['@name'] == cpe_str:
> -                return cpe['cpe-23:cpe23-item']['@name']
> +        if cpe_str in self.all_cpes:
> +            return cpe_str
> ----->8-----
> 
> 
> comparison patch 2:
> Note: I usually don't parse XML using Python, so I am not sure how future-proof
> is the string that includes the namespace. It seems (from google search results)
> people invented few hacks to disable namespaces in ElementTree. xmltodict has
> process_namespaces disabled by default.
> ----->8-----
> +++ b/support/scripts/pkg-stats
> @@ -27,3 +27,3 @@ import sys
>  import urllib2
> -import xmltodict
> +import xml.etree.ElementTree as ET
>  import gzip
> @@ -561,3 +561,3 @@ def dump_html(packages, stats, output):
>  class CPE:
> -    all_cpes = dict()
> +    all_cpes = None

 This should still be list() IMO, so that it has a consistent interface also in
case of e.g. HTTPError.

>  
> @@ -569,4 +569,5 @@ class CPE:
>              cpe_file = gzip.GzipFile(fileobj=StringIO(compressed_cpe_file.read())).read()
> -            print("CPE: Converting xml manifest to dict...")
> -            self.all_cpes = xmltodict.parse(cpe_file)
> +            print("CPE: Converting xml manifest to list...")
> +            tree = ET.fromstring(cpe_file)
> +            self.all_cpes = [i.get('name') for i in tree.iter('{http://scap.nist.gov/schema/cpe-extension/2.3}cpe23-item')]

 So after this you get basically the same as after comparison patch 1, right? So
the xmltodict takes 4 minutes? Or am I missing something?

 Oh, actually, the [... for ... iter...] is also more efficient than
for...: append() so that could be an effect here as well. But this part of the
code is only O(#cpe packages) so it shouldn't have that much impact.

>          except urllib2.HTTPError:package
> @@ -580,5 +581,5 @@ class CPE:
>          print("CPE: Searching for partial [%s]" % cpe_str)
> -        for cpe in self.all_cpes['cpe-list']['cpe-item']:
> -            if cpe_str in cpe['cpe-23:cpe23-item']['@name']:
> -                return cpe['cpe-23:cpe23-item']['@name']
> +        for cpe in self.all_cpes:
> +            if cpe.startswith(cpe_str):

 Originally it was 'in' instead of startswith(). Obviously startswith() will be
more efficient. And also more correct, I guess, or does the partial match not
need to be anchored at the beginning of the string? Well, since it always starts
with 'cpe:2.3:a:' a match somewhere in the middle seems unlikely...

> +                return cpe
>  
> @@ -586,5 +587,5 @@ class CPE:
>          print("CPE: Searching for [%s]" % cpe_str)
> -        for cpe in self.all_cpes['cpe-list']['cpe-item']:
> -            if cpe['cpe-23:cpe23-item']['@name'] == cpe_str:
> -                return cpe['cpe-23:cpe23-item']['@name']
> +        for cpe in self.all_cpes:
> +            if cpe == cpe_str:
> +                return cpe

 Looks to me like the lookup time can be cut in half again by merging the two
find functions:

    def find(self, cpe_str):
        partial_matches = []
        partial_cpe_str = self.get_cpe_no_version(cpe_str)
        for cpe in self.all_cpes:
            if cpe == cpe_str:
                return self.get_nvd_url(cpe_str)
            elif cpe.startswith(partial_cpe_str):
                partial_matches.append(cpe)
        if partial_matches:
            return 'Updated' # Or we could return the 'best' version?

 Although... that probably doesn't make much of a difference.

 What would probably make a difference, however, is to make two sets of cpe strings:

   all_cpes = set()
   all_cpes_no_version = set()

...
           print("CPE: Converting xml manifest to dict...")
           for cpe in raw_dict['cpe-list']['cpe-item']:
               cpe_str = cpe['cpe-23:cpe23-item']['@name']
               cpe_str_no_version = self.get_cpe_no_version(cpe_str)
               self.all_cpes.add(cpe_str)
               self.all_cpes_no_version.add(cpe_str_no_version)
...
    def find(self, cpe_str):
        if cpe_str in self.all_cpes:
            return self.get_cpe_no_version(cpe_str)
        cpe_str_no_version = self.get_cpe_no_version(cpe_str)
        if cpe_str_no_version in self.all_cpes_no_version:
            return 'Updated'

 Regards,
 Arnout

> ----->8-----
> 
> Regards,
> Ricardo
> 
> 
> 
> _______________________________________________
> buildroot mailing list
> buildroot at busybox.net
> http://lists.busybox.net/mailman/listinfo/buildroot
> 

-- 
Arnout Vandecappelle                          arnout at mind be
Senior Embedded Software Architect            +32-16-286500
Essensium/Mind                                http://www.mind.be
G.Geenslaan 9, 3001 Leuven, Belgium           BE 872 984 063 RPR Leuven
LinkedIn profile: http://www.linkedin.com/in/arnoutvandecappelle
GPG fingerprint:  7493 020B C7E3 8618 8DEC 222C 82EB F404 F9AC 0DDF

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting
  2018-05-10 18:58 ` [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting Matt Weber
  2018-05-16  3:43   ` Ricardo Martincoski
@ 2018-05-16 23:34   ` Arnout Vandecappelle
  2018-05-17  1:42     ` Matthew Weber
  1 sibling, 1 reply; 17+ messages in thread
From: Arnout Vandecappelle @ 2018-05-16 23:34 UTC (permalink / raw)
  To: buildroot



On 10-05-18 20:58, Matt Weber wrote:
> +
> +        if pkgvar == "LINUX":
> +            Package.all_cpe_id[pkgvar] = "cpe:2.3:o:" + value + ":*:*:*:*:*:*:*"
> +        elif pkgvar == "LINUX_HEADERS":
> +            Package.all_cpe_id[pkgvar] = "cpe:2.3:o:" + value + ":*:*:*:*:*:*:*"

 A bit more idiomatic:

        if pkgvar in ("LINUX", "LINUX_HEADERS"):
            Package.all_cpe_id[pkgvar] = "cpe:2.3:o:" + value + ":*:*:*:*:*:*:*"

 Regards,
 Arnout

> +        else:
> +            Package.all_cpe_id[pkgvar] = "cpe:2.3:a:" + value + ":*:*:*:*:*:*:*"

-- 
Arnout Vandecappelle                          arnout at mind be
Senior Embedded Software Architect            +32-16-286500
Essensium/Mind                                http://www.mind.be
G.Geenslaan 9, 3001 Leuven, Belgium           BE 872 984 063 RPR Leuven
LinkedIn profile: http://www.linkedin.com/in/arnoutvandecappelle
GPG fingerprint:  7493 020B C7E3 8618 8DEC 222C 82EB F404 F9AC 0DDF

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting
  2018-05-16 23:32     ` Arnout Vandecappelle
@ 2018-05-17  1:42       ` Matthew Weber
  2018-05-18  3:16         ` Ricardo Martincoski
  2018-05-18  3:07       ` Ricardo Martincoski
  1 sibling, 1 reply; 17+ messages in thread
From: Matthew Weber @ 2018-05-17  1:42 UTC (permalink / raw)
  To: buildroot

Arnout, Ricardo,

On Wed, May 16, 2018 at 6:32 PM, Arnout Vandecappelle <arnout@mind.be> wrote:
>
>  Hi Matt, Ricardo,
>
>  I haven't been following things very closely so I may miss something entirely,
> however...
>
> On 16-05-18 05:43, Ricardo Martincoski wrote:
> > Hello,
> >
> > See below:
> >  - an important typo to fix;
> >  - some considerations about verbosity;
> >  - a possible trivial patch to split from this one;
> >  - some considerations about performance (to possibly run 2x or 4x faster, iff
> >    we can make some assumptions about the XML).

Ricardo, thanks for going through v4 and all the great feedback.  As
Arnout responded to your email, I'll add on below.


> >> +    def find(self, cpe_str):
> >> +        print("CPE: Searching for [%s]" % cpe_str)
> >
> > ... and this make the script very verbose.
> >
> > It generates 4000+ lines to stdout when generating the html.
> > Should we remove it from the final version?
> > Should we add a -v option for debug?
> >
> > It seems to me it can be removed.
>
>  Well, if it takes 17 minutes, it's nice to see some progress... But if you can
> reduce it to 4 minutes the need isn't there so much.

I believe the verbosity can be removed/reduced.  Both CPE activities
end with a report output of some sort.

> >> +    cpe_dict = CPE()
> >> +    cpe_dict.get_xml_dict()
> >> +    if args.cpe_report:
> >> +        print("Performing Target CPE Report Analysis...")
> >> +        get_target_cpe_report(args.cpe_report, cpe_dict)
> >> +    elif args.output:
> >
> > This is not common in other scripts in the tree. All checks between options are
> > done at the start of __main__.
> > But having two different code paths is not common either (in the scripts in the
> > tree), so it seems to me it makes sense here.
> > Maybe others disagree.
>
>  The way I see it, there is almost no commonality between the script with the -c
> option and the normal pkg-stats, so it makes no sense to have them in the same
> script. I would split off the CPE class into a separate module that would be
> imported into this script and a new cpe_report script.

I debated this.  Ricardo/Arnout, what naming and directory scheme
would you propose?

> > About the performance:
> > 17 minutes is not incredibly good but it is not terrible for 2000+ packages.
>
>  IMO it is terrible...

It's going to get longer once we have the "updated pkg available"
checking merged, so I'm up for optimizing now.

>
> >
> > I created 2 *poorly-tested* patches for comparison purposes only. They seem
> > (again *poorly-tested*) to produce the same html output as v4 when called
> > without -c.
> >
> > series v4 + typo fixed:
> > 17 minutes
> > series v4 + typo fixed + comparison patch 1:
> >  8 minutes
> > series v4 + typo fixed + comparison patch 2:
> >  4 minutes
> >
> > I am OK with first having the script functional and later, in another patch,
> > improving its performance while keeping the same functionality.
> > Maybe others disagree.
> > Maybe you like any patch below and find the time to fully test that it really
> > covers all cases and decide to merge to this one.
> > Maybe you foresee any use for other fields from the XML. In that case, both
> > comparison patches would not fit, as they assume we will always only need to
> > know cpe-23:cpe23-item/@name and that there will be always one, and only one,
> > cpe-23:cpe23-item/@name per cpe-item. Notice: I did not read the specs for the
> > CPE dictionary, I looked only to this script.
> >
> >
> > comparison patch 1:
> > Can we assume we will only need cpe-23:cpe23-item/@name and pre-process?
>
>  We can easily keep both the dict and the list.

For the validity check we'd only use the name entry but for building
the "update xml file" to make changes to an existing CPE (patchset
being developed now), we mine some additional detail out of the dict.
Like mentioned below we could build two sets that are pre-processed
and still have the dict available.

>
> > ----->8-----
> > +++ b/support/scripts/pkg-stats
> > @@ -561,3 +561,3 @@ def dump_html(packages, stats, output):
> >  class CPE:
> > -    all_cpes = dict()
> > +    all_cpes = list()
>
>  This (both the dict() and the list()) is a bit unconventional. If two instances
> of class CPE are created, they will refer to the same dict/list...  Normally
> members are set in the __init__ constructor. Was this intentional?

Nope, just maturity of my python knowledge.  How would that work
instead?  I can only see one instance of the class for the current use
case.  However those variables could be shared between instances, we'd
just need to manage if they're valid.

> >
> > comparison patch 2:
> > Note: I usually don't parse XML using Python, so I am not sure how future-proof
> > is the string that includes the namespace. It seems (from google search results)
> > people invented few hacks to disable namespaces in ElementTree. xmltodict has
> > process_namespaces disabled by default.
> > ----->8-----
[snip]
> >          except urllib2.HTTPError:package
> > @@ -580,5 +581,5 @@ class CPE:
> >          print("CPE: Searching for partial [%s]" % cpe_str)
> > -        for cpe in self.all_cpes['cpe-list']['cpe-item']:
> > -            if cpe_str in cpe['cpe-23:cpe23-item']['@name']:
> > -                return cpe['cpe-23:cpe23-item']['@name']
> > +        for cpe in self.all_cpes:
> > +            if cpe.startswith(cpe_str):
>
>  Originally it was 'in' instead of startswith(). Obviously startswith() will be
> more efficient. And also more correct, I guess, or does the partial match not
> need to be anchored at the beginning of the string? Well, since it always starts
> with 'cpe:2.3:a:' a match somewhere in the middle seems unlikely...

startswith() will work.

>
> > +                return cpe
> >
> > @@ -586,5 +587,5 @@ class CPE:
> >          print("CPE: Searching for [%s]" % cpe_str)
> > -        for cpe in self.all_cpes['cpe-list']['cpe-item']:
> > -            if cpe['cpe-23:cpe23-item']['@name'] == cpe_str:
> > -                return cpe['cpe-23:cpe23-item']['@name']
> > +        for cpe in self.all_cpes:
> > +            if cpe == cpe_str:
> > +                return cpe
>
>  Looks to me like the lookup time can be cut in half again by merging the two
> find functions:
>
>     def find(self, cpe_str):
>         partial_matches = []
>         partial_cpe_str = self.get_cpe_no_version(cpe_str)
>         for cpe in self.all_cpes:
>             if cpe == cpe_str:
>                 return self.get_nvd_url(cpe_str)
>             elif cpe.startswith(partial_cpe_str):
>                 partial_matches.append(cpe)
>         if partial_matches:
>             return 'Updated' # Or we could return the 'best' version?
>
>  Although... that probably doesn't make much of a difference.
>
>  What would probably make a difference, however, is to make two sets of cpe strings:
>
>    all_cpes = set()
>    all_cpes_no_version = set()
>
> ...
>            print("CPE: Converting xml manifest to dict...")
>            for cpe in raw_dict['cpe-list']['cpe-item']:
>                cpe_str = cpe['cpe-23:cpe23-item']['@name']
>                cpe_str_no_version = self.get_cpe_no_version(cpe_str)
>                self.all_cpes.add(cpe_str)
>                self.all_cpes_no_version.add(cpe_str_no_version)
> ...
>     def find(self, cpe_str):
>         if cpe_str in self.all_cpes:
>             return self.get_cpe_no_version(cpe_str)
>         cpe_str_no_version = self.get_cpe_no_version(cpe_str)
>         if cpe_str_no_version in self.all_cpes_no_version:
>             return 'Updated'
>

Agree, the pre-processed sets would be cleaner and more efficient.

I think I have enough good examples, should I make an attempt at
another version based on using sets and other patch#2 suggestions?

Matt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting
  2018-05-16 23:34   ` Arnout Vandecappelle
@ 2018-05-17  1:42     ` Matthew Weber
  0 siblings, 0 replies; 17+ messages in thread
From: Matthew Weber @ 2018-05-17  1:42 UTC (permalink / raw)
  To: buildroot

Arnout,

On Wed, May 16, 2018 at 6:34 PM, Arnout Vandecappelle <arnout@mind.be> wrote:
>
>
> On 10-05-18 20:58, Matt Weber wrote:
>> +
>> +        if pkgvar == "LINUX":
>> +            Package.all_cpe_id[pkgvar] = "cpe:2.3:o:" + value + ":*:*:*:*:*:*:*"
>> +        elif pkgvar == "LINUX_HEADERS":
>> +            Package.all_cpe_id[pkgvar] = "cpe:2.3:o:" + value + ":*:*:*:*:*:*:*"
>
>  A bit more idiomatic:
>
>         if pkgvar in ("LINUX", "LINUX_HEADERS"):
>             Package.all_cpe_id[pkgvar] = "cpe:2.3:o:" + value + ":*:*:*:*:*:*:*"
>

Noted, will update.

Matt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting
  2018-05-16 23:32     ` Arnout Vandecappelle
  2018-05-17  1:42       ` Matthew Weber
@ 2018-05-18  3:07       ` Ricardo Martincoski
  2018-05-18  3:18         ` Matthew Weber
  1 sibling, 1 reply; 17+ messages in thread
From: Ricardo Martincoski @ 2018-05-18  3:07 UTC (permalink / raw)
  To: buildroot

Hello,

On Wed, May 16, 2018 at 08:32 PM, Arnout Vandecappelle wrote:
> On 16-05-18 05:43, Ricardo Martincoski wrote:

[snip]
>> @@ -569,4 +569,5 @@ class CPE:
>>              cpe_file = gzip.GzipFile(fileobj=StringIO(compressed_cpe_file.read())).read()
>> -            print("CPE: Converting xml manifest to dict...")
>> -            self.all_cpes = xmltodict.parse(cpe_file)
>> +            print("CPE: Converting xml manifest to list...")
>> +            tree = ET.fromstring(cpe_file)
>> +            self.all_cpes = [i.get('name') for i in tree.iter('{http://scap.nist.gov/schema/cpe-extension/2.3}cpe23-item')]
> 
>  So after this you get basically the same as after comparison patch 1, right? So
> the xmltodict takes 4 minutes? Or am I missing something?

No. I missed something important and jumped to wrong conclusions.

After adding some simple instrumentation code to display relative timestamps,
the main difference in performance is *not* related to the xml parser used but
it is related to the code used for find() and find_partial().

I didn't performed further testings but it seems related to the use of
startswith as you said.

patch 1:
0:00:00.001015 CPE: Fetching xml manifest...
0:00:03.924777 CPE: Unzipping xml manifest...
0:00:11.672462 CPE: Converting xml manifest to list...
0:00:11.672504 before xmltodict.parse
0:00:36.343417 before append
0:00:36.462400 list created
0:00:36.738042 Build package list ...
0:00:36.875742 Getting package make info ...
0:00:58.543116 Getting package details ...
0:01:00.016925 BR Infra Not building CPE for pkg: [UBOOT]
0:01:07.714094 BR Infra Not building CPE for pkg: [IMX_USB_LOADER]
...
0:08:00.615649 BR Infra Not building CPE for pkg: [INTLTOOL]
0:08:01.243667 BR Infra Not building CPE for pkg: [DOXYGEN]
0:08:02.035463 Calculate stats
0:08:02.042401 Write HTML

patch 2:
0:00:00.000889 CPE: Fetching xml manifest...
0:00:03.640856 CPE: Unzipping xml manifest...
0:00:14.569496 CPE: Converting xml manifest to list...
0:00:14.569541 before ET.fromstring
0:00:21.325842 before list comprehension
0:00:21.609946 list created
0:00:21.612443 Build package list ...
0:00:21.754223 Getting package make info ...
0:00:43.111196 Getting package details ...
0:00:43.828047 BR Infra Not building CPE for pkg: [UBOOT]
0:00:47.125995 BR Infra Not building CPE for pkg: [IMX_USB_LOADER]
...
0:03:46.279893 BR Infra Not building CPE for pkg: [INTLTOOL]
0:03:46.571266 BR Infra Not building CPE for pkg: [DOXYGEN]
0:03:46.892839 Calculate stats
0:03:46.895765 Write HTML

>  Oh, actually, the [... for ... iter...] is also more efficient than
> for...: append() so that could be an effect here as well. But this part of the
> code is only O(#cpe packages) so it shouldn't have that much impact.
> 
>>          except urllib2.HTTPError:package
>> @@ -580,5 +581,5 @@ class CPE:
>>          print("CPE: Searching for partial [%s]" % cpe_str)
>> -        for cpe in self.all_cpes['cpe-list']['cpe-item']:
>> -            if cpe_str in cpe['cpe-23:cpe23-item']['@name']:
>> -                return cpe['cpe-23:cpe23-item']['@name']
>> +        for cpe in self.all_cpes:
>> +            if cpe.startswith(cpe_str):
> 
>  Originally it was 'in' instead of startswith(). Obviously startswith() will be
> more efficient. And also more correct, I guess, or does the partial match not
[snip]

Regards,
Ricardo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting
  2018-05-17  1:42       ` Matthew Weber
@ 2018-05-18  3:16         ` Ricardo Martincoski
  2018-05-18  3:21           ` Matthew Weber
  0 siblings, 1 reply; 17+ messages in thread
From: Ricardo Martincoski @ 2018-05-18  3:16 UTC (permalink / raw)
  To: buildroot

Hello,

On Wed, May 16, 2018 at 10:42 PM, Matthew Weber wrote:
> On Wed, May 16, 2018 at 6:32 PM, Arnout Vandecappelle wrote:
>> On 16-05-18 05:43, Ricardo Martincoski wrote:

[snip]
>> >> +    cpe_dict = CPE()
>> >> +    cpe_dict.get_xml_dict()
>> >> +    if args.cpe_report:
>> >> +        print("Performing Target CPE Report Analysis...")
>> >> +        get_target_cpe_report(args.cpe_report, cpe_dict)
>> >> +    elif args.output:
>> >
>> > This is not common in other scripts in the tree. All checks between options are
>> > done at the start of __main__.
>> > But having two different code paths is not common either (in the scripts in the
>> > tree), so it seems to me it makes sense here.
>> > Maybe others disagree.
>>
>>  The way I see it, there is almost no commonality between the script with the -c
>> option and the normal pkg-stats, so it makes no sense to have them in the same
>> script. I would split off the CPE class into a separate module that would be
>> imported into this script and a new cpe_report script.
> 
> I debated this.  Ricardo/Arnout, what naming and directory scheme
> would you propose?

It seems you need a single shared module, so inspired by:

getdeveloperlib.py
get-developers -> import getdeveloperlib

you could have:

cpelib.py
pkg-stats -> import cpelib
cpe-report -> import cpelib

But cpe-report seems to be a user-facing script, so it would go to the utils/
directory. I am not sure what is the best way to solve the import for:

support/scripts/cpelib.py
support/scripts/pkg-stats -> import cpelib
utils/cpe-report -> import cpelib

Maybe a symlink utils/cpelib.py -> support/scripts/cpelib.py ? (not tested)
I never tried importing using relative paths without having __init__.py.


Regards,
Ricardo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting
  2018-05-18  3:07       ` Ricardo Martincoski
@ 2018-05-18  3:18         ` Matthew Weber
  0 siblings, 0 replies; 17+ messages in thread
From: Matthew Weber @ 2018-05-18  3:18 UTC (permalink / raw)
  To: buildroot

Ricardo,

On Thu, May 17, 2018 at 10:07 PM, Ricardo Martincoski
<ricardo.martincoski@gmail.com> wrote:
>
> Hello,
>
> On Wed, May 16, 2018 at 08:32 PM, Arnout Vandecappelle wrote:
> > On 16-05-18 05:43, Ricardo Martincoski wrote:
>
> [snip]
> >> @@ -569,4 +569,5 @@ class CPE:
> >>              cpe_file = gzip.GzipFile(fileobj=StringIO(compressed_cpe_file.read())).read()
> >> -            print("CPE: Converting xml manifest to dict...")
> >> -            self.all_cpes = xmltodict.parse(cpe_file)
> >> +            print("CPE: Converting xml manifest to list...")
> >> +            tree = ET.fromstring(cpe_file)
> >> +            self.all_cpes = [i.get('name') for i in tree.iter('{http://scap.nist.gov/schema/cpe-extension/2.3}cpe23-item')]
> >
> >  So after this you get basically the same as after comparison patch 1, right? So
> > the xmltodict takes 4 minutes? Or am I missing something?
>
> No. I missed something important and jumped to wrong conclusions.
>
> After adding some simple instrumentation code to display relative timestamps,
> the main difference in performance is *not* related to the xml parser used but
> it is related to the code used for find() and find_partial().
>
> I didn't performed further testings but it seems related to the use of
> startswith as you said.
>
> patch 1:
> 0:00:00.001015 CPE: Fetching xml manifest...
> 0:00:03.924777 CPE: Unzipping xml manifest...
> 0:00:11.672462 CPE: Converting xml manifest to list...
> 0:00:11.672504 before xmltodict.parse
> 0:00:36.343417 before append
> 0:00:36.462400 list created
> 0:00:36.738042 Build package list ...
> 0:00:36.875742 Getting package make info ...
> 0:00:58.543116 Getting package details ...
> 0:01:00.016925 BR Infra Not building CPE for pkg: [UBOOT]
> 0:01:07.714094 BR Infra Not building CPE for pkg: [IMX_USB_LOADER]
> ...
> 0:08:00.615649 BR Infra Not building CPE for pkg: [INTLTOOL]
> 0:08:01.243667 BR Infra Not building CPE for pkg: [DOXYGEN]
> 0:08:02.035463 Calculate stats
> 0:08:02.042401 Write HTML
>
> patch 2:
> 0:00:00.000889 CPE: Fetching xml manifest...
> 0:00:03.640856 CPE: Unzipping xml manifest...
> 0:00:14.569496 CPE: Converting xml manifest to list...
> 0:00:14.569541 before ET.fromstring
> 0:00:21.325842 before list comprehension
> 0:00:21.609946 list created
> 0:00:21.612443 Build package list ...
> 0:00:21.754223 Getting package make info ...
> 0:00:43.111196 Getting package details ...
> 0:00:43.828047 BR Infra Not building CPE for pkg: [UBOOT]
> 0:00:47.125995 BR Infra Not building CPE for pkg: [IMX_USB_LOADER]
> ...
> 0:03:46.279893 BR Infra Not building CPE for pkg: [INTLTOOL]
> 0:03:46.571266 BR Infra Not building CPE for pkg: [DOXYGEN]
> 0:03:46.892839 Calculate stats
> 0:03:46.895765 Write HTML
>
> >  Oh, actually, the [... for ... iter...] is also more efficient than
> > for...: append() so that could be an effect here as well. But this part of the
> > code is only O(#cpe packages) so it shouldn't have that much impact.
> >
> >>          except urllib2.HTTPError:package
> >> @@ -580,5 +581,5 @@ class CPE:
> >>          print("CPE: Searching for partial [%s]" % cpe_str)
> >> -        for cpe in self.all_cpes['cpe-list']['cpe-item']:
> >> -            if cpe_str in cpe['cpe-23:cpe23-item']['@name']:
> >> -                return cpe['cpe-23:cpe23-item']['@name']
> >> +        for cpe in self.all_cpes:
> >> +            if cpe.startswith(cpe_str):
> >
> >  Originally it was 'in' instead of startswith(). Obviously startswith() will be
> > more efficient. And also more correct, I guess, or does the partial match not

I just sent a revised v5 that's a hybrid of feedback so far.
http://patchwork.ozlabs.org/project/buildroot/list/?series=45124

Optimization wise, I did not switch to using .startswith however would
need to look now traversing a set with a for loop and using startswith
would be more efficient then doing a if "in" check.

Matt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting
  2018-05-18  3:16         ` Ricardo Martincoski
@ 2018-05-18  3:21           ` Matthew Weber
  2018-05-18  3:44             ` Ricardo Martincoski
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Weber @ 2018-05-18  3:21 UTC (permalink / raw)
  To: buildroot

Ricardo,

On Thu, May 17, 2018 at 10:16 PM, Ricardo Martincoski
<ricardo.martincoski@gmail.com> wrote:
> Hello,
>
> On Wed, May 16, 2018 at 10:42 PM, Matthew Weber wrote:
>> On Wed, May 16, 2018 at 6:32 PM, Arnout Vandecappelle wrote:
>>> On 16-05-18 05:43, Ricardo Martincoski wrote:
>
> [snip]
>>> >> +    cpe_dict = CPE()
>>> >> +    cpe_dict.get_xml_dict()
>>> >> +    if args.cpe_report:
>>> >> +        print("Performing Target CPE Report Analysis...")
>>> >> +        get_target_cpe_report(args.cpe_report, cpe_dict)
>>> >> +    elif args.output:
>>> >
>>> > This is not common in other scripts in the tree. All checks between options are
>>> > done at the start of __main__.
>>> > But having two different code paths is not common either (in the scripts in the
>>> > tree), so it seems to me it makes sense here.
>>> > Maybe others disagree.
>>>
>>>  The way I see it, there is almost no commonality between the script with the -c
>>> option and the normal pkg-stats, so it makes no sense to have them in the same
>>> script. I would split off the CPE class into a separate module that would be
>>> imported into this script and a new cpe_report script.
>>
>> I debated this.  Ricardo/Arnout, what naming and directory scheme
>> would you propose?
>
> It seems you need a single shared module, so inspired by:
>
> getdeveloperlib.py
> get-developers -> import getdeveloperlib
>
> you could have:
>
> cpelib.py
> pkg-stats -> import cpelib
> cpe-report -> import cpelib
>
> But cpe-report seems to be a user-facing script, so it would go to the utils/
> directory. I am not sure what is the best way to solve the import for:
>
> support/scripts/cpelib.py
> support/scripts/pkg-stats -> import cpelib
> utils/cpe-report -> import cpelib
>
> Maybe a symlink utils/cpelib.py -> support/scripts/cpelib.py ? (not tested)
> I never tried importing using relative paths without having __init__.py.
>
>

I caught Arnout on IRC today and my v5 patchset reflects his
suggestion.  Ended up calling the class/module file
support/scripts/cpedb.py and support/scripts/cpe-report for the
checker script.  I do see what you mean about a user facing script.
What do others think?

Matt

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting
  2018-05-18  3:21           ` Matthew Weber
@ 2018-05-18  3:44             ` Ricardo Martincoski
  2018-05-18 13:07               ` Matthew Weber
  0 siblings, 1 reply; 17+ messages in thread
From: Ricardo Martincoski @ 2018-05-18  3:44 UTC (permalink / raw)
  To: buildroot

Hello,

On Fri, May 18, 2018 at 12:21 AM, Matthew Weber wrote:

> I caught Arnout on IRC today and my v5 patchset reflects his
> suggestion.  Ended up calling the class/module file
> support/scripts/cpedb.py and support/scripts/cpe-report for the

Good names IMO.

Just for reference, in the same machine with v5 takes 3m13,630s to run pkg-stats.
In the current master: 1m40,318s

> checker script.  I do see what you mean about a user facing script.
> What do others think?

Let's wait for more input.

But if for any reason you send a v6, I would suggest to:
Merge patch 7 back to 5.
Split patch 6 into one changing print to print(), and another one
adding CPE report.

Regards,
Ricardo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting
  2018-05-18  3:44             ` Ricardo Martincoski
@ 2018-05-18 13:07               ` Matthew Weber
  0 siblings, 0 replies; 17+ messages in thread
From: Matthew Weber @ 2018-05-18 13:07 UTC (permalink / raw)
  To: buildroot

Ricardo,

On Thu, May 17, 2018 at 10:44 PM, Ricardo Martincoski
<ricardo.martincoski@gmail.com> wrote:
> Hello,
>
> On Fri, May 18, 2018 at 12:21 AM, Matthew Weber wrote:
>
>> I caught Arnout on IRC today and my v5 patchset reflects his
>> suggestion.  Ended up calling the class/module file
>> support/scripts/cpedb.py and support/scripts/cpe-report for the
>
> Good names IMO.
>
> Just for reference, in the same machine with v5 takes 3m13,630s to run pkg-stats.
> In the current master: 1m40,318s
>
>> checker script.  I do see what you mean about a user facing script.
>> What do others think?
>
> Let's wait for more input.
>
> But if for any reason you send a v6, I would suggest to:
> Merge patch 7 back to 5.
> Split patch 6 into one changing print to print(), and another one
> adding CPE report.
>

I just split the  print to print() out into a seperate
http://patchwork.ozlabs.org/patch/916300/.

I'll respin v6 after more feedback and fold the cpe-report in with the
cpedb (7 into 5).

Matt

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-05-18 13:07 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-10 18:58 [Buildroot] [PATCH v4 0/5] CPE ID Support Matt Weber
2018-05-10 18:58 ` [Buildroot] [PATCH v4 1/5] cpe-info: new make target Matt Weber
2018-05-10 18:58 ` [Buildroot] [PATCH v4 2/5] cpe-info: id prefix/suffix Matt Weber
2018-05-10 18:58 ` [Buildroot] [PATCH v4 3/5] cpe-info: only report target pkgs Matt Weber
2018-05-10 18:58 ` [Buildroot] [PATCH v4 4/5] cpe-info: update manual for new pkg vars Matt Weber
2018-05-10 18:58 ` [Buildroot] [PATCH v4 5/5] support/scripts/pkgstats: add CPE reporting Matt Weber
2018-05-16  3:43   ` Ricardo Martincoski
2018-05-16 23:32     ` Arnout Vandecappelle
2018-05-17  1:42       ` Matthew Weber
2018-05-18  3:16         ` Ricardo Martincoski
2018-05-18  3:21           ` Matthew Weber
2018-05-18  3:44             ` Ricardo Martincoski
2018-05-18 13:07               ` Matthew Weber
2018-05-18  3:07       ` Ricardo Martincoski
2018-05-18  3:18         ` Matthew Weber
2018-05-16 23:34   ` Arnout Vandecappelle
2018-05-17  1:42     ` Matthew Weber

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.