Re: [RFC PATCH 1/4] xen/Makefile: add analysis-coverity and analysis-eclair

From: Luca Fancellu <Luca.Fancellu@arm.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>,
	Wei Chen <Wei.Chen@arm.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	George Dunlap <george.dunlap@citrix.com>,
	Julien Grall <julien@xen.org>,
	Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wl@xen.org>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: [RFC PATCH 1/4] xen/Makefile: add analysis-coverity and analysis-eclair
Date: Fri, 11 Nov 2022 10:42:04 +0000	[thread overview]
Message-ID: <E9975528-1600-4F93-8C6E-263FF2A4A7F0@arm.com> (raw)
In-Reply-To: <e8cae5b4-82db-cdb0-eccd-36337bc8ae5a@suse.com>

> On 9 Nov 2022, at 10:36, Jan Beulich <jbeulich@suse.com> wrote:
> 
> On 09.11.2022 11:08, Luca Fancellu wrote:
>>>> On 07.11.2022 11:47, Luca Fancellu wrote:
>>>>> +Here is an example to add a new justification in false-positive-<tool>.json::
>>>> 
>>>> With <tool> already present in the name, ...
>>>> 
>>>>> +|{
>>>>> +|    "version": "1.0",
>>>>> +|    "content": [
>>>>> +|        {
>>>>> +|            "id": "SAF-0-false-positive-<tool>",
>>>>> +|            "analyser": {
>>>>> +|                "<tool>": "<proprietary-id>"
>>>> 
>>>> ... can we avoid the redundancy here? Perhaps ...
>>>> 
>>>>> +|            },
>>>>> +|            "tool-version": "<version>",
>>>> 
>>>> ... it could be
>>>> 
>>>>          "analyser": {
>>>>              "<version>": "<proprietary-id>"
>>>>          },
>> 
>> About this, I’ve investigated a bit and I don’t think this is the right solution, it wouldn't make
>> much sense to have a schema where in one file the analyser dictionary key is the tool name
>> and in another it is a version (or range of versions).
>> 
>> However I can remove the analyser dictionary and use this schema for the false-positive, which is
>> more compact:
>> 
>> |{
>> |    "version": "1.0",
>> |    "content": [
>> |        {
>> |            "id": "SAF-0-false-positive-<tool>",
>> |            “tool-proprietary-id”: "<proprietary-id>”,
>> |            "tool-version": "<version>",
>> |            "name": "R20.7 [...]",
>> |            "text": "[...]"
>> |        },
>> |        {
>> |            "id": "SAF-1-false-positive-<tool>",
>> |            “tool-proprietary-id”: "",
>> |            "tool-version": "",
>> |            "name": "Sentinel",
>> |            "text": "Next ID to be used"
>> |        }
>> |    ]
>> |}
>> 
>> This needs however a change in the initial design and more documentation on the different handlings
>> of the safe.json schema and the false-positive-<tool>.json schema. Is it worth?
> 
> I think it is, but of others disagree, so be it.

So, since no one replied on that, I think everybody agrees that safe and false-positive can have a different schema,
I will update the python tool to handle that and I will update the make recipe consequently.

>>>>> 
>>>>> Hmm, not sure: --include isn't a standard option to grep, and we
>>>>> generally try to be portable. Actually -R (or -r) isn't either. It
>>>>> may still be okay that way if properly documented where the involved
>>>>> goals will work and where not.
>>>> 
>>>> Is a comment before the line ok as documentation? To state that —include and
>>>> -R are not standard options so analysis-{coverity,eclair} will not work without a
>>>> grep that takes those parameters?
>>> 
>>> A comment _might_ be okay. Is there no other documentation on how these
>>> goals are to be used? The main question here is how impacting this might
>>> be to the various environments we allow Xen to be built in: Would at
>>> least modern versions of all Linux distros we care about allow using
>>> these rules? What about non-Linux?
>>> 
>>> And could you at least bail when PARSE_FILE_LIST ends up empty, with a
>>> clear error message augmenting the one grep would have issued?
>> 
>> An empty PARSE_FILE_LIST should not generate an error, it just means there are no
>> justifications, but I see it can be problematic in case grep does not work.
>> 
>> What about this? They should be standard options right?
>> 
>> PARSE_FILE_LIST := $(addsuffix .safparse,$(shell find $(srctree) -type f \
>>    -name '*.c' -o -name '*.h' -exec \
>>    grep -El '^[[:blank:]]*\/\*[[:space:]]+SAF-.*\*\/$$' {} + ))
> 
> Coming closer to being generally usable. You now have the problem of
> potentially exceeding command line limits (iirc there were issues in
> find and/or kernels), but I agree it looks standard-conforming now.
> 
>>>>> And then - why do you escape slashes in the ERE?
>>>>> 
>>>>> Talking of escaping - personally I find backslash escapes harder to
>>>>> read / grok than quotation, so I'd like to recommend using quotes
>>>>> around each of the two --include (if they remain in the first place)
>>>>> instead of the \* construct.
>>>> 
>>>> Ok I’ve removed the escape from the * and also from slashes:
>>>> 
>>>> PARSE_FILE_LIST := $(addsuffix .safparse,$(shell grep -ERl --include='*.h' \
>>>>   --include='*.c' '^[[:blank:]]*/\*[[:space:]]+SAF-.*\*/$$' $(srctree)))
>>> 
>>> Good - seeing things more clearly now my next question is: Isn't
>>> matching just "/* SAF-...*/" a little too lax? And is there really a
>>> need to permit leading blanks?
>> 
>> I’m permitting blanks to allow spaces or tabs, zero or more times before the start of
>> the comment, I think it shall be like that.
> 
> Hmm, I withdraw my question realizing that you want these comments
> indented the same as the line they relate to.
> 
>> About matching, maybe I can match also the number after SAF-, this should be enough,
>> 
>> […] grep -El '^[[:blank:]]*\/\*[[:space:]]+SAF-[0-9]+.*\*\/$$’ […]
> 
> I'd like to suggest to go one tiny step further (and once again to
> drop the escaping of slashes):
> 
> '^[[:blank:]]*/\*[[:space:]]+SAF-[0-9]+-.*\*/$$'

I agree, I will use this one that is safer and includes your suggestions:

PARSE_FILE_LIST := $(addsuffix .safparse,$(shell find $(srctree) -type f \
    -name '*.c' -o -name '*.h' -exec \
    grep -El '^[[:blank:]]*/\*[[:space:]]+SAF-[0-9]+-.*\*/$$' {} \; ))

> 
>>>> Now analysis-build-coverity will be called, the best match is analysis-build-%, so again the dependency
>>>> which is analysis-parse-tags-%, will be translated to analysis-parse-tags-coverity.
>>>> 
>>>> Now analysis-parse-tags-coverity will be called, the best match is analysis-parse-tags-%, so the % will
>>>> Have the ‘coverity’ value and in the dependency we will have $(objtree)/%.sed -> $(objtree)/coverity.sed.
>>>> 
>>>> Looking for $(objtree)/coverity.sed the best match is $(objtree)/%.sed, which will have $(JUSTIFICATION_FILES)
>>>> and the python script in the dependency, here we will use the second expansion to solve
>>>> $(XEN_ROOT)/docs/misra/false-positive-$$*.json in $(XEN_ROOT)/docs/misra/false-positive-coverity.json
>>>> 
>>>> So now after analysis-parse-tags-coverity has ended its dependency it will start with its recipe, after it finishes,
>>>> the recipe of analysis-build-coverity will start and it will call make to actually build Xen.
>>> 
>>> Okay, I see now - this building of Xen really _is_ independent of the
>>> checker chosen. I'm not sure though whether it is a good idea to
>>> integrate all this, including ...
>>> 
>>>> After the build finishes, if the status is good, the analysis-build-coverity has finished and the _analysis-coverity
>>>> recipe can now run, it will call make with the analysis-clean target, restoring any <file>.{c,h}.safparse to <file>.{c,h}.
>>> 
>>> ... the subsequent cleaning. The state of the _source_ tree after a
>>> build failure would be different from that after a successful build.
>>> Personally I consider this at best surprising.
>>> 
>>> I wonder whether instead there could be a shell(?) script driving a
>>> sequence of make invocations, leaving the new make goals all be self-
>>> contained. Such a script could revert the source tree to its original
>>> state even upon build failure by default, with an option allowing to
>>> suppress this behavior.
>> 
>> Instead of adding another tool, so another layer to the overall system, I would be more willing to add documentation
>> about this process, explaining how to use the analysis-* build targets, what to expect after a successful run and what
>> to expect after a failure.
>> 
>> What do you think?
> 
> Personally I'd prefer make goals to behave as such, with no surprises.

The analysis-* goal requires a build step, otherwise no analysis can be performed by the analysis tools, so I hope we agree
we need to integrate that step as a dependency of the analysis-*.
I understand that the analysis-clean might be a “surprise” if not well documented, this comes from the need to substitute the
tags in the tree (to keep the real path in the report log) and to revert them back at the end of the analysis.

So, such script should just mask to the user the analysis-clean invocation in case of errors (with an option to don’t do that)?

> 
> Jan