Re: Deadcode discussion based on Arm NS phys timer

From: Bertrand Marquis <Bertrand.Marquis@arm.com>
To: Michal Orzel <michal.orzel@amd.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
	Julien Grall <julien@xen.org>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: Deadcode discussion based on Arm NS phys timer
Date: Tue, 25 Oct 2022 08:20:49 +0000	[thread overview]
Message-ID: <7C838BB9-E8FA-4239-B851-B179CB6B7A02@arm.com> (raw)
In-Reply-To: <89ff9e50-be23-82c2-cd6b-81e05d339778@amd.com>

Hi Michal,

> On 25 Oct 2022, at 09:07, Michal Orzel <michal.orzel@amd.com> wrote:
> 
> Hi Bertrand,
> 
> On 25/10/2022 09:45, Bertrand Marquis wrote:
>> 
>> 
>> Hi Michal,
>> 
>>> On 25 Oct 2022, at 08:11, Michal Orzel <michal.orzel@amd.com> wrote:
>>> 
>>> Hi,
>>> 
>>> On 25/10/2022 03:29, Stefano Stabellini wrote:
>>>> 
>>>> 
>>>> On Mon, 24 Oct 2022, Julien Grall wrote:
>>>>>> On 24/10/2022 12:51, Julien Grall wrote:
>>>>>>> Caution: This message originated from an External Source. Use proper
>>>>>>> caution when opening attachments, clicking links, or responding.
>>>>>>> 
>>>>>>> 
>>>>>>> On 24/10/2022 10:07, Michal Orzel wrote:
>>>>>>>> Hello,
>>>>>>> 
>>>>>>> Hi Michal,
>>>>>>> 
>>>>>>>> Recently I came across a deadcode in Xen Arm arch timer code. Briefly
>>>>>>>> speaking, we are routing
>>>>>>>> the NS phys timer (CNTP) IRQ to Xen, even though Xen does not make use
>>>>>>>> of it (as it uses the hypervisor timer CNTHP).
>>>>>>>> This timer is fully emulated, which means that there is nothing that can
>>>>>>>> trigger such IRQ. This code is
>>>>>>>> a left over from early days, where the CNTHP was buggy on some models
>>>>>>>> and we had to use the CNTP instead.
>>>>>>>> 
>>>>>>>> As far as the problem itself is not really interesting, it raises a
>>>>>>>> question of what to do with a deadcode,
>>>>>>>> as there might be/are other deadcode places in Xen.
>>>>>>> 
>>>>>>> There are multiple definition of deadcode. Depending on which one you
>>>>>>> chose, then this could cover IS_ENABLED() and possibly #ifdef. So this
>>>>>>> would result to a lot of places impacted with the decision.
>>>>>>> 
>>>>>>> So can you clarify what you mean by deadcode?
>>>>>> In the timer example, I think we have both a deadcode and unreachable code.
>>>>>> For the purpose of this discussion, let's take the MISRA definition of a
>>>>>> deadcode which is a "code that can be executed
>>>>>> but has no effect on the functional behavior of the program". This differs
>>>>>> from the unreachable code definition that is
>>>>>> a "code that cannot be executed". Setting up the IRQ for Xen is an example
>>>>>> of a deadcode. Code within IRQ handler is an unreachable code
>>>>>> (there is nothing that can trigger this IRQ).
>>>>>> 
>>>>>> What I mean by deadcode happens to be the sum of the two cases above i.e.
>>>>>> the code that cannot be executed as well as the code that
>>>>>> does not impact the functionality of the program.
>>>>>> 
>>>>>>> 
>>>>>>>> One may say that it is useful to keep it, because one day,
>>>>>>>> someone might need it when dealing with yet another broken HW. Such
>>>>>>>> person would still need to modify the other
>>>>>>>> part of the code (e.g. reprogram_timer), but there would be less work
>>>>>>>> required overall. Personally, I'm not in favor of
>>>>>>>> such approach, because we should not really support possible scenarios
>>>>>>>> with broken HW (except for erratas listing known issues).
>>>>>>> 
>>>>>>> The difference between "broken HW" and "HW with known errata" is a bit
>>>>>>> unclear to me. Can you clarify how you would make the difference here?
>>>>>>> 
>>>>>>> In particular, at which point do you consider that the HW should not be
>>>>>>> supported by Xen?
>>>>>> I'm not saying that HW should not be supported. The difference for me
>>>>>> between broken HW and
>>>>>> HW with known errata is that for the former, the incorrect behavior is often
>>>>>> due to the early support stage,
>>>>>> using emulators/models instead of real HW, whereas for the latter, the HW is
>>>>>> already released and it happens to be that it is buggy
>>>>>> (the HW vendor is aware of the issue and released erratas).
>>>>> 
>>>>> Thanks for the clarification. What I would call broken is anything that can't
>>>>> be fixed in software. For a not too fictional example, an HW where PCI devices
>>>>> are using the same stream ID. So effectively, passthrough can't be safely
>>>>> supported.
>>>>> 
>>>>> Regarding, not yet released HW, I don't think Xen should have workaround for
>>>>> them. I wouldn't even call it "broken" because they are not yet released and
>>>>> it is common to have bug in early revision.
>>>>> 
>>>>>> Do we have any example in Xen for supporting broken HW,
>>>>>> whose vendor is not aware of the issue or did not release any errata?
>>>>> I will not cite any HW on the ML. But from my experience, the vendors are not
>>>>> very vocal about issues in public (some don't even seem to have public doc).
>>>>> The best way to find the issues is to look at Linux commit.
>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>>> Also, as part of the certification/FUSA process, there should be no
>>>>>>>> deadcode and we should have explanation for every block of code.
>>>>>>> 
>>>>>>> See above. What are you trying to cover by deadcode? Would protecting
>>>>>>> code with IS_ENABLED() (or #ifdef) ok?
>>>>>> I think this would be ok from the certification point of view (this would at
>>>>>> least means, that we are aware of the issue
>>>>>> and we took some steps). Otherwise, such code is just an example of a
>>>>>> deadcode/unreachable code.
>>>>> 
>>>>> Thanks for the clarification. So the exact approach will depend on the
>>>>> context....
>>>>> 
>>>>>>>> There are different ways to deal with a deadcode: > 1. Get rid of it
>>>>>>>> completely
>>>>>>>> 2. Leave it as it is
>>>>> 
>>>>> ... this is my preference in the context of the timer.
>>>> 
>>>> From a certification point of view, the fewer lines of code the better,
>>>> and ideally all the lines of code used for the certified build should be
>>>> testable and used.
>>>> 
>>>> So I think 2. is the lest useful option from a certification
>>>> perspective. For this reason, I'd prefer another alternative.
>>>> 
>>>> 
>>>>> If the other don't like it, then 1 would be my preference.
>>>>> 
>>>>> In general, my preference would be either 3.3 or 3.2 (see below).
>>>> 
>>>> I also think that 3.2 and 3.3 are good options for the general case. For
>>>> the timer, I can see why 1 is your (second) preference and I am fine
>>>> with 1 as well.
>>> Ok, sounds good to me. Let's still give Bertrand the chance to share his opinion.
>> 
>> We need to get rid of dead code and removing it is not always the best solution.
>> 
>> If the code is or could be useful for someone some day, protecting it with ifdef is ok.
>> 
>> In the mid term we will have to introduce a lot more ifdef or IS_ENABLED in the
>> code so that we can compile out what we do not need and code not applying to
>> some hardware is a case where we will do that (does not mean that by default
>> we will not compile it in but we will make it easier to reduce the code size for a
>> specific use case).
>> 
>> So 3.2 and 3.3 are ok for me.
> 
> So we all agree that the code in the current form is a no go from certification purposes.
> That is good :)
> 
> The reason why I opt for solution 1 and not the others is that in the latter case it would
> mean introducing the Kconfig option to allow user to select the timer to be used by Xen.
> This is not really correct. Also in the current form, it would also require adding more
> code to time.c code because at the moment using CNTP for Xen would not work out of the box.
> The architecture defines the hypervisor timer for a purpose. If it does not work, it means
> that the HW is problematic. I agree that we would want to support such use case but I'm not
> really aware of any issue like that. Adding more code and Kconfig options just because
> one day someone may face issues with a new HW is something I am not a fan of.

I see 2 solutions here:
- somehow push the code to a different file (not quite sure this is feasible here)
- remove completely the code with a clean commit. Doing this it will be easy for someone needing this to later revert the patch

It is definitely true here that adding more code to keep some unused code does not really make sense.
And let’s be realistic here, if we need that one day, it will not take ages to support it somehow.

As said, from a pure certification point of view:
- we must not have deadcode
- proper ifdef is acceptable
- if 0 is not acceptable
- commented code is not acceptable

> 
> I would agree with your solution under the condition that the code is already ready
> for the timer switch.
> 
> I guess we need another round of sharing opinions.
> 

I guess so yes

Cheers
Bertrand

>> 
>>> 
>>>> 
>>>> 
>>>>>>>> 3. Admit that it can be useful one day and:
>>>>>>>>   3.1. protect it with #if 0
>>>>> 
>>>>> #if 0 should not be used in Xen code. IMHO this is the worse of all the world.
>>> I share your opinion here Julien. Unfortunately we still have quite a few examples
>>> in the Arm code using this either to mark something as TODO or to comment out
>>> parts of the code waiting for future support. This is mostly in SMMU code that
>>> was taken from Linux but already diverged quite far (maybe some cleanup is necessary).
>> 
>> Definitely the SMMU code will need some cleaning.
>> #if 0 are a no go from a certification point of view.
>> 
>> Cheers
>> Bertrand
>> 
>>> 
>>>>> 
>>>>>>>>   3.2. protect it with a new Kconfig option (disabled by default)
>>>>>>>> using #ifdef
>>>>>>>>   3.3. protect it with a new Kconfig option (disabled by default)
>>>>>>>> using IS_ENABLED (to make sure code always compile)
>>>>> 
>>>>> I would prefer 3.3 over 3.2. 3.2 would be used if it is too difficult to get
>>>>> the code compiled when !IS_ENABLED.
>>>>> 
>>>>> Similar to one if this is to move all the affected code in a separate file
>>>>> with using obj-$(CONFIG...). That would only work for large chunk of code and
>>>>> would be preferred over 3.2.
>>>> 
>>> 
>>> ~Michal
>> 
> 
> ~Michal