mhi.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Jeffrey Hugo <quic_jhugo@quicinc.com>
To: Manivannan Sadhasivam <mani@kernel.org>
Cc: Qiang Yu <quic_qianyu@quicinc.com>, <quic_hemantk@quicinc.com>,
	<loic.poulain@linaro.org>, <mhi@lists.linux.dev>,
	<linux-arm-msm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<quic_cang@quicinc.com>
Subject: Re: [PATCH] bus: mhi: Disable IRQs instead of freeing them during power down
Date: Mon, 20 Jun 2022 13:59:16 -0600	[thread overview]
Message-ID: <0ad762f2-5acb-cfd1-efca-ff83f97f978d@quicinc.com> (raw)
In-Reply-To: <20220616205936.GG2889@thinkpad>

On 6/16/2022 2:59 PM, Manivannan Sadhasivam wrote:
> On Thu, Jun 16, 2022 at 09:53:34AM -0600, Jeffrey Hugo wrote:
>> On 6/15/2022 3:16 PM, Manivannan Sadhasivam wrote:
>>> On Mon, Jun 13, 2022 at 07:07:02AM -0600, Jeffrey Hugo wrote:
>>>> On 6/12/2022 7:48 PM, Qiang Yu wrote:
>>>>>
>>>>> On 6/10/2022 10:00 PM, Jeffrey Hugo wrote:
>>>>>> On 6/9/2022 9:21 PM, Qiang Yu wrote:
>>>>>>> On 6/9/2022 9:54 PM, Jeffrey Hugo wrote:
>>>>>>>
>>>>>>>> On 6/9/2022 7:43 AM, Qiang Yu wrote:
>>>>>>>>> EP tends to read MSI address/data once and cache them
>>>>>>>>> after BME is set.
>>>>>>>>> So host should avoid changing MSI address/data after BME is set.
>>>>>>>>>
>>>>>>>>> In pci reset function, host invokes free_irq(), which also clears MSI
>>>>>>>>> address/data in EP's PCIe config space. If the invalid address/data
>>>>>>>>> are cached and used by EP, MSI triggered by EP wouldn't be received by
>>>>>>>>> host, because an invalid MSI data is sent to an invalid MSI address.
>>>>>>>>>
>>>>>>>>> To fix this issue, after host runs request_irq() successfully during
>>>>>>>>> mhi driver probe, let's invoke enable_irq()/disable_irq() instead of
>>>>>>>>> request_irq()/free_irq() when we want to power on and power down MHI.
>>>>>>>>> Meanwhile, Host should invoke free_irq() when mhi host driver is
>>>>>>>>> removed.
>>>>>>>>
>>>>>>>> I don't think this works for hotplug, nor cases where there
>>>>>>>> are multiple MHI devices on the system.
>>>>>>>>
>>>>>>>> The EP shouldn't be caching this information for multiple
>>>>>>>> reasons. Masking the MSIs, disabling the MSIs, changing the
>>>>>>>> address when the affinity changes, etc.
>>>>>>>>
>>>>>>>> It really feels like we are solving the problem in the wrong place.
>>>>>>>>
>>>>>>>> Right now, this gets a NACK from me.
>>>>>>>>
>>>>>>> After free_irq(), MSI is still enabled but MSI address and data
>>>>>>> are cleared. So there is a chance that device initiates MSI
>>>>>>> using zero address. How to fix this race conditions.
>>>>>>
>>>>>> On what system is MSI still enabled?  I just removed the AIC100
>>>>>> controller on an random x86 system, and lspci is indicating MSIs are
>>>>>> disabled -
>>>>>>
>>>>>> Capabilities: [50] MSI: Enable- Count=32/32 Maskable+ 64bit+
>>>>>
>>>>> system: Ubuntu18.04, 5.4.0-89-generic,  Intel(R) Core(TM) i7-6700 CPU @
>>>>> 3.40GHz
>>>>>
>>>>> After removing MHI driver, I also see MSI enable is cleared.  But I
>>>>> don't think free_irq clears it. I add log before free_irq and after
>>>>> free_irq as following show:
>>>>>
>>>>> [62777.625111] msi cap before free irq
>>>>> [62777.625125] msi control=0x1bb, address=0xfee00318, data=0x0
>>>>> [62777.625301] msi cap after free irq
>>>>> [62777.625313] msi control=0x1bb, address=0x0, data=0x0
>>>>> [62777.625496] mhi-pci-generic 0000:01:00.0: mhi_pci_remove end of line,
>>>>> block 90 secs.
>>>>> # lspci -vvs 01:00.0
>>>>>            Capabilities: [50] MSI: Enable+ Count=8/32 Maskable+ 64bit+
>>>>>                    Address: 0000000000000000  Data: 0000
>>>>>                    Masking: ffffffff  Pending: 00000000
>>>>
>>>> At this point, the MSI functionality is still enabled, but every MSI is
>>>> masked out (Masking), so per the PCIe spec, the endpoint may not trigger a
>>>> MSI to the host.  The device advertises that it supports maskable MSIs
>>>> (Maskable+), so this is appropiate.
>>>>
>>>> If your device can still send a MSI at this point, then it violates the PCIe
>>>> spec.
>>>>
>>>> disable_irq() will not help you with this as it will do the same thing.
>>>>
>>>> I still think you are trying to fix an issue in the wrong location (host vs
>>>> EP), and causing additional issues by doing so.
>>>>
>>>
>>> Irrespective of caching the MSI data in endpoint, I'd like to get rid of
>>> request_irq/free_irq during the mhi_{power_down/power_up} time. As like the MHI
>>> endpoint stack, we should just do disable/enable irq. Because, the MHI device
>>> may go down several times while running and we do not want to deallocate the
>>> IRQs all the time. And if the device gets removed, ultimately the MHI driver
>>> will get removed and we are fine while loading it back (even if MSI count
>>> changes).
>>>
>>> I didn't had time to look into the patch in detail but I'm in favour of
>>> accepting the proposal.
>>>
>>> @Jeff: Any specific issue you are seeing with hotplug etc...?
>>
>> Perhaps I'm getting confused by the commit text of this change.
>>
>> The issue described is that we free the irq, and then the EP sends a MSI,
>> and the host doesn't receive it.  To me, that is expected.  The host doesn't
>> care about the irq anymore because it freed it, therefore it would be
>> expected that the host doesn't receive the irq.  So, the described issue is
>> not an issue since it is expected behavior from what I can tell.
>>
>> The proposed fix, is to disable the interrupts, and not free them until the
>> driver is removed.  I interpret removing the driver as "rmmod mhi".  Based
>> on this, the problem I see is a scenario where we have N devices in a
>> system, and one device is hotplugged.  On hotplug, we would want to clean up
>> all resources (free irq), but according to the description, we need to rmmod
>> mhi, which is both not automatic and also affects the other N-1 devices
>> which are presumed to be operational.
> 
> No. When the PCI device gets removed during runtime, the remove() callback will
> get called with relevant "struct pci_dev" and that should take care of all
> resource cleanup for that particular device (including free_irq).

That is what I expected, so I was confused.  Seems like we are on the 
same page now.

> You do not need to manually rmmod the driver as that will be done by the
> hotplug driver when there are no devices making use of it. And yes, the commit
> message needs to be changed. >
>>
>> Now, if we throw all of that out the window, and say that the goal is to
>> register the irqs when the controller is registered, free them when the
>> controller is unregistered, and enable/disable based on power up/down as a
>> optimization, that could be sane.  If that is what this change is attempting
>> to do, it is not what the commit text describes.
>>
>> Under the assumption that you want the optimization I just described, I will
>> re-review the code next week when I get back from my travel. Assuming the
>> implementation is good (other than what I've already pointed out), I think
>> the commit text needs to be rewritten.
>>
>> Does that clarify things for you?
> 
> Yep!

Reviewed, with additional comments.  I guess I remove my NACK, but there 
is a lot to address with v2.

  reply	other threads:[~2022-06-20 19:59 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-09 13:43 [PATCH] bus: mhi: Disable IRQs instead of freeing them during power down Qiang Yu
2022-06-09 13:54 ` Jeffrey Hugo
2022-06-10  3:21   ` Qiang Yu
2022-06-10 14:00     ` Jeffrey Hugo
2022-06-13  1:48       ` Qiang Yu
2022-06-13 13:07         ` Jeffrey Hugo
2022-06-15 21:16           ` Manivannan Sadhasivam
2022-06-16 15:53             ` Jeffrey Hugo
2022-06-16 20:59               ` Manivannan Sadhasivam
2022-06-20 19:59                 ` Jeffrey Hugo [this message]
2022-06-20 19:57   ` Jeffrey Hugo
2022-06-21  9:24     ` Qiang Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0ad762f2-5acb-cfd1-efca-ff83f97f978d@quicinc.com \
    --to=quic_jhugo@quicinc.com \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=loic.poulain@linaro.org \
    --cc=mani@kernel.org \
    --cc=mhi@lists.linux.dev \
    --cc=quic_cang@quicinc.com \
    --cc=quic_hemantk@quicinc.com \
    --cc=quic_qianyu@quicinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).