mhi.lists.linux.dev archive mirror
* [PATCH v4] bus: mhi: host: Disable preemption while processing data events
@ 2022-11-21  9:34 Qiang Yu
  2022-11-22  5:48 ` Jeffrey Hugo
  0 siblings, 1 reply; 4+ messages in thread
From: Qiang Yu @ 2022-11-21  9:34 UTC
  To: mani, loic.poulain
  Cc: mhi, linux-arm-msm, linux-kernel, quic_cang, mrana, Qiang Yu

If data processing of an event is scheduled out because the core
is busy handling multiple IRQs, this can starve the processing
of the MHI M0 state change event on another core. Fix this issue by
disabling IRQs on the core processing data events.

Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
---
v3->v4: modify the comment
v2->v3: modify the comment
v1->v2: add comments about why we disable local irq

 drivers/bus/mhi/host/main.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
index f3aef77a..6c804c3 100644
--- a/drivers/bus/mhi/host/main.c
+++ b/drivers/bus/mhi/host/main.c
@@ -1029,11 +1029,17 @@ void mhi_ev_task(unsigned long data)
 {
 	struct mhi_event *mhi_event = (struct mhi_event *)data;
 	struct mhi_controller *mhi_cntrl = mhi_event->mhi_cntrl;
+	unsigned long flags;
 
+	/*
+	 * When multiple IRQs arrive, the tasklet may be scheduled out with the event ring lock
+	 * held, causing other high-priority events like the M0 state transition to get stuck
+	 * waiting for the same event ring lock. Thus, let's disable local IRQs here.
+	 */
+	spin_lock_irqsave(&mhi_event->lock, flags);
 	/* process all pending events */
-	spin_lock_bh(&mhi_event->lock);
 	mhi_event->process_event(mhi_cntrl, mhi_event, U32_MAX);
-	spin_unlock_bh(&mhi_event->lock);
+	spin_unlock_irqrestore(&mhi_event->lock, flags);
 }
 
 void mhi_ctrl_ev_task(unsigned long data)
-- 
2.7.4


* Re: [PATCH v4] bus: mhi: host: Disable preemption while processing data events
  2022-11-21  9:34 [PATCH v4] bus: mhi: host: Disable preemption while processing data events Qiang Yu
@ 2022-11-22  5:48 ` Jeffrey Hugo
  2022-12-28 16:35   ` Manivannan Sadhasivam
  0 siblings, 1 reply; 4+ messages in thread
From: Jeffrey Hugo @ 2022-11-22  5:48 UTC
  To: Qiang Yu, mani, loic.poulain
  Cc: mhi, linux-arm-msm, linux-kernel, quic_cang, mrana

On 11/21/2022 2:34 AM, Qiang Yu wrote:
> If data processing of an event is scheduled out because the core
> is busy handling multiple IRQs, this can starve the processing
> of the MHI M0 state change event on another core. Fix this issue by
> disabling IRQs on the core processing data events.
> 
> Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>

I've been pondering this off and on since it's been proposed.

This solution will break the described deadlock, but I don't like it.

What I really don't like is that this is selfish.  We already preempt 
anything else on the CPU that isn't a hard IRQ because we are using a 
tasklet (which is deprecated, see include/linux/interrupt.h).  Now we 
are going to essentially preempt IRQs as well by preventing them from 
being serviced.  So, now the CPU is essentially dedicated to processing 
MHI events.  It seems selfish to say that MHI is the most important 
thing on a particular CPU.

This can have a huge effect on system behavior.  If, say, the ssh IRQ is 
assigned to the same CPU, and we block that CPU long enough, then it 
will appear to the user as if the ssh connection has frozen.  I've 
witnessed this occur with other drivers.

How long can we block the CPU?  According to the code, pretty much for 
an unlimited amount of time.  If the tasklet is processing 
mhi_process_data_event_ring(), then we can process U32_MAX events before 
throttling (which might as well be unlimited).  If the tasklet is 
processing mhi_process_ctrl_ev_ring() then there is no throttling.

I'm thinking it would be better if the IRQ handling was refactored to 
use threaded interrupts.  The thread is an actual process, so it could 
move to another CPU.  It also runs at FIFO priority, so it basically will 
preempt everything but hard IRQs and soft IRQs (e.g. tasklets).  The 
downside of a tasklet is that it is bound to the scheduling CPU, which 
in our case is the CPU servicing the IRQ, and more than a few systems 
tend to load the majority of the IRQs to CPU0.
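
Roughly, a sketch of what that refactor could look like (the handler 
names and the probe-time wiring below are illustrative assumptions, not 
the actual driver code):

#include <linux/interrupt.h>

static irqreturn_t mhi_hard_irq_handler(int irq, void *dev)
{
	/* Hard IRQ context: do the minimum and kick the handler thread. */
	return IRQ_WAKE_THREAD;
}

static irqreturn_t mhi_irq_thread_fn(int irq, void *dev)
{
	struct mhi_event *mhi_event = dev;

	/* Process context: preemptible and free to migrate to another
	 * CPU, so it cannot monopolize the CPU that took the IRQ. */
	mhi_event->process_event(mhi_event->mhi_cntrl, mhi_event, U32_MAX);
	return IRQ_HANDLED;
}

/* At probe time, instead of tasklet_init() plus request_irq(): */
int ret = request_threaded_irq(irq, mhi_hard_irq_handler, mhi_irq_thread_fn,
			       IRQF_ONESHOT, "mhi", mhi_event);

IRQF_ONESHOT keeps the interrupt masked until the thread returns, which 
is the usual way to avoid re-entering the handler while a batch of 
events is still being drained.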

I'm not going to go refactor the IRQ code at this time.  This looks like 
an issue that is actually observed based on how it was reported, so it 
likely should be addressed.  I'm not happy with this solution, but I 
don't have an alternative at this time.

Mani, up to you if you want to pick this up.  I'm not nack'ing it. 
Technically I've reviewed it, but I'd say I'm "on the fence" about whether 
this really should be accepted.  I can't say there is a flaw in the 
logic, but I don't feel good about this.

> ---
> v3->v4: modify the comment
> v2->v3: modify the comment
> v1->v2: add comments about why we disable local irq
> 
>   drivers/bus/mhi/host/main.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
> index f3aef77a..6c804c3 100644
> --- a/drivers/bus/mhi/host/main.c
> +++ b/drivers/bus/mhi/host/main.c
> @@ -1029,11 +1029,17 @@ void mhi_ev_task(unsigned long data)
>   {
>   	struct mhi_event *mhi_event = (struct mhi_event *)data;
>   	struct mhi_controller *mhi_cntrl = mhi_event->mhi_cntrl;
> +	unsigned long flags;
>   
> +	/*
> +	 * When multiple IRQs arrive, the tasklet may be scheduled out with the event ring lock
> +	 * held, causing other high-priority events like the M0 state transition to get stuck
> +	 * waiting for the same event ring lock. Thus, let's disable local IRQs here.
> +	 */
> +	spin_lock_irqsave(&mhi_event->lock, flags);
>   	/* process all pending events */
> -	spin_lock_bh(&mhi_event->lock);
>   	mhi_event->process_event(mhi_cntrl, mhi_event, U32_MAX);
> -	spin_unlock_bh(&mhi_event->lock);
> +	spin_unlock_irqrestore(&mhi_event->lock, flags);
>   }
>   
>   void mhi_ctrl_ev_task(unsigned long data)


* Re: [PATCH v4] bus: mhi: host: Disable preemption while processing data events
  2022-11-22  5:48 ` Jeffrey Hugo
@ 2022-12-28 16:35   ` Manivannan Sadhasivam
  2022-12-30  6:18     ` Qiang Yu
  0 siblings, 1 reply; 4+ messages in thread
From: Manivannan Sadhasivam @ 2022-12-28 16:35 UTC
  To: Jeffrey Hugo
  Cc: Qiang Yu, loic.poulain, mhi, linux-arm-msm, linux-kernel,
	quic_cang, mrana

On Mon, Nov 21, 2022 at 10:48:54PM -0700, Jeffrey Hugo wrote:
> On 11/21/2022 2:34 AM, Qiang Yu wrote:
> > If data processing of an event is scheduled out because the core
> > is busy handling multiple IRQs, this can starve the processing
> > of the MHI M0 state change event on another core. Fix this issue by
> > disabling IRQs on the core processing data events.
> > 
> > Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
> 
> I've been pondering this off and on since it's been proposed.
> 
> This solution will break the described deadlock, but I don't like it.
> 
> What I really don't like is that this is selfish.  We already preempt
> anything else on the CPU that isn't a hard IRQ because we are using a
> tasklet (which is deprecated, see include/linux/interrupt.h).  Now we are
> going to essentially preempt IRQs as well by preventing them from being
> serviced.  So, now the CPU is essentially dedicated to processing MHI
> events.  It seems selfish to say that MHI is the most important thing on a
> particular CPU.
> 
> This can have a huge effect on system behavior.  If, say, the ssh IRQ is
> assigned to the same CPU, and we block that CPU long enough, then it will
> appear to the user as if the ssh connection has frozen.  I've witnessed this
> occur with other drivers.
> 
> How long can we block the CPU?  According to the code, pretty much for an
> unlimited amount of time.  If the tasklet is processing
> mhi_process_data_event_ring(), then we can process U32_MAX events before
> throttling (which might as well be unlimited).  If the tasklet is processing
> mhi_process_ctrl_ev_ring() then there is no throttling.
> 
> I'm thinking it would be better if the IRQ handling was refactored to use
> threaded interrupts.  The thread is an actual process, so it could move to
> another CPU.  It also runs at FIFO priority, so it basically will preempt
> everything but hard IRQs and soft IRQs (e.g. tasklets).  The downside of a
> tasklet is that it is bound to the scheduling CPU, which in our case is the
> CPU servicing the IRQ, and more than a few systems tend to load the majority
> of the IRQs to CPU0.
> 

This sounds like a plausible solution.

> I'm not going to go refactor the IRQ code at this time.  This looks like an
> issue that is actually observed based on how it was reported, so it likely
> should be addressed.  I'm not happy with this solution, but I don't have an
> alternative at this time.
> 
> Mani, up to you if you want to pick this up.  I'm not nack'ing it.
> Technically I've reviewed it, but I'd say I'm "on the fence" about whether this
> really should be accepted.  I can't say there is a flaw in the logic, but I
> don't feel good about this.
> 

I do agree with you.

Qiang, can you please look into Jeff's suggestion for fixing this performance
issue?

Thanks,
Mani

> > ---
> > v3->v4: modify the comment
> > v2->v3: modify the comment
> > v1->v2: add comments about why we disable local irq
> > 
> >   drivers/bus/mhi/host/main.c | 10 ++++++++--
> >   1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
> > index f3aef77a..6c804c3 100644
> > --- a/drivers/bus/mhi/host/main.c
> > +++ b/drivers/bus/mhi/host/main.c
> > @@ -1029,11 +1029,17 @@ void mhi_ev_task(unsigned long data)
> >   {
> >   	struct mhi_event *mhi_event = (struct mhi_event *)data;
> >   	struct mhi_controller *mhi_cntrl = mhi_event->mhi_cntrl;
> > +	unsigned long flags;
> > +	/*
> > +	 * When multiple IRQs arrive, the tasklet may be scheduled out with the event ring lock
> > +	 * held, causing other high-priority events like the M0 state transition to get stuck
> > +	 * waiting for the same event ring lock. Thus, let's disable local IRQs here.
> > +	 */
> > +	spin_lock_irqsave(&mhi_event->lock, flags);
> >   	/* process all pending events */
> > -	spin_lock_bh(&mhi_event->lock);
> >   	mhi_event->process_event(mhi_cntrl, mhi_event, U32_MAX);
> > -	spin_unlock_bh(&mhi_event->lock);
> > +	spin_unlock_irqrestore(&mhi_event->lock, flags);
> >   }
> >   void mhi_ctrl_ev_task(unsigned long data)
> 

-- 
மணிவண்ணன் சதாசிவம்

* Re: [PATCH v4] bus: mhi: host: Disable preemption while processing data events
  2022-12-28 16:35   ` Manivannan Sadhasivam
@ 2022-12-30  6:18     ` Qiang Yu
  0 siblings, 0 replies; 4+ messages in thread
From: Qiang Yu @ 2022-12-30  6:18 UTC
  To: Manivannan Sadhasivam, Jeffrey Hugo
  Cc: loic.poulain, mhi, linux-arm-msm, linux-kernel, quic_cang, mrana


On 12/29/2022 12:35 AM, Manivannan Sadhasivam wrote:
> On Mon, Nov 21, 2022 at 10:48:54PM -0700, Jeffrey Hugo wrote:
>> On 11/21/2022 2:34 AM, Qiang Yu wrote:
>>> If data processing of an event is scheduled out because the core
>>> is busy handling multiple IRQs, this can starve the processing
>>> of the MHI M0 state change event on another core. Fix this issue by
>>> disabling IRQs on the core processing data events.
>>>
>>> Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
>> I've been pondering this off and on since it's been proposed.
>>
>> This solution will break the described deadlock, but I don't like it.
>>
>> What I really don't like is that this is selfish.  We already preempt
>> anything else on the CPU that isn't a hard IRQ because we are using a
>> tasklet (which is deprecated, see include/linux/interrupt.h).  Now we are
>> going to essentially preempt IRQs as well by preventing them from being
>> serviced.  So, now the CPU is essentially dedicated to processing MHI
>> events.  It seems selfish to say that MHI is the most important thing on a
>> particular CPU.
>>
>> This can have a huge effect on system behavior.  If, say, the ssh IRQ is
>> assigned to the same CPU, and we block that CPU long enough, then it will
>> appear to the user as if the ssh connection has frozen.  I've witnessed this
>> occur with other drivers.
>>
>> How long can we block the CPU?  According to the code, pretty much for an
>> unlimited amount of time.  If the tasklet is processing
>> mhi_process_data_event_ring(), then we can process U32_MAX events before
>> throttling (which might as well be unlimited).  If the tasklet is processing
>> mhi_process_ctrl_ev_ring() then there is no throttling.
>>
>> I'm thinking it would be better if the IRQ handling was refactored to use
>> threaded interrupts.  The thread is an actual process, so it could move to
>> another CPU.  It also runs at FIFO priority, so it basically will preempt
>> everything but hard IRQs and soft IRQs (e.g. tasklets).  The downside of a
>> tasklet is that it is bound to the scheduling CPU, which in our case is the
>> CPU servicing the IRQ, and more than a few systems tend to load the majority
>> of the IRQs to CPU0.
>>
> This sounds like a plausible solution.
>
>> I'm not going to go refactor the IRQ code at this time.  This looks like an
>> issue that is actually observed based on how it was reported, so it likely
>> should be addressed.  I'm not happy with this solution, but I don't have an
>> alternative at this time.
>>
>> Mani, up to you if you want to pick this up.  I'm not nack'ing it.
>> Technically I've reviewed it, but I'd say I'm "on the fence" about whether this
>> really should be accepted.  I can't say there is a flaw in the logic, but I
>> don't feel good about this.
>>
> I do agree with you.
>
> Qiang, can you please look into Jeff's suggestion on fixing this performance
> issue?
>
> Thanks,
> Mani

Jeff's suggestion is reasonable. I have no reason to insist that the 
patch should be accepted.

Thanks,

Qiang

>>> ---
>>> v3->v4: modify the comment
>>> v2->v3: modify the comment
>>> v1->v2: add comments about why we disable local irq
>>>
>>>    drivers/bus/mhi/host/main.c | 10 ++++++++--
>>>    1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
>>> index f3aef77a..6c804c3 100644
>>> --- a/drivers/bus/mhi/host/main.c
>>> +++ b/drivers/bus/mhi/host/main.c
>>> @@ -1029,11 +1029,17 @@ void mhi_ev_task(unsigned long data)
>>>    {
>>>    	struct mhi_event *mhi_event = (struct mhi_event *)data;
>>>    	struct mhi_controller *mhi_cntrl = mhi_event->mhi_cntrl;
>>> +	unsigned long flags;
>>> +	/*
>>> +	 * When multiple IRQs arrive, the tasklet may be scheduled out with the event ring lock
>>> +	 * held, causing other high-priority events like the M0 state transition to get stuck
>>> +	 * waiting for the same event ring lock. Thus, let's disable local IRQs here.
>>> +	 */
>>> +	spin_lock_irqsave(&mhi_event->lock, flags);
>>>    	/* process all pending events */
>>> -	spin_lock_bh(&mhi_event->lock);
>>>    	mhi_event->process_event(mhi_cntrl, mhi_event, U32_MAX);
>>> -	spin_unlock_bh(&mhi_event->lock);
>>> +	spin_unlock_irqrestore(&mhi_event->lock, flags);
>>>    }
>>>    void mhi_ctrl_ev_task(unsigned long data)


Thread overview: 4+ messages
2022-11-21  9:34 [PATCH v4] bus: mhi: host: Disable preemption while processing data events Qiang Yu
2022-11-22  5:48 ` Jeffrey Hugo
2022-12-28 16:35   ` Manivannan Sadhasivam
2022-12-30  6:18     ` Qiang Yu
