linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] Fix race in ipmi timer cleanup
@ 2019-08-28 20:36 Jes Sorensen
  2019-08-28 20:36 ` [PATCH 1/1] ipmi_si_intf: Fix race in timer shutdown handling Jes Sorensen
  2019-08-28 22:32 ` [PATCH 0/1] Fix race in ipmi timer cleanup Corey Minyard
  0 siblings, 2 replies; 7+ messages in thread
From: Jes Sorensen @ 2019-08-28 20:36 UTC (permalink / raw)
  To: minyard; +Cc: linux-kernel, openipmi-developer, kernel-team

From: Jes Sorensen <jsorensen@fb.com>

I came across this in 4.16, but I believe the bug is still present
in current 5.x, even if it is less likely to trigger.

Basially stop_timer_and_thread() only calls del_timer_sync() if
timer_running == true. However smi_mod_timer enables the timer before
setting timer_running = true.

I was able to reproduce this in 4.16 running the following on a host

   while :; do rmmod ipmi_si ; modprobe ipmi_si; done

while rebooting the BMC on it in parallel.

5.2 moves the error handling around and does it more centralized, but
relying on timer_running still seems dubious to me.

static void smi_mod_timer(struct smi_info *smi_info, unsigned long new_val)
{
        if (!smi_info->timer_can_start)
                return;
        smi_info->last_timeout_jiffies = jiffies;
        mod_timer(&smi_info->si_timer, new_val);
        smi_info->timer_running = true;
}

static inline void stop_timer_and_thread(struct smi_info *smi_info)
{
        if (smi_info->thread != NULL) {
                kthread_stop(smi_info->thread);
                smi_info->thread = NULL;
        }

        smi_info->timer_can_start = false;
        if (smi_info->timer_running)
                del_timer_sync(&smi_info->si_timer);
}

Cheers,
Jes

Jes Sorensen (1):
  ipmi_si_intf: Fix race in timer shutdown handling

 drivers/char/ipmi/ipmi_si_intf.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/1] ipmi_si_intf: Fix race in timer shutdown handling
  2019-08-28 20:36 [PATCH 0/1] Fix race in ipmi timer cleanup Jes Sorensen
@ 2019-08-28 20:36 ` Jes Sorensen
  2019-08-28 22:32 ` [PATCH 0/1] Fix race in ipmi timer cleanup Corey Minyard
  1 sibling, 0 replies; 7+ messages in thread
From: Jes Sorensen @ 2019-08-28 20:36 UTC (permalink / raw)
  To: minyard; +Cc: linux-kernel, openipmi-developer, kernel-team

From: Jes Sorensen <jsorensen@fb.com>

smi_mod_timer() enables the timer before setting timer_running. This
means the timer can be running when we get to stop_timer_and_thread()
without timer_running having been set, resulting in del_timer_sync()
not being called and the timer being left to cause havoc during
shutdown.

Instead just call del_timer_sync() unconditionally

Signed-off-by: Jes Sorensen <jsorensen@fb.com>
---
 drivers/char/ipmi/ipmi_si_intf.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index da5b6723329a..53425e25ecf4 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -1828,8 +1828,7 @@ static inline void stop_timer_and_thread(struct smi_info *smi_info)
 	}
 
 	smi_info->timer_can_start = false;
-	if (smi_info->timer_running)
-		del_timer_sync(&smi_info->si_timer);
+	del_timer_sync(&smi_info->si_timer);
 }
 
 static struct smi_info *find_dup_si(struct smi_info *info)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/1] Fix race in ipmi timer cleanup
  2019-08-28 20:36 [PATCH 0/1] Fix race in ipmi timer cleanup Jes Sorensen
  2019-08-28 20:36 ` [PATCH 1/1] ipmi_si_intf: Fix race in timer shutdown handling Jes Sorensen
@ 2019-08-28 22:32 ` Corey Minyard
  2019-08-29  0:53   ` Jes Sorensen
  1 sibling, 1 reply; 7+ messages in thread
From: Corey Minyard @ 2019-08-28 22:32 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: linux-kernel, openipmi-developer, kernel-team

On Wed, Aug 28, 2019 at 04:36:24PM -0400, Jes Sorensen wrote:
> From: Jes Sorensen <jsorensen@fb.com>
> 
> I came across this in 4.16, but I believe the bug is still present
> in current 5.x, even if it is less likely to trigger.
> 
> Basially stop_timer_and_thread() only calls del_timer_sync() if
> timer_running == true. However smi_mod_timer enables the timer before
> setting timer_running = true.

All the modifications/checks for timer_running should be done under
the si_lock.  It looks like a lock is missing in shutdown_smi(),
probably starting before setting interrupt_disabled to true and
after stop_timer_and_thread.  I think that is the right fix for
this problem.

-corey

> 
> I was able to reproduce this in 4.16 running the following on a host
> 
>    while :; do rmmod ipmi_si ; modprobe ipmi_si; done
> 
> while rebooting the BMC on it in parallel.
> 
> 5.2 moves the error handling around and does it more centralized, but
> relying on timer_running still seems dubious to me.
> 
> static void smi_mod_timer(struct smi_info *smi_info, unsigned long new_val)
> {
>         if (!smi_info->timer_can_start)
>                 return;
>         smi_info->last_timeout_jiffies = jiffies;
>         mod_timer(&smi_info->si_timer, new_val);
>         smi_info->timer_running = true;
> }
> 
> static inline void stop_timer_and_thread(struct smi_info *smi_info)
> {
>         if (smi_info->thread != NULL) {
>                 kthread_stop(smi_info->thread);
>                 smi_info->thread = NULL;
>         }
> 
>         smi_info->timer_can_start = false;
>         if (smi_info->timer_running)
>                 del_timer_sync(&smi_info->si_timer);
> }
> 
> Cheers,
> Jes
> 
> Jes Sorensen (1):
>   ipmi_si_intf: Fix race in timer shutdown handling
> 
>  drivers/char/ipmi/ipmi_si_intf.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/1] Fix race in ipmi timer cleanup
  2019-08-28 22:32 ` [PATCH 0/1] Fix race in ipmi timer cleanup Corey Minyard
@ 2019-08-29  0:53   ` Jes Sorensen
  2019-08-29 18:15     ` Corey Minyard
  0 siblings, 1 reply; 7+ messages in thread
From: Jes Sorensen @ 2019-08-29  0:53 UTC (permalink / raw)
  To: minyard; +Cc: linux-kernel, openipmi-developer, kernel-team

On 8/28/19 6:32 PM, Corey Minyard wrote:
> On Wed, Aug 28, 2019 at 04:36:24PM -0400, Jes Sorensen wrote:
>> From: Jes Sorensen <jsorensen@fb.com>
>>
>> I came across this in 4.16, but I believe the bug is still present
>> in current 5.x, even if it is less likely to trigger.
>>
>> Basially stop_timer_and_thread() only calls del_timer_sync() if
>> timer_running == true. However smi_mod_timer enables the timer before
>> setting timer_running = true.
> 
> All the modifications/checks for timer_running should be done under
> the si_lock.  It looks like a lock is missing in shutdown_smi(),
> probably starting before setting interrupt_disabled to true and
> after stop_timer_and_thread.  I think that is the right fix for
> this problem.

Hi Corey,

I agree a spin lock could deal with this specific issue too, but calling
del_timer_sync() is safe to call on an already disabled timer. The whole
flagging of timer_running really doesn't make much sense in the first
place either.

As for interrupt_disabled that is even worse. There's multiple places in
the code where interrupt_disabled is checked, some of them are not
protected by a spin lock, including shutdown_smi() where you have this
sequence:

        while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)){
                poll(smi_info);
                schedule_timeout_uninterruptible(1);
        }
        if (smi_info->handlers)
                disable_si_irq(smi_info);
        while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)){
                poll(smi_info);
                schedule_timeout_uninterruptible(1);
        }

In this case you'll have to drop and retake the long several times.

You also have this call sequence which leads to disable_si_irq() which
checks interrupt_disabled:

  flush_messages()
    smi_event_handler()
      handle_transaction_done()
        handle_flags()
          alloc_msg_handle_irq()
            disable_si_irq()

{disable,enable}_si_irq() themselves are racy:

static inline bool disable_si_irq(struct smi_info *smi_info)
{
        if ((smi_info->io.irq) && (!smi_info->interrupt_disabled)) {
                smi_info->interrupt_disabled = true;

Basically interrupt_disabled need to be atomic here to have any value,
unless you ensure to have a spin lock around every access to it.

Cheers,
Jes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/1] Fix race in ipmi timer cleanup
  2019-08-29  0:53   ` Jes Sorensen
@ 2019-08-29 18:15     ` Corey Minyard
  2019-09-15  1:08       ` [Openipmi-developer] " Corey Minyard
  0 siblings, 1 reply; 7+ messages in thread
From: Corey Minyard @ 2019-08-29 18:15 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: linux-kernel, openipmi-developer, kernel-team

On Wed, Aug 28, 2019 at 08:53:47PM -0400, Jes Sorensen wrote:
> On 8/28/19 6:32 PM, Corey Minyard wrote:
> > On Wed, Aug 28, 2019 at 04:36:24PM -0400, Jes Sorensen wrote:
> >> From: Jes Sorensen <jsorensen@fb.com>
> >>
> >> I came across this in 4.16, but I believe the bug is still present
> >> in current 5.x, even if it is less likely to trigger.
> >>
> >> Basially stop_timer_and_thread() only calls del_timer_sync() if
> >> timer_running == true. However smi_mod_timer enables the timer before
> >> setting timer_running = true.
> > 
> > All the modifications/checks for timer_running should be done under
> > the si_lock.  It looks like a lock is missing in shutdown_smi(),
> > probably starting before setting interrupt_disabled to true and
> > after stop_timer_and_thread.  I think that is the right fix for
> > this problem.
> 
> Hi Corey,
> 
> I agree a spin lock could deal with this specific issue too, but calling
> del_timer_sync() is safe to call on an already disabled timer. The whole
> flagging of timer_running really doesn't make much sense in the first
> place either.
> 
> As for interrupt_disabled that is even worse. There's multiple places in
> the code where interrupt_disabled is checked, some of them are not
> protected by a spin lock, including shutdown_smi() where you have this
> sequence:
> 
>         while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)){
>                 poll(smi_info);
>                 schedule_timeout_uninterruptible(1);
>         }
>         if (smi_info->handlers)
>                 disable_si_irq(smi_info);
>         while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)){
>                 poll(smi_info);
>                 schedule_timeout_uninterruptible(1);
>         }

This one doesn't matter.  At this point the driver is single-threaded,
no interrupts, timeouts, or calls from the upper layer can happen.

> 
> In this case you'll have to drop and retake the long several times.
> 
> You also have this call sequence which leads to disable_si_irq() which
> checks interrupt_disabled:
> 
>   flush_messages()
>     smi_event_handler()
>       handle_transaction_done()
>         handle_flags()
>           alloc_msg_handle_irq()
>             disable_si_irq()

This one only happens in run-to-completion mode.  Which is strange,
but a number of people had issues with getting into a new kernel before
the watchdog timeout went off, so the run-to-completion mode runs at
panic time so the driver can run without scheduling so it can extend
the watchdog and store panic information in the IPMI log.

So you actually *don't* want a lock here, since the panic may have
occurred when the IPMI driver was holding the lock.

> 
> {disable,enable}_si_irq() themselves are racy:
> 
> static inline bool disable_si_irq(struct smi_info *smi_info)
> {
>         if ((smi_info->io.irq) && (!smi_info->interrupt_disabled)) {
>                 smi_info->interrupt_disabled = true;
> 
> Basically interrupt_disabled need to be atomic here to have any value,
> unless you ensure to have a spin lock around every access to it.

It needs to be atomic, yes, but I think just adding the spinlock like
I suggested will work.  You are right, the check for timer_running is
not necessary here, and I'm fine with removing it, but there are other
issues with interrupt_disabled (as you said) and with memory ordering
in the timer case.  So even if you remove the timer running check, the
lock is still required here.

It also might be a good idea to add a WARN_ON() to smi_mod_timer() and
alloc_msg_handle_irq() if the lock is not held, just to be sure.

Thanks,

-corey

> 
> Cheers,
> Jes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Openipmi-developer] [PATCH 0/1] Fix race in ipmi timer cleanup
  2019-08-29 18:15     ` Corey Minyard
@ 2019-09-15  1:08       ` Corey Minyard
  2019-09-16 14:01         ` Jes Sorensen
  0 siblings, 1 reply; 7+ messages in thread
From: Corey Minyard @ 2019-09-15  1:08 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: openipmi-developer, kernel-team, linux-kernel

> 
> > 
> > {disable,enable}_si_irq() themselves are racy:
> > 
> > static inline bool disable_si_irq(struct smi_info *smi_info)
> > {
> >         if ((smi_info->io.irq) && (!smi_info->interrupt_disabled)) {
> >                 smi_info->interrupt_disabled = true;
> > 
> > Basically interrupt_disabled need to be atomic here to have any value,
> > unless you ensure to have a spin lock around every access to it.
> 
> It needs to be atomic, yes, but I think just adding the spinlock like
> I suggested will work.  You are right, the check for timer_running is
> not necessary here, and I'm fine with removing it, but there are other
> issues with interrupt_disabled (as you said) and with memory ordering
> in the timer case.  So even if you remove the timer running check, the
> lock is still required here.

It turns out you were right, all that really needs to be done is the
del_timer_sync().  I've added your patch to my queue.

Sorry for the trouble.

Thanks,

-corey

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Openipmi-developer] [PATCH 0/1] Fix race in ipmi timer cleanup
  2019-09-15  1:08       ` [Openipmi-developer] " Corey Minyard
@ 2019-09-16 14:01         ` Jes Sorensen
  0 siblings, 0 replies; 7+ messages in thread
From: Jes Sorensen @ 2019-09-16 14:01 UTC (permalink / raw)
  To: minyard; +Cc: openipmi-developer, kernel-team, linux-kernel

On 9/14/19 9:08 PM, Corey Minyard wrote:
>>
>>>
>>> {disable,enable}_si_irq() themselves are racy:
>>>
>>> static inline bool disable_si_irq(struct smi_info *smi_info)
>>> {
>>>         if ((smi_info->io.irq) && (!smi_info->interrupt_disabled)) {
>>>                 smi_info->interrupt_disabled = true;
>>>
>>> Basically interrupt_disabled need to be atomic here to have any value,
>>> unless you ensure to have a spin lock around every access to it.
>>
>> It needs to be atomic, yes, but I think just adding the spinlock like
>> I suggested will work.  You are right, the check for timer_running is
>> not necessary here, and I'm fine with removing it, but there are other
>> issues with interrupt_disabled (as you said) and with memory ordering
>> in the timer case.  So even if you remove the timer running check, the
>> lock is still required here.
> 
> It turns out you were right, all that really needs to be done is the
> del_timer_sync().  I've added your patch to my queue.
> 
> Sorry for the trouble.

Awesome!

Sorry I was going to get back and look at it again, but Linux Plumbers
and playing sardine i a tin can got in the way.

Cheers,
Jes


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-09-16 14:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-28 20:36 [PATCH 0/1] Fix race in ipmi timer cleanup Jes Sorensen
2019-08-28 20:36 ` [PATCH 1/1] ipmi_si_intf: Fix race in timer shutdown handling Jes Sorensen
2019-08-28 22:32 ` [PATCH 0/1] Fix race in ipmi timer cleanup Corey Minyard
2019-08-29  0:53   ` Jes Sorensen
2019-08-29 18:15     ` Corey Minyard
2019-09-15  1:08       ` [Openipmi-developer] " Corey Minyard
2019-09-16 14:01         ` Jes Sorensen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).