From: Sam Kappen <skappen@mvista.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users@vger.kernel.org
Subject: Re: schedule under irqs_disabled in SLUB problem
Date: Mon, 27 Nov 2017 12:16:36 +0530
Message-ID: <CAJ9FNxtpTqjLevwN1w037=g1eqRPf-Fht+SFqSqJS0jVhFXkLg@mail.gmail.com>
In-Reply-To: <20171124093724.GB2564@linutronix.de>

Hi,

Many thanks for your kind response.
I will put it through a long-run test and report back.

Could you please look at my queries below?

1.)
I derived and tried a patch based on the analysis below.
(I referred to the open source commit below to derive this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v4.9.47-rt37-rebase&id=7a347757f027190c95a363a491c18156a926a370
)

In some cases, pi_lock in rt_spin_lock_slowlock() does not preserve the
IRQ state on exit from the function. This causes an issue in
migrate_disable() + enable, as they are not symmetrical with regard to
the interrupt state.
To fix this, pi_lock and pi_unlock in rt_spin_lock_slowlock() have been
changed to raw_spin_lock() and raw_spin_unlock(), which leave the IRQ
state untouched, and wait_lock in rt_spin_lock_slowlock() is now taken
with raw_spin_lock_irqsave() and released with
raw_spin_unlock_irqrestore().
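
To illustrate the asymmetry described above, here is a minimal userspace
sketch. It is not kernel code: a single flag stands in for the local
CPU's interrupt-enable state, and every model_* and state_after_* name
is hypothetical. It only models the documented behaviour that the _irq
variants disable and re-enable interrupts unconditionally, while the
_irqsave/_irqrestore variants put back whatever state the caller had.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: one flag stands in for the CPU's IRQ-enable state. */
bool model_irqs_enabled = true;

/* Models raw_spin_lock_irq(): disables interrupts unconditionally. */
void model_lock_irq(void)
{
	model_irqs_enabled = false;
}

/* Models raw_spin_unlock_irq(): enables interrupts unconditionally. */
void model_unlock_irq(void)
{
	model_irqs_enabled = true;
}

/* Models raw_spin_lock_irqsave(): saves the current state, then disables. */
bool model_lock_irqsave(void)
{
	bool flags = model_irqs_enabled;

	model_irqs_enabled = false;
	return flags;
}

/* Models raw_spin_unlock_irqrestore(): restores the saved state. */
void model_unlock_irqrestore(bool flags)
{
	model_irqs_enabled = flags;
}

/* Returns the IRQ state left behind by an _irq lock/unlock pair. */
bool state_after_irq_pair(bool entry_state)
{
	model_irqs_enabled = entry_state;
	model_lock_irq();
	model_unlock_irq();
	return model_irqs_enabled;
}

/* Returns the IRQ state left behind by an _irqsave/_irqrestore pair. */
bool state_after_irqsave_pair(bool entry_state)
{
	bool flags;

	model_irqs_enabled = entry_state;
	flags = model_lock_irqsave();
	model_unlock_irqrestore(flags);
	return model_irqs_enabled;
}

#ifdef MODEL_DEMO_MAIN
int main(void)
{
	/* Entered with IRQs off: the _irq pair leaves them on. */
	assert(state_after_irq_pair(false) == true);
	/* Entered with IRQs off: the _irqsave pair leaves them off. */
	assert(state_after_irqsave_pair(false) == false);
	printf("model checks passed\n");
	return 0;
}
#endif
```

Building with -DMODEL_DEMO_MAIN runs the two checks. The case that
matters for this thread is entry with interrupts already disabled: the
_irq pair returns with them enabled, the _irqsave pair does not.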



kernel/rtmutex.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index 7cf4b8b..9c67d80 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -1191,8 +1191,6 @@ static int adaptive_wait(struct rt_mutex *lock,
 }
 #endif

-# define pi_lock(lock) raw_spin_lock_irq(lock)
-# define pi_unlock(lock) raw_spin_unlock_irq(lock)

 /*
  * Slow path lock function spin_lock style: this variant is very
@@ -1206,14 +1204,15 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
  struct task_struct *lock_owner, *self = current;
  struct rt_mutex_waiter waiter, *top_waiter;
  int ret;
+ unsigned long flags;

  rt_mutex_init_waiter(&waiter, true);

- raw_spin_lock(&lock->wait_lock);
+ raw_spin_lock_irqsave(&lock->wait_lock, flags);
  init_lists(lock);

  if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) {
- raw_spin_unlock(&lock->wait_lock);
+ raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
  return;
  }

@@ -1225,10 +1224,10 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
  * as well. We are serialized via pi_lock against wakeups. See
  * try_to_wake_up().
  */
- pi_lock(&self->pi_lock);
+ raw_spin_lock(&self->pi_lock);
  self->saved_state = self->state;
  __set_current_state(TASK_UNINTERRUPTIBLE);
- pi_unlock(&self->pi_lock);
+ raw_spin_unlock(&self->pi_lock);

  ret = task_blocks_on_rt_mutex(lock, &waiter, self, RT_MUTEX_MIN_CHAINWALK);
  BUG_ON(ret);
@@ -1241,18 +1240,18 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
  top_waiter = rt_mutex_top_waiter(lock);
  lock_owner = rt_mutex_owner(lock);

- raw_spin_unlock(&lock->wait_lock);
+ raw_spin_unlock_irqrestore(&lock->wait_lock, flags);

  debug_rt_mutex_print_deadlock(&waiter);

  if (top_waiter != &waiter || adaptive_wait(lock, lock_owner))
  schedule_rt_mutex(lock);

- raw_spin_lock(&lock->wait_lock);
+ raw_spin_lock_irqsave(&lock->wait_lock, flags);

- pi_lock(&self->pi_lock);
+ raw_spin_lock(&self->pi_lock);
  __set_current_state(TASK_UNINTERRUPTIBLE);
- pi_unlock(&self->pi_lock);
+ raw_spin_unlock(&self->pi_lock);
  }

  /*
@@ -1262,10 +1261,10 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
  * happened while we were blocked. Clear saved_state so
  * try_to_wakeup() does not get confused.
  */
- pi_lock(&self->pi_lock);
+ raw_spin_lock(&self->pi_lock);
  __set_current_state(self->saved_state);
  self->saved_state = TASK_RUNNING;
- pi_unlock(&self->pi_lock);
+ raw_spin_unlock(&self->pi_lock);

  /*
  * try_to_take_rt_mutex() sets the waiter bit
@@ -1276,7 +1275,7 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
  BUG_ON(rt_mutex_has_waiters(lock) && &waiter == rt_mutex_top_waiter(lock));
  BUG_ON(!plist_node_empty(&waiter.list_entry));

- raw_spin_unlock(&lock->wait_lock);
+ raw_spin_unlock_irqrestore(&lock->wait_lock, flags);

  debug_rt_mutex_free_waiter(&waiter);
 }
-- 
2.7.4

We tested the above patch on multiple targets and saw a remote target
get stuck after 2 days. I am not sure what really happens there; it may
be the issue of trying to schedule with IRQs disabled.
The systems I tested ran for 7 days, after which I stopped the test.


2.) With your patch, IRQs will be enabled during the slab allocations.
So if we enable IRQs at this early stage, will there be any side
effects? I am sorry if my question does not seem logical.



Regards,
Sam






On Fri, Nov 24, 2017 at 3:07 PM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> On 2017-11-24 12:09:16 [+0530], Sam Kappen wrote:
>> Hi,
> Hi,
>
>> I am also facing a similar kind of issue on an x86 target while testing
>> 3.10.105-rt119.
>> The issue is seen during boot-up when USB/SCSI enumeration starts.
>>
>> Below is the log from my console
>
> Can you try if the patch I posted solves that? From the callchain it
> looks like the same thing.
>
> Sebastian
