* Re: [PATCH] zfcp: Fix spinlock imbalance in zfcp_qdio_sbal_get
[not found] <alpine.LRH.2.02.1307151612341.11390@file01.intranet.prod.int.rdu2.redhat.com>
@ 2013-07-16 9:14 ` Steffen Maier
From: Steffen Maier @ 2013-07-16 9:14 UTC
To: linux-s390
Hi Mikulas,
We worked on Martin's patch, quoted below. I'm going to send the result
upstream with my next batch of zfcp patches.
Steffen
Linux on System z Development
IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
On 07/15/2013 10:15 PM, Mikulas Patocka wrote:
> Any progress on this?
>
> We have a bug report in Red Hat that suggests that a crash happened because
> of this spinlock imbalance, but there is no answer from the maintainer.
>
> Mikulas
>
> On Wed, 12 Jun 2013, Martin Peschke wrote:
>> On Mon, 2013-06-10 at 11:23 +0200, Heiko Carstens wrote:
>>> Fine with me. So we can either take Mikulas' patch which is ready or your
>>> approach.
>>> However it would be good to have this fixed within 3.10 since there is
>>> a working fix around. Hm?
>>
>>
>> Heiko, Mikulas,
>> this is the code that I would like to go with.
>>
>> I am still running some I/O stress test with frequent queue stall error
>> injection to ensure code coverage for time-out conditions. Looks good so
>> far. Steffen's review is pending.
>>
>> I would appreciate your review and Acks, too.
>>
>> Thanks a lot, Mikulas for your analysis. We dropped the ball on this
>> one :-(
>>
>> Thanks, Martin
>>
>>
>>
>> [PATCH 2/2] zfcp: use wait_event_interruptible_lock_irq_timeout()
>>
>> The zfcp driver used to call wait_event_interruptible_timeout()
>> in combination with some intricate and error-prone locking. Using
>> wait_event_interruptible_lock_irq_timeout() as a replacement
>> nicely cleans up that locking. This cleanup removes a situation that
>> resulted in a locking imbalance in zfcp_qdio_sbal_get(). And we get
>> rid of that crappy lock-unlock-lock sequence at the beginning of
>> the critical section.
>>
>> Reported-by: Mikulas Patocka <mpatocka@redhat.com>
>> Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
>> Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
>>
>> ---
>> drivers/s390/scsi/zfcp_qdio.c | 8 ++------
>> 1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> --- a/drivers/s390/scsi/zfcp_qdio.c
>> +++ b/drivers/s390/scsi/zfcp_qdio.c
>> @@ -223,11 +223,9 @@ int zfcp_qdio_sbals_from_sg(struct zfcp_
>>
>> static int zfcp_qdio_sbal_check(struct zfcp_qdio *qdio)
>> {
>> - spin_lock_irq(&qdio->req_q_lock);
>> if (atomic_read(&qdio->req_q_free) ||
>> !(atomic_read(&qdio->adapter->status) & ZFCP_STATUS_ADAPTER_QDIOUP))
>> return 1;
>> - spin_unlock_irq(&qdio->req_q_lock);
>> return 0;
>> }
>>
>> @@ -245,9 +243,8 @@ int zfcp_qdio_sbal_get(struct zfcp_qdio
>> {
>> long ret;
>>
>> - spin_unlock_irq(&qdio->req_q_lock);
>> - ret = wait_event_interruptible_timeout(qdio->req_q_wq,
>> - zfcp_qdio_sbal_check(qdio), 5 * HZ);
>> + ret = wait_event_interruptible_lock_irq_timeout(qdio->req_q_wq,
>> + zfcp_qdio_sbal_check(qdio), qdio->req_q_lock, 5 * HZ);
>>
>> if (!(atomic_read(&qdio->adapter->status) & ZFCP_STATUS_ADAPTER_QDIOUP))
>> return -EIO;
>> @@ -261,7 +258,6 @@ int zfcp_qdio_sbal_get(struct zfcp_qdio
>> zfcp_erp_adapter_reopen(qdio->adapter, 0, "qdsbg_1");
>> }
>>
>> - spin_lock_irq(&qdio->req_q_lock);
>> return -EIO;
>> }
>>
>>
>>
>> [PATCH 1/2] Add wait_event_interruptible_lock_irq_timeout()
>>
>> Provide another wait_event() function. It is a straight-forward descendant of
>> wait_event_interruptible_timeout() and wait_event_interruptible_lock_irq().
>>
>> There is a use case for this function in the zfcp device driver.
>>
>> Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
>>
>> ---
>> include/linux/wait.h | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 57 insertions(+)
>>
>> --- a/include/linux/wait.h
>> +++ b/include/linux/wait.h
>> @@ -713,6 +713,63 @@ do { \
>> __ret; \
>> })
>>
>> +#define __wait_event_interruptible_lock_irq_timeout(wq, condition, \
>> + lock, ret) \
>> +do { \
>> + DEFINE_WAIT(__wait); \
>> + \
>> + for (;;) { \
>> + prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
>> + if (condition) \
>> + break; \
>> + if (signal_pending(current)) { \
>> + ret = -ERESTARTSYS; \
>> + break; \
>> + } \
>> + spin_unlock_irq(&lock); \
>> + ret = schedule_timeout(ret); \
>> + spin_lock_irq(&lock); \
>> + if (!ret) \
>> + break; \
>> + } \
>> + finish_wait(&wq, &__wait); \
>> +} while (0)
>> +
>> +/**
>> + * wait_event_interruptible_lock_irq_timeout - sleep until a condition gets true or a timeout elapses.
>> + * The condition is checked under the lock. This is expected
>> + * to be called with the lock taken.
>> + * @wq: the waitqueue to wait on
>> + * @condition: a C expression for the event to wait for
>> + * @lock: a locked spinlock_t, which will be released before schedule()
>> + * and reacquired afterwards.
>> + * @timeout: timeout, in jiffies
>> + *
>> + * The process is put to sleep (TASK_INTERRUPTIBLE) until the
>> + * @condition evaluates to true or signal is received. The @condition is
>> + * checked each time the waitqueue @wq is woken up.
>> + *
>> + * wake_up() has to be called after changing any variable that could
>> + * change the result of the wait condition.
>> + *
>> + * This is supposed to be called while holding the lock. The lock is
>> + * dropped before going to sleep and is reacquired afterwards.
>> + *
>> + * The function returns 0 if the @timeout elapsed, -ERESTARTSYS if it
>> + * was interrupted by a signal, and the remaining jiffies otherwise
>> + * if the condition evaluated to true before the timeout elapsed.
>> + */
>> +#define wait_event_interruptible_lock_irq_timeout(wq, condition, lock, \
>> + timeout) \
>> +({ \
>> + int __ret = timeout; \
>> + \
>> + if (!(condition)) \
>> + __wait_event_interruptible_lock_irq_timeout( \
>> + wq, condition, lock, __ret); \
>> + __ret; \
>> +})
>> +
>>
>> /*
>> * These are the old interfaces to sleep waiting for an event.
>>
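To make the return contract of the macro quoted above easier to review, here is a minimal user-space sketch of its loop. It is a toy under stated assumptions, not kernel code: the spinlock is a plain flag, schedule_timeout() is replaced by a hypothetical one-jiffy tick, signal handling is left out, and the names toy_queue, toy_cond, toy_sleep_one_tick and toy_wait_lock_timeout are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the queue state; every field is hypothetical. */
struct toy_queue {
	bool locked;   /* models req_q_lock being held */
	int  free;     /* models req_q_free */
	long now;      /* fake clock, in jiffies */
	long wake_at;  /* jiffy at which a slot frees up, -1 for never */
};

/* The wait condition; it is only ever evaluated with the lock held. */
static bool toy_cond(const struct toy_queue *q)
{
	assert(q->locked);
	return q->free > 0;
}

/* Models schedule_timeout(): runs unlocked and consumes one jiffy. */
static long toy_sleep_one_tick(struct toy_queue *q, long remaining)
{
	assert(!q->locked);
	q->now++;
	if (q->now == q->wake_at)
		q->free = 1;
	return remaining > 0 ? remaining - 1 : 0;
}

/*
 * Same loop shape as the quoted macro, minus signal handling:
 * returns 0 on timeout, otherwise the jiffies that were left.
 * The lock is held on entry and on every return path.
 */
static long toy_wait_lock_timeout(struct toy_queue *q, long timeout)
{
	long ret = timeout;

	assert(q->locked);
	while (!toy_cond(q)) {
		q->locked = false;                /* spin_unlock_irq() */
		ret = toy_sleep_one_tick(q, ret); /* schedule_timeout() */
		q->locked = true;                 /* spin_lock_irq() */
		if (!ret)
			break;
	}
	return ret;
}
```

With wake_at set to 3 and a timeout of 5 jiffies the call returns 2; with wake_at set to -1 it returns 0. In both cases the locked flag is still set on return, which is the property the zfcp change relies on.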
* Re: [PATCH] zfcp: Fix spinlock imbalance in zfcp_qdio_sbal_get
[not found] <1371131228.12134.19.camel@br9vgx5g.de.ibm.com>
@ 2013-06-13 14:10 ` Martin Peschke
From: Martin Peschke @ 2013-06-13 14:10 UTC
To: linux-s390
On Wed, 2013-06-12 at 21:25 -0400, Mikulas Patocka wrote:
> Hi
>
> It looks ok. There is a difference - before this patch,
> zfcp_erp_adapter_reopen was called without req_q_lock. With this patch, it
> is called with the lock held.
>
> Can it cause any problems? (deadlock, sleep with spinlock or lock
> inversion?) I didn't find a case where it could, but I am not familiar
> with all the code in this driver.
Ah, I forgot:
I don't see any risk of lock inversion (the order erp_lock first,
req_q_lock second does not occur).
The erp_lock is only used by a list of functions in zfcp_erp.c. The
reopen/shutdown functions might be called from almost anywhere. They
take erp_lock; they do not issue requests and therefore do not use
req_q_lock. The other code paths in zfcp_erp.c might issue requests;
they are users of req_q_lock. But they do so without the erp_lock held,
in order to allow waiting for completion or other blocking operations
(see zfcp_erp_strategy).
Martin
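The ordering argument above can be checked mechanically with a small tracker that records which lock was taken while another was held. The following is only a user-space sketch, assuming two locks and two simplified code paths taken from the description above; order_tracker and all toy_* names are hypothetical and do not exist in the driver.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_lock { REQ_Q_LOCK, ERP_LOCK, TOY_NLOCKS };

struct order_tracker {
	bool held[TOY_NLOCKS];
	/* seen[a][b] is set when b is taken while a is already held */
	bool seen[TOY_NLOCKS][TOY_NLOCKS];
};

static void toy_acquire(struct order_tracker *t, enum toy_lock l)
{
	for (int h = 0; h < TOY_NLOCKS; h++)
		if (t->held[h])
			t->seen[h][l] = true;
	t->held[l] = true;
}

static void toy_release(struct order_tracker *t, enum toy_lock l)
{
	assert(t->held[l]);
	t->held[l] = false;
}

/* An inversion exists if two locks were ever nested in both orders. */
static bool toy_has_inversion(const struct order_tracker *t)
{
	for (int a = 0; a < TOY_NLOCKS; a++)
		for (int b = a + 1; b < TOY_NLOCKS; b++)
			if (t->seen[a][b] && t->seen[b][a])
				return true;
	return false;
}

/* Timeout path after the patch: reopen runs with req_q_lock held. */
static void toy_sbal_get_timeout(struct order_tracker *t)
{
	toy_acquire(t, REQ_Q_LOCK);
	toy_acquire(t, ERP_LOCK);   /* taken inside the reopen call */
	toy_release(t, ERP_LOCK);
	toy_release(t, REQ_Q_LOCK);
}

/* ERP path: erp_lock is dropped before a request is issued. */
static void toy_erp_issues_request(struct order_tracker *t)
{
	toy_acquire(t, ERP_LOCK);
	toy_release(t, ERP_LOCK);
	toy_acquire(t, REQ_Q_LOCK);
	toy_release(t, REQ_Q_LOCK);
}
```

Running both paths against one tracker records only the nesting req_q_lock then erp_lock, so toy_has_inversion() stays false. It would flip to true only if some path took req_q_lock while still holding erp_lock, which is the case the explanation above rules out.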
* Re: [PATCH] zfcp: Fix spinlock imbalance in zfcp_qdio_sbal_get
[not found] <20130515160356.GE4298@osiris>
@ 2013-05-22 12:43 ` Martin Peschke
From: Martin Peschke @ 2013-05-22 12:43 UTC
To: linux-s390
> -------- Original Message --------
> Subject: Re: [PATCH] zfcp: Fix spinlock imbalance in zfcp_qdio_sbal_get
> Date: Wed, 15 May 2013 18:03:56 +0200
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
> To: Mikulas Patocka <mpatocka@redhat.com>
> CC: Steffen Maier <maier@linux.vnet.ibm.com>, linux390@de.ibm.com,
> linux-s390@vger.kernel.org
>
> On Wed, May 15, 2013 at 10:58:59AM -0400, Mikulas Patocka wrote:
> > zfcp: Fix spinlock imbalance in zfcp_qdio_sbal_get
> >
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> >
> > ---
> > drivers/s390/scsi/zfcp_qdio.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6/drivers/s390/scsi/zfcp_qdio.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/s390/scsi/zfcp_qdio.c 2013-05-15 16:53:14.000000000 +0200
> > +++ linux-2.6/drivers/s390/scsi/zfcp_qdio.c 2013-05-15 16:54:23.000000000 +0200
> > @@ -250,8 +250,11 @@ int zfcp_qdio_sbal_get(struct zfcp_qdio
> > ret = wait_event_interruptible_timeout(qdio->req_q_wq,
> > zfcp_qdio_sbal_check(qdio), 5 * HZ);
> >
> > - if (!(atomic_read(&qdio->adapter->status) & ZFCP_STATUS_ADAPTER_QDIOUP))
> > + if (!(atomic_read(&qdio->adapter->status) & ZFCP_STATUS_ADAPTER_QDIOUP)) {
> > + if (ret <= 0)
> > + spin_lock_irq(&qdio->req_q_lock);
> > return -EIO;
> > + }
>
> Looks good to me. However it's Steffen's call.
Looking at the commit that introduced this race:

  commit c2af7545aaff3495d9bf9a7608c52f0af86fb194
  Author: Christof Schmitt <christof.schmitt@de.ibm.com>
  Date:   Mon Jun 21 10:11:32 2010 +0200

      [SCSI] zfcp: Do not wait for SBALs on stopped queue

      Trying to read the FC host statistics on an offline adapter results in
      a 5 seconds wait. Reading the statistics tries to issue an exchange
      port data request which first waits up to 5 seconds for an entry in
      the request queue.

      Change the strategy for getting a free SBAL to exit when the queue is
      stopped. Reading the statistics will then fail without the wait.

makes me think that it would be best to revert it.
A much simpler and less error-prone fix for the initial 5 sec delay
usability issue would have been to check the
ZFCP_STATUS_COMMON_UNBLOCKED flag when entering that sysfs code path,
as done in other places (see zfcp_scsi_queuecommand(), for example).
I am reluctant to add more locking statements like the above.
The entire use of req_q_lock sort of works - but the code has become
crappy and should be cleaned up anyway. Just look at the
lock-unlock-lock sequence done prior to entering the critical section.
Martin
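For readers following the imbalance itself, here is a user-space sketch of one plausible reading of the race, reduced to a single failed check followed by a timeout. It assumes the pre-patch shape of zfcp_qdio_sbal_check() and zfcp_qdio_sbal_get() as visible in the quoted diffs; toy_qdio, toy_old_check, toy_old_sbal_get, TOY_EIO and the two boolean switches are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_EIO 5  /* stand-in for the kernel's EIO */

struct toy_qdio {
	bool locked;  /* models req_q_lock */
	int  free;    /* models req_q_free */
	bool up;      /* models ZFCP_STATUS_ADAPTER_QDIOUP */
};

/* Pre-patch check: returns 1 with the lock held, 0 with it released. */
static int toy_old_check(struct toy_qdio *q)
{
	q->locked = true;
	if (q->free || !q->up)
		return 1;
	q->locked = false;
	return 0;
}

/*
 * Pre-patch sbal_get, reduced to one check and, if it fails, a timeout.
 * drop_after_wait models the adapter going down after that last check.
 * with_fix applies the extra locking from Mikulas' patch.
 */
static int toy_old_sbal_get(struct toy_qdio *q, bool drop_after_wait,
			    bool with_fix)
{
	long ret;

	assert(q->locked);              /* caller holds the lock */
	q->locked = false;              /* spin_unlock_irq() */
	ret = toy_old_check(q) ? 1 : 0; /* 0 models the timed-out wait */

	if (drop_after_wait)
		q->up = false;

	if (!q->up) {
		if (with_fix && ret <= 0)
			q->locked = true;   /* the added spin_lock_irq() */
		return -TOY_EIO;
	}
	if (ret > 0)
		return 0;
	q->locked = true;               /* spin_lock_irq() before return */
	return -TOY_EIO;
}
```

Without the fix, the call with drop_after_wait set returns -TOY_EIO while the locked flag is clear, although the caller expects to still hold the lock. With the fix the flag is set again on that path.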