All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: function call fw_iso_resource_mange(..) (core-iso.c) does not return
       [not found] <8ac7ca3200325ddf85ba57aa6d000f70@gatzka.org>
@ 2013-05-21 20:28 ` Stefan Richter
  2013-05-22  9:08   ` Stephan Gatzka
       [not found] ` <519BA6AC.1080600@hurleysoftware.com>
  1 sibling, 1 reply; 15+ messages in thread
From: Stefan Richter @ 2013-05-21 20:28 UTC (permalink / raw)
  To: stephan.gatzka; +Cc: linux1394-devel, peter, linux-kernel, Tejun Heo

(Adding Cc: LKML and Tejun.  Not adding Ralf, I think he is subscribed.)

On May 21 Stephan Gatzka wrote:
> Hi!
> 
> I want to keep you informed about that issue, Ralf described some month 
> ago:
> 
> http://marc.info/?l=linux1394-devel&m=136152042914650&w=2
> 
> We instrumented the workqueue and got a lot of insight how it works. :)
> 
> The deadlock occurs mainly because the firewire workqueue is allocated 
> with WQ_MEM_RECLAIM. That's why a so called rescuer thread is started 
> while allocating the wq.
> 
> This rescuer thread is responsible to keep the queue working even under 
> high memory pressure so that a memory allocation might sleep. If that 
> happens, all work of that workqueue is designated to that particular 
> rescuer thread. The work in this rescuer thread is done strictly 
> sequential. Now we have the situation that the rescuer thread runs 
> fw_device_init->read_config_rom->read_rom->fw_run_transaction. 
> fw_run_transaction blocks waiting for the completion object. This 
> completion object will be completed in bus_reset_work, but this work 
> will never executed in the rescuer thread.
> 
> The interesting question is why the firewire wq stuff is designated to 
> the rescuer thread because we are definitely not under high memory 
> pressure. I looked into the workqueue code and  found that the work is 
> shifted to the rescuer a new worker thread could not be allocated in a 
> certain time (static bool maybe_create_worker(struct global_cwq *gcwq) 
> in workqueue.c). This timeout (MAYDAY_INITIAL_TIMEOUT) is set to 10ms.

If I recall correctly what was written earlier in this thread, the fact
that firewire-ohci schedules its work from IRQ context seems to matter.
If so, then this raises the generally interesting question how to use
workqueue as bottom half of an interrupt handler.¹  Does this necessitate
separate workqueues --- one for the BH worklets and another for worklets
that depend on the BH worklets?²

But Ralf wrote on March 12:  "Other tests have shown that even the own
workqueue is not enough.  The deadlock also occurs here.  Not nearly as
often, but it happens.  Only  with the tasklet can I yet be sure that the
deadlock does not occur."  So, having a private workqueue just for this
single BH worklet alone is still not the answer.

--------
 ¹) Worklets as bottom halves --- as alternative to tasklets as bottom
    halves --- have been suggested by RT folks in the past because there
    is more control over CPU scheduling of worklets compared to tasklets,
    obviously.

 ²) In the firewire-subsystem, we have a subsystem workqueue for various
    work from firewire-core and occasional work from upper-layer drivers
    IOW protocol drivers.  A while ago we converted a low-level driver IRQ
    BH from tasklet to worklet in order to be able to do some sleeping
    stuff in it. However, progress of the core and upper layer work
    depends on the IRQ BH to be executed concurrently.

> What I don't understand is that the create_worker() call is not 
> satisfied within 10ms. We run the stuff on a 400Mhz PowerPC with Xenomai 
> realtime extension. Because we have a lot of firewire participants, we 
> got a log of bus resets when starting up the whole system. Maybe we 
> spend too much time during the interrupts, but that's just a rough guess 
> right now. We will look further into that issue.

Wait a minute --- I seem to be reading of Xenomai RT extensions for the
first time in this thread.  Has the problem been reproduced on a mainline
kernel too?  (Mainline plus Ralf's firewire upper layer driver maybe, but
without any other 3rd party stuff please.  Actually the issue should be
reproducible even without an upper layer driver performing
fw_iso_resource_manage in fw_workqueue context, shouldn't it?)


There was one or two /other/ reports about split transactions never
completing, i.e. not even timing out, on the ffado-user or ffado-devel
mailinglist:  The userland Jack audio server got stuck in
drivers/firewire/core-cdev.c::fw_device_op_release().  However, whether
this was the same problem or a different one altogether is not known to
me.  There is at least one known hardware³ bug which we do not handle in
yet which can lead to this, orthogonally to the workqueue problem which we
are discussing here.  ---  Long story short, Ralf may possibly be the
only one who suffers from this problem so far, which makes it even more
important to verify whether or not this is a mainline problem.)

--------
 ³) JMB38x

> Our workaround is to declare the firewire workqueue as CPU_INTENSIVE. 
> This gives a slight different execution model in the normal case (no 
> work running on the rescuer thread). Maybe it's worth to think about 
> that because some work looks rather heavy to me.
> 
> Regards,
> 
> Stephan

There is quite a bit going on in firewire-ohci's and firewire-core's parts
of self-ID-complete event handling.  But I suspect that stuff like that is
still not in the league for which CPU_INTENSIVE was intended for
(cryptography etc.).
-- 
Stefan Richter
-=====-===-= -=-= =-=-=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: function call fw_iso_resource_mange(..) (core-iso.c) does not return
       [not found] ` <519BA6AC.1080600@hurleysoftware.com>
@ 2013-05-21 21:13   ` Stefan Richter
  2013-05-22  8:53     ` Stephan Gatzka
  2013-05-22 13:38     ` Peter Hurley
       [not found]   ` <62922216e6007f9ef83956e0ca202644@gatzka.org>
       [not found]   ` <20130521231847.GA6985@mtj.dyndns.org>
  2 siblings, 2 replies; 15+ messages in thread
From: Stefan Richter @ 2013-05-21 21:13 UTC (permalink / raw)
  To: Peter Hurley; +Cc: stephan.gatzka, linux1394-devel, Tejun Heo, linux-kernel

On May 21 Peter Hurley wrote:
> [ +cc Tejun Heo]
(+cc lkml)

> On 05/21/2013 04:42 AM, Stephan Gatzka wrote:
> > Hi!
> >
> > I want to keep you informed about that issue, Ralf described some month ago:
> >
> > http://marc.info/?l=linux1394-devel&m=136152042914650&w=2
> >
> > We instrumented the workqueue and got a lot of insight how it works. :)
> 
> Well done!
> 
> > The deadlock occurs mainly because the firewire workqueue is allocated with
> > WQ_MEM_RECLAIM. That's why a so called rescuer thread is started while
> > allocating the wq.
>                                          ^^^^^^
> Are there other conditions which contribute?  For example, Ralf reported that
> having the bus_reset_work on its own wq delayed but did not eliminate the
> deadlock.
> 
> > This rescuer thread is responsible to keep the queue working even under high
> > memory pressure so that a memory allocation might sleep. If that happens,
> > all work of that workqueue is designated to that particular rescuer
> > thread. The work in this rescuer thread is done strictly sequential. Now
> > we have the situation that the rescuer thread runs
> > fw_device_init->read_config_rom->read_rom->fw_run_transaction.
> > fw_run_transaction blocks waiting for the completion object. This
> > completion object will be completed in bus_reset_work, but this work will
> > never executed in the rescuer thread.
> 
> Interesting.
> 
> Tejun, is this workqueue behavior as designed?  Ie., that a workqueue used
> as a domain for forward progress guarantees collapses under certain conditions,
> such as scheduler overhead and no longer ensures forward progress?
> 
> > The interesting question is why the firewire wq stuff is designated to the
> > rescuer thread because we are definitely not under high memory pressure. I
> > looked into the workqueue code and  found that the work is shifted to the
> > rescuer a new worker thread could not be allocated in a certain time (static
> > bool maybe_create_worker(struct global_cwq *gcwq) in workqueue.c). This
> > timeout (MAYDAY_INITIAL_TIMEOUT) is set to 10ms.
> >
> > What I don't understand is that the create_worker() call is not satisfied
> > within 10ms. We run the stuff on a 400Mhz PowerPC with Xenomai realtime
> > extension. Because we have a lot of firewire participants, we got a log of
> > bus resets when starting up the whole system. Maybe we spend too much time
> > during the interrupts, but that's just a rough guess right now. We will
> > look further into that issue.
> 
> I've observed the string of bus resets on startup as well and I think it has to do
> with failing to ack the interrupt promptly because the log printk() takes too long (>50us).
> The point being that I don't think this is contributing to the create_worker() +10ms
> delay.
> 
> > Our workaround is to declare the firewire workqueue as CPU_INTENSIVE. This
> > gives a slight different execution model in the normal case (no work running
> > on the rescuer thread). Maybe it's worth to think about that because some work
> > looks rather heavy to me.
> 
> I thought the whole point of needing WQ_MEM_RECLAIM is if a SBP-2 device is swap.

Yes and no.  If, and only if, there is one or more SBP-2 target present
and bound to the firewire-sbp2 initiator driver and
  - there is a swap device on that target, or/and
  - there is a filesystem on that target mounted writable,
then self-ID-complete handling in firewire-ohci and firewire-core and the
reconnect procedure in firewire-sbp2 must not involve any allocations that
would block on writeback.  (Because writeback to the target is blocked
from bus reset until reconnect.)

> FWIW, I still believe that we should revert to the original bus reset
> as tasklet and redo the TI workaround to use TI-workaround-specific versions
> of non-sleeping PHY accesses.
> 
> Regards,
> Peter Hurley

I am a friend of the self-ID-complete worklet, for two reasons:
  - Even if there was no need for the TI TSB41BA3D workaround (e.g. even
    if we simply stopped supporting TSB41BA3D), it would still be
    worthwhile to have at least the self-ID-complete IRQ BH performed in
    a non-atomic context.  We should try to move as much of the
    firewire-core self-ID-complete handler as possible out of the currently
    spinlock protected section in order make more of this stuff
    preemptible and replace a few GFP_ATOMIC slab allocations by GFP_NOFS
    ones.  (Could be GFP_KERNEL in absence of firewire-sbp2.)
    I would have liked to work on this already long ago, but such is life.
  - How do you propose to access the PHY registers without sleeping?
    Or more to the point:  How do you propose to mix sleeping and
    non-sleeping PHY register accesses?  (Since we can't get rid of
    the sleeping ones.)  If the accesses are not fully serialized, you will
    get corrupt PHY reg reads or writes.  If they are fully serialized, the
    non-sleeping PHY reg accesses need to go a try-lock route and will be
    forced to error out during periods when a sleeping PHY reg access goes
    on, without even the ability to reschedule if it is done in a tasklet
    context.
-- 
Stefan Richter
-=====-===-= -=-= =-=-=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: function call fw_iso_resource_mange(..) (core-iso.c) does not return
       [not found]   ` <62922216e6007f9ef83956e0ca202644@gatzka.org>
@ 2013-05-21 21:53     ` Stefan Richter
  2013-05-21 22:10       ` Peter Hurley
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Richter @ 2013-05-21 21:53 UTC (permalink / raw)
  To: stephan.gatzka; +Cc: Peter Hurley, linux1394-devel, Tejun Heo, linux-kernel

On May 21 Stephan Gatzka wrote:
> Hi all!
> 
> >> The deadlock occurs mainly because the firewire workqueue is allocated 
> >> with WQ_MEM_RECLAIM. That's why a so called rescuer thread is started 
> >> while allocating the wq.
> >                       ^^^^^^
> > Are there other conditions which contribute?  For example, Ralf 
> > reported that
> > having the bus_reset_work on its own wq delayed but did not eliminate 
> > the
> > deadlock.
> > 
> 
> The worker threads are independant from the workqueues. New worker 
> thread are only allocated if all current worker threads are sleeping and 
> are making no progress. Right now I don't know if Ralfs separate queue 
> for bus_reset_work was allocated with MEM_RECLAIM. Buf even if it has 
> it's own rescuer thread, that should not block because there is no other 
> work in that queue. No, right now I have no explanation for that.

Indeed.  Ralf wrote he had given bus_reset_work its own workqueue (which
reduced the probability of the bug but did not eliminate it), and then
there can only three things happen (from my perspective as a mere API
user):
  a) The ohci->bus_reset_work instance is queued while the wq is empty.
     Which is the trivial case and presumably works.
     queue_work() returns nonzero.

  b) The instance is queued while the very same instance has already been
     queued but is not yet being executed.  Sounds trivial too:  These two
     or more queuing attempts before execution should in total lead to the
     work be executed once.
     queue_work() returns zero in this case.
     Ralf observed this condition to coincide with the hang.

  c) The instance is queued while it is being executed.  In this case,
     the work must be put into the queue but must not be started until its
     present execution finished.  (We have a non-reentrant workqueue which
     is meant to guarantee that one and the same worklet instance is
     executed at most once at any time, systemwide.)
     queue_work() returns nonzero in this case.
     Whether this condition is involved in the problem or not is not clear
     to me.  Maybe one occurrence of condition c initiates the problem,
     and the occurrences of b which had been observed are only a
     by-product.  Or the trouble starts in condition b only...?

Well, a few further conditions can happen if there are several OHCI-1394
LLCs in the system and all fw_ohci instances share a single workqueue.
But since there is no dependency between different ohci->bus_reset_work
instances (in contrast to firewire-core worklets and upper layer worklets
whose progress depends on firewire-ohci worklets' progress), the only
effect of such sharing would be that conditions b and c are somewhat more
likely to occur, especially if bus resets on the different buses are
correlated.

> >> This rescuer thread is responsible to keep the queue working even 
> >> under high memory pressure so that a memory allocation might sleep. If 
> >> that happens, all work of that workqueue is designated to that 
> >> particular rescuer thread. The work in this rescuer thread is done 
> >> strictly sequential. Now we have the situation that the rescuer thread 
> >> runs fw_device_init->read_config_rom->read_rom->fw_run_transaction. 
> >> fw_run_transaction blocks waiting for the completion object. This 
> >> completion object will be completed in bus_reset_work, but this work 
> >> will never executed in the rescuer thread.
> > 
> > Interesting.
> > 
> > Tejun, is this workqueue behavior as designed?  Ie., that a workqueue 
> > used
> > as a domain for forward progress guarantees collapses under certain 
> > conditions,
> > such as scheduler overhead and no longer ensures forward progress?
> > 
> > 
> > I've observed the string of bus resets on startup as well and I think
> > it has to do
> > with failing to ack the interrupt promptly because the log printk()
> > takes too long (>50us).
> > The point being that I don't think this is contributing to the
> > create_worker() +10ms
> > delay.
> 
> That depends on. The console on embedded systems (like ours) often runs 
> on a serial port with 115200 baud. We measured the cost of a printk and 
> dependent of the length and parameters that might easily go up to 
> several ms. So dependent on the console level set you might run into 
> trouble.
> 
> Regards,
> 
> Stephan

-- 
Stefan Richter
-=====-===-= -=-= =-=-=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: function call fw_iso_resource_mange(..) (core-iso.c) does not return
  2013-05-21 21:53     ` Stefan Richter
@ 2013-05-21 22:10       ` Peter Hurley
  2013-05-22  8:52         ` Stephan Gatzka
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Hurley @ 2013-05-21 22:10 UTC (permalink / raw)
  To: Stefan Richter, stephan.gatzka, Tejun Heo; +Cc: linux1394-devel, linux-kernel

On 05/21/2013 05:53 PM, Stefan Richter wrote:
> On May 21 Stephan Gatzka wrote:
>> Hi all!
>>
>>>> The deadlock occurs mainly because the firewire workqueue is allocated
>>>> with WQ_MEM_RECLAIM. That's why a so called rescuer thread is started
>>>> while allocating the wq.
>>>                        ^^^^^^
>>> Are there other conditions which contribute?  For example, Ralf
>>> reported that
>>> having the bus_reset_work on its own wq delayed but did not eliminate
>>> the
>>> deadlock.
>>>
>>
>> The worker threads are independant from the workqueues. New worker
>> thread are only allocated if all current worker threads are sleeping and
>> are making no progress. Right now I don't know if Ralfs separate queue
>> for bus_reset_work was allocated with MEM_RECLAIM. Buf even if it has
>> it's own rescuer thread, that should not block because there is no other
>> work in that queue. No, right now I have no explanation for that.
>
> Indeed.  Ralf wrote he had given bus_reset_work its own workqueue (which
> reduced the probability of the bug but did not eliminate it), and then
> there can only three things happen (from my perspective as a mere API
> user):
>    a) The ohci->bus_reset_work instance is queued while the wq is empty.
>       Which is the trivial case and presumably works.
>       queue_work() returns nonzero.
>
>    b) The instance is queued while the very same instance has already been
>       queued but is not yet being executed.  Sounds trivial too:  These two
>       or more queuing attempts before execution should in total lead to the
>       work be executed once.
>       queue_work() returns zero in this case.
>       Ralf observed this condition to coincide with the hang.

An important note with case b) is that Ralf (the original reporter) was
running 3.4.something.  I'm aware that some work has gone into workqueue
locks in 3.9 and 3.10. I asked Ralf if he could reproduce on a more
recent kernel but never heard back. Maybe these are two different problems
(ie, a failure to guarantee forward progress on a single workqueue and
a straightforward deadlock when using two wqs)?

Stephan Gatzka,
what kernel version did you instrument and reproduce this with?

>    c) The instance is queued while it is being executed.  In this case,
>       the work must be put into the queue but must not be started until its
>       present execution finished.  (We have a non-reentrant workqueue which
>       is meant to guarantee that one and the same worklet instance is
>       executed at most once at any time, systemwide.)
>       queue_work() returns nonzero in this case.
>       Whether this condition is involved in the problem or not is not clear
>       to me.  Maybe one occurrence of condition c initiates the problem,
>       and the occurrences of b which had been observed are only a
>       by-product.  Or the trouble starts in condition b only...?
>
> Well, a few further conditions can happen if there are several OHCI-1394
> LLCs in the system and all fw_ohci instances share a single workqueue.
> But since there is no dependency between different ohci->bus_reset_work
> instances (in contrast to firewire-core worklets and upper layer worklets
> whose progress depends on firewire-ohci worklets' progress), the only
> effect of such sharing would be that conditions b and c are somewhat more
> likely to occur, especially if bus resets on the different buses are
> correlated.
>
>>>> This rescuer thread is responsible to keep the queue working even
>>>> under high memory pressure so that a memory allocation might sleep. If
>>>> that happens, all work of that workqueue is designated to that
>>>> particular rescuer thread. The work in this rescuer thread is done
>>>> strictly sequential. Now we have the situation that the rescuer thread
>>>> runs fw_device_init->read_config_rom->read_rom->fw_run_transaction.
>>>> fw_run_transaction blocks waiting for the completion object. This
>>>> completion object will be completed in bus_reset_work, but this work
>>>> will never executed in the rescuer thread.
>>>
>>> Interesting.
>>>
>>> Tejun, is this workqueue behavior as designed?  Ie., that a workqueue
>>> used
>>> as a domain for forward progress guarantees collapses under certain
>>> conditions,
>>> such as scheduler overhead and no longer ensures forward progress?
>>>
>>>
>>> I've observed the string of bus resets on startup as well and I think
>>> it has to do
>>> with failing to ack the interrupt promptly because the log printk()
>>> takes too long (>50us).
>>> The point being that I don't think this is contributing to the
>>> create_worker() +10ms
>>> delay.
>>
>> That depends on. The console on embedded systems (like ours) often runs
>> on a serial port with 115200 baud. We measured the cost of a printk and
>> dependent of the length and parameters that might easily go up to
>> several ms. So dependent on the console level set you might run into
>> trouble.
>>
>> Regards,
>>
>> Stephan
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: function call fw_iso_resource_mange(..) (core-iso.c) does not return
       [not found]   ` <20130521231847.GA6985@mtj.dyndns.org>
@ 2013-05-22  7:48     ` Stefan Richter
  2013-05-22  8:59       ` Stephan Gatzka
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Richter @ 2013-05-22  7:48 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Peter Hurley, stephan.gatzka, linux1394-devel, linux-kernel

On May 22 Tejun Heo wrote:
> Hello,
> 
> On Tue, May 21, 2013 at 12:54:04PM -0400, Peter Hurley wrote:
> >> This rescuer thread is responsible to keep the queue working even
> >> under high memory pressure so that a memory allocation might
> >> sleep. If that happens, all work of that workqueue is designated to
> >> that particular rescuer thread. The work in this rescuer thread is
> >> done strictly sequential. Now we have the situation that the
> >> rescuer thread runs
> >> fw_device_init->read_config_rom->read_rom->fw_run_transaction. fw_run_transaction
> >> blocks waiting for the completion object. This completion object
> >> will be completed in bus_reset_work, but this work will never
> >> executed in the rescuer thread.
> > 
> > Interesting.
> > 
> > Tejun, is this workqueue behavior as designed?  Ie., that a workqueue used
> > as a domain for forward progress guarantees collapses under certain conditions,
> > such as scheduler overhead and no longer ensures forward progress?
> 
> Yeap, from Documentation/workqueue.txt
> 
>   WQ_MEM_RECLAIM
> 
> 	All wq which might be used in the memory reclaim paths _MUST_
> 	have this flag set.  The wq is guaranteed to have at least one
> 	execution context regardless of memory pressure.
> 		 
> All it guarantees is that there will be at least one execution thread
> working on the workqueue under any conditions.  If there are
> inter-dependent work items which are necessary to make forward
> progress in memory reclaim, they must be put into separate workqueues.
> In turn, workqueues w/ WQ_RESCUER set *must* be able to make forward
> progress in all cases at the concurrency level of 1.  Probably the
> documentation needs a bit of clarification.
[...]
> > I thought the whole point of needing WQ_MEM_RECLAIM is if a SBP-2
> > device is swap.
> > 
> > FWIW, I still believe that we should revert to the original bus
> > reset as tasklet and redo the TI workaround to use
> > TI-workaround-specific versions of non-sleeping PHY accesses.
> 
> The right fix would be either dropping WQ_MEM_RECLAIM or breaking it
> into two workqueues so that work items don't have interdependencies.
> 
> Thanks.

Argh, suddenly it all seems so obvious.  Tejun, Peter, Stephan, thank you
for getting this clarified.

A third (fourth?) way to fix it --- feasible or not --- would be to break
the dependency between the worklets.  In this case, use a timer to cancel
outbound transactions if the request-transmit IRQ event was not received
before a timeout.  We had such a timeout in the older ieee1394 drivers and
we also had it in earlier versions of the firewire drivers, at a risk of a
race between CPU and OHCI.
-- 
Stefan Richter
-=====-===-= -=-= =-==-
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: function call fw_iso_resource_mange(..) (core-iso.c) does not return
  2013-05-21 22:10       ` Peter Hurley
@ 2013-05-22  8:52         ` Stephan Gatzka
  0 siblings, 0 replies; 15+ messages in thread
From: Stephan Gatzka @ 2013-05-22  8:52 UTC (permalink / raw)
  To: Peter Hurley; +Cc: Stefan Richter, Tejun Heo, linux1394-devel, linux-kernel


> An important note with case b) is that Ralf (the original reporter) was
> running 3.4.something.  I'm aware that some work has gone into 
> workqueue
> locks in 3.9 and 3.10. I asked Ralf if he could reproduce on a more
> recent kernel but never heard back. Maybe these are two different 
> problems
> (ie, a failure to guarantee forward progress on a single workqueue and
> a straightforward deadlock when using two wqs)?
> 
> Stephan Gatzka,
> what kernel version did you instrument and reproduce this with?
> 

Ralf and I are working together on the same system. It's kernel 3.4.28, 
patched with Xenomai 2.6.1.

Regards,

Stephan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: function call fw_iso_resource_mange(..) (core-iso.c) does not return
  2013-05-21 21:13   ` Stefan Richter
@ 2013-05-22  8:53     ` Stephan Gatzka
  2013-05-22 13:38     ` Peter Hurley
  1 sibling, 0 replies; 15+ messages in thread
From: Stephan Gatzka @ 2013-05-22  8:53 UTC (permalink / raw)
  To: Stefan Richter; +Cc: Peter Hurley, linux1394-devel, Tejun Heo, linux-kernel


> I am a friend of the self-ID-complete worklet, for two reasons:
>   - Even if there was no need for the TI TSB41BA3D workaround (e.g. 
> even
>     if we simply stopped supporting TSB41BA3D), it would still be
>     worthwhile to have at least the self-ID-complete IRQ BH performed 
> in
>     a non-atomic context.  We should try to move as much of the
>     firewire-core self-ID-complete handler as possible out of the 
> currently
>     spinlock protected section in order make more of this stuff
>     preemptible and replace a few GFP_ATOMIC slab allocations by 
> GFP_NOFS
>     ones.  (Could be GFP_KERNEL in absence of firewire-sbp2.)
>     I would have liked to work on this already long ago, but such is 
> life.
>   - How do you propose to access the PHY registers without sleeping?
>     Or more to the point:  How do you propose to mix sleeping and
>     non-sleeping PHY register accesses?  (Since we can't get rid of
>     the sleeping ones.)  If the accesses are not fully serialized, you 
> will
>     get corrupt PHY reg reads or writes.  If they are fully serialized, 
> the
>     non-sleeping PHY reg accesses need to go a try-lock route and will 
> be
>     forced to error out during periods when a sleeping PHY reg access 
> goes
>     on, without even the ability to reschedule if it is done in a 
> tasklet
>     context.

I totally agree.

Stephan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: function call fw_iso_resource_mange(..) (core-iso.c) does not return
  2013-05-22  7:48     ` Stefan Richter
@ 2013-05-22  8:59       ` Stephan Gatzka
  2013-05-22 12:58         ` Stefan Richter
  0 siblings, 1 reply; 15+ messages in thread
From: Stephan Gatzka @ 2013-05-22  8:59 UTC (permalink / raw)
  To: Stefan Richter; +Cc: Tejun Heo, Peter Hurley, linux1394-devel, linux-kernel


> A third (fourth?) way to fix it --- feasible or not --- would be to 
> break
> the dependency between the worklets.  In this case, use a timer to 
> cancel
> outbound transactions if the request-transmit IRQ event was not 
> received
> before a timeout.  We had such a timeout in the older ieee1394 drivers 
> and
> we also had it in earlier versions of the firewire drivers, at a risk 
> of a
> race between CPU and OHCI.

Why do we need a timer? If we guarantee that bus_reset_work() always 
make progress (if we put it into its own queue), it should always be 
able to complete the corresponding completion object the other works are 
waiting for.

Regards,

Stephan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: function call fw_iso_resource_mange(..) (core-iso.c) does not return
  2013-05-21 20:28 ` function call fw_iso_resource_mange(..) (core-iso.c) does not return Stefan Richter
@ 2013-05-22  9:08   ` Stephan Gatzka
  2013-05-22  9:22     ` Tejun Heo
  2013-05-22 13:11     ` Stefan Richter
  0 siblings, 2 replies; 15+ messages in thread
From: Stephan Gatzka @ 2013-05-22  9:08 UTC (permalink / raw)
  To: Stefan Richter; +Cc: linux1394-devel, peter, linux-kernel, Tejun Heo


> Wait a minute --- I seem to be reading of Xenomai RT extensions for the
> first time in this thread.  Has the problem been reproduced on a 
> mainline
> kernel too?  (Mainline plus Ralf's firewire upper layer driver maybe, 
> but
> without any other 3rd party stuff please.  Actually the issue should be
> reproducible even without an upper layer driver performing
> fw_iso_resource_manage in fw_workqueue context, shouldn't it?)

Unfortunately it's not that easy. While I agree that it should be 
reproducible just with mainline stuff, we have to force our system into 
the situation that it "thinks" it's under memory pressure to hand over 
the work to the rescuer thread. It looks that a simple printk from a 
different driver might lead to that situation on our system.

As a first fix, I would say we have at least bring bus_reset_work() 
running on its own workqueue because otherwise we cannot guarantee 
progress of bus_reset_work under memory pressure.


> There is quite a bit going on in firewire-ohci's and firewire-core's 
> parts
> of self-ID-complete event handling.  But I suspect that stuff like that 
> is
> still not in the league for which CPU_INTENSIVE was intended for
> (cryptography etc.).

Some advice from Tejun would be great here...



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: function call fw_iso_resource_mange(..) (core-iso.c) does not return
  2013-05-22  9:08   ` Stephan Gatzka
@ 2013-05-22  9:22     ` Tejun Heo
  2013-05-22 13:11     ` Stefan Richter
  1 sibling, 0 replies; 15+ messages in thread
From: Tejun Heo @ 2013-05-22  9:22 UTC (permalink / raw)
  To: Stephan Gatzka; +Cc: Stefan Richter, linux1394-devel, peter, linux-kernel

Hello,

On Wed, May 22, 2013 at 11:08:43AM +0200, Stephan Gatzka wrote:
> >There is quite a bit going on in firewire-ohci's and
> >firewire-core's parts
> >of self-ID-complete event handling.  But I suspect that stuff like
> >that is
> >still not in the league for which CPU_INTENSIVE was intended for
> >(cryptography etc.).
> 
> Some advice from Tejun would be great here...

No need to use CPU_INTENSIVE unless it's really burning considerable
amount of CPU cycles.  I don't know what the firewire work items are
doing but am pretty doubtful they would need the flag, and please
don't use it to work around deadlock problems. :)

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: function call fw_iso_resource_mange(..) (core-iso.c) does not return
  2013-05-22  8:59       ` Stephan Gatzka
@ 2013-05-22 12:58         ` Stefan Richter
  2013-05-22 13:06           ` Stephan Gatzka
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Richter @ 2013-05-22 12:58 UTC (permalink / raw)
  To: stephan.gatzka; +Cc: Tejun Heo, Peter Hurley, linux1394-devel, linux-kernel

On May 22 Stephan Gatzka wrote:
[I wrote:]
> > A third (fourth?) way to fix it --- feasible or not --- would be to break
> > the dependency between the worklets.  In this case, use a timer to cancel
> > outbound transactions if the request-transmit IRQ event was not received
> > before a timeout.  We had such a timeout in the older ieee1394 drivers and
> > we also had it in earlier versions of the firewire drivers, at a risk of a
> > race between CPU and OHCI.
> 
> Why do we need a timer? If we guarantee that bus_reset_work() always 
> make progress (if we put it into its own queue), it should always be 
> able to complete the corresponding completion object the other works are 
> waiting for.

We don't need a timer if the dual-wq solution is implemented.  It's just
that there are four different solutions possible in principle.  Any one
of these fixes is sufficient on its own:

Problem --- A WQ_MEM_RECLAIM workqueue will, for some periods of time,
offer a concurrency level of only 1; particularly during memory reclaim,
but also in some other situations due to the way how it is implemented (as
far as I understood yesterday's posts).  Therefore, interdependent works
must not be inserted into the same WQ_MEM_RECLAIM workqueue instance.

Solution 1 --- Don't set the WQ_MEM_RECLAIM flag.  This is only possible
if memory reclaim does not depend on any one of the works that are
inserted into this queue.  (This solution is therefore not available to
an SBP-2 initiator.)

Solution 2 --- Perform interdependent works in different queue instances.
(Keep the WQ_MEM_RECLAIM flag set at those workqueues that have to take
work which is necessary for progress of memory reclaim.  If this and only
this solution is employed for an SBP-2 initiator, we need two if not more
WQ_MEM_RECLAIM workqueue instances.)

Solution 3 --- Remove the dependency between worklets:

	Solution 3a --- Remove the lower-level worklet altogether.
	E.g. reimplement the lower-level worklet as a tasklet.

	Solution 3b --- Remove the higher-level worklet's dependency.
	E.g. reimplement the higher-level worklet such that it is woken by
	a timer and then aborts or reschedules ( = lets the lower-level
	worklet bubble up in the queue).
-- 
Stefan Richter
-=====-===-= -=-= =-==-
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: function call fw_iso_resource_mange(..) (core-iso.c) does not return
  2013-05-22 12:58         ` Stefan Richter
@ 2013-05-22 13:06           ` Stephan Gatzka
  2013-05-22 13:21             ` Peter Hurley
  0 siblings, 1 reply; 15+ messages in thread
From: Stephan Gatzka @ 2013-05-22 13:06 UTC (permalink / raw)
  To: Stefan Richter; +Cc: Tejun Heo, Peter Hurley, linux1394-devel, linux-kernel

> Solution 2 --- Perform interdependent works in different queue 
> instances.
> (Keep the WQ_MEM_RECLAIM flag set at those workqueues that have to take
> work which is necessary for progress of memory reclaim.  If this and 
> only
> this solution is employed for an SBP-2 initiator, we need two if not 
> more
> WQ_MEM_RECLAIM workqueue instances.)

I would go for this solution. I have no problems with lots of workqueues 
around, because there is only a relatively small structure required for 
each workqueue.

> 
> Solution 3 --- Remove the dependency between worklets:
> 
> 	Solution 3a --- Remove the lower-level worklet altogether.
> 	E.g. reimplement the lower-level worklet as a tasklet.

No, I like the workqueue context. :)


> 	Solution 3b --- Remove the higher-level worklet's dependency.
> 	E.g. reimplement the higher-level worklet such that it is woken by
> 	a timer and then aborts or reschedules ( = lets the lower-level
> 	worklet bubble up in the queue).

This looks more difficult to me and not so easy to test.

Stephan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: function call fw_iso_resource_mange(..) (core-iso.c) does not return
  2013-05-22  9:08   ` Stephan Gatzka
  2013-05-22  9:22     ` Tejun Heo
@ 2013-05-22 13:11     ` Stefan Richter
  1 sibling, 0 replies; 15+ messages in thread
From: Stefan Richter @ 2013-05-22 13:11 UTC (permalink / raw)
  To: stephan.gatzka; +Cc: linux1394-devel, peter, linux-kernel, Tejun Heo

On May 22 Stephan Gatzka wrote:
[I wrote:]
> > Wait a minute --- I seem to be reading of Xenomai RT extensions for the
> > first time in this thread.  Has the problem been reproduced on a mainline
> > kernel too?  (Mainline plus Ralf's firewire upper layer driver maybe, but
> > without any other 3rd party stuff please.  Actually the issue should be
> > reproducible even without an upper layer driver performing
> > fw_iso_resource_manage in fw_workqueue context, shouldn't it?)
> 
> Unfortunately it's not that easy. While I agree that it should be 
> reproducible just with mainline stuff, we have to force our system into 
> the situation that it "thinks" it's under memory pressure to hand over 
> the work to the rescuer thread. It looks that a simple printk from a 
> different driver might lead to that situation on our system.
[...]

I retract my request to get it retested with mainline since a) the
analysis has clearly shown that a mainline bug is at the root of the
problem, and b) Ralf confirmed now that one of the possible fixes (using
two queues) does in fact work for him like the theory says how it should
work for mainline.
-- 
Stefan Richter
-=====-===-= -=-= =-==-
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: function call fw_iso_resource_mange(..) (core-iso.c) does not return
  2013-05-22 13:06           ` Stephan Gatzka
@ 2013-05-22 13:21             ` Peter Hurley
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Hurley @ 2013-05-22 13:21 UTC (permalink / raw)
  To: stephan.gatzka; +Cc: Stefan Richter, Tejun Heo, linux1394-devel, linux-kernel

On 05/22/2013 09:06 AM, Stephan Gatzka wrote:
>> Solution 2 --- Perform interdependent works in different queue instances.
>> (Keep the WQ_MEM_RECLAIM flag set at those workqueues that have to take
>> work which is necessary for progress of memory reclaim.  If this and only
>> this solution is employed for an SBP-2 initiator, we need two if not more
>> WQ_MEM_RECLAIM workqueue instances.)
>
> I would go for this solution. I have no problems with lots of workqueues around, because there is only a relatively small structure required for each workqueue.

Now that we more fully understand the cause, this would be my choice as
well (despite my earlier advocacy for the return to tasklets).

Although we do need to carefully review the other work items (bm_work,br_work)
to ensure that there aren't unwanted dependencies with those as well.

Also, wqs that are WQ_MEM_RECLAIM aren't free as each wq will have a
dedicated rescue thread.

[The other advantage with tasklets is the reduced latency but I do
understand the advantages of the RT characteristics of worklets.]

Regards,
Peter Hurley

>
>>
>> Solution 3 --- Remove the dependency between worklets:
>>
>>     Solution 3a --- Remove the lower-level worklet altogether.
>>     E.g. reimplement the lower-level worklet as a tasklet.
>
> No, I like the workqueue context. :)
>
>
>>     Solution 3b --- Remove the higher-level worklet's dependency.
>>     E.g. reimplement the higher-level worklet such that it is woken by
>>     a timer and then aborts or reschedules ( = lets the lower-level
>>     worklet bubble up in the queue).
>
> This looks more difficult to me and not so easy to test.
>
> Stephan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: function call fw_iso_resource_mange(..) (core-iso.c) does not return
  2013-05-21 21:13   ` Stefan Richter
  2013-05-22  8:53     ` Stephan Gatzka
@ 2013-05-22 13:38     ` Peter Hurley
  1 sibling, 0 replies; 15+ messages in thread
From: Peter Hurley @ 2013-05-22 13:38 UTC (permalink / raw)
  To: Stefan Richter; +Cc: stephan.gatzka, linux1394-devel, Tejun Heo, linux-kernel

On 05/21/2013 05:13 PM, Stefan Richter wrote:
>> FWIW, I still believe that we should revert to the original bus reset
>> as tasklet and redo the TI workaround to use TI-workaround-specific versions
>> of non-sleeping PHY accesses.
>>
>> Regards,
>> Peter Hurley
>
> I am a friend of the self-ID-complete worklet, for two reasons:
>    - Even if there was no need for the TI TSB41BA3D workaround (e.g. even
>      if we simply stopped supporting TSB41BA3D), it would still be
>      worthwhile to have at least the self-ID-complete IRQ BH performed in
>      a non-atomic context.  We should try to move as much of the
>      firewire-core self-ID-complete handler as possible out of the currently
>      spinlock protected section in order make more of this stuff
>      preemptible and replace a few GFP_ATOMIC slab allocations by GFP_NOFS
>      ones.  (Could be GFP_KERNEL in absence of firewire-sbp2.)
>      I would have liked to work on this already long ago, but such is life.

Sure. I understand reducing the card->lock critical section is desirable
(although even more care would be required when switching the work item).

>    - How do you propose to access the PHY registers without sleeping?
>      Or more to the point:  How do you propose to mix sleeping and
>      non-sleeping PHY register accesses?  (Since we can't get rid of
>      the sleeping ones.)  If the accesses are not fully serialized, you will
>      get corrupt PHY reg reads or writes.  If they are fully serialized, the
>      non-sleeping PHY reg accesses need to go a try-lock route and will be
>      forced to error out during periods when a sleeping PHY reg access goes
>      on, without even the ability to reschedule if it is done in a tasklet
>      context.

Although this point is largely irrelevant now, I wasn't suggesting mixing
sleeping and non-sleeping PHY access -- simply that the TI quirk would
require non-sleeping PHY access and every other host controller would
use sleeping PHY access.

Regards,
Peter Hurley

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2013-05-22 13:39 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <8ac7ca3200325ddf85ba57aa6d000f70@gatzka.org>
2013-05-21 20:28 ` function call fw_iso_resource_mange(..) (core-iso.c) does not return Stefan Richter
2013-05-22  9:08   ` Stephan Gatzka
2013-05-22  9:22     ` Tejun Heo
2013-05-22 13:11     ` Stefan Richter
     [not found] ` <519BA6AC.1080600@hurleysoftware.com>
2013-05-21 21:13   ` Stefan Richter
2013-05-22  8:53     ` Stephan Gatzka
2013-05-22 13:38     ` Peter Hurley
     [not found]   ` <62922216e6007f9ef83956e0ca202644@gatzka.org>
2013-05-21 21:53     ` Stefan Richter
2013-05-21 22:10       ` Peter Hurley
2013-05-22  8:52         ` Stephan Gatzka
     [not found]   ` <20130521231847.GA6985@mtj.dyndns.org>
2013-05-22  7:48     ` Stefan Richter
2013-05-22  8:59       ` Stephan Gatzka
2013-05-22 12:58         ` Stefan Richter
2013-05-22 13:06           ` Stephan Gatzka
2013-05-22 13:21             ` Peter Hurley

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.