Using a waiting MDIO does not go well with a spinlocked bridge


* Using a waiting MDIO does not go well with a spinlocked bridge
@ 2015-03-20 12:22 Jonas Johansson
  2015-03-20 13:16 ` Andrew Lunn
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jonas Johansson @ 2015-03-20 12:22 UTC (permalink / raw)
  To: netdev; +Cc: stephen, f.fainelli, jiri, sfeldma, jonasj76

The bridge code sometimes holds a spinlock, so the code that runs under 
it must be atomic. If an MDIO call that waits or sleeps is made in this 
context, the kernel will not be very happy.

I'm using a switch device and want to flush its FDB when the Linux bridge 
FDB is flushed. I've implemented some hooks for this task.
In short:
      bridge    - br_fdb_flush() & br_fdb_delete_by_port()
   -> switchdev - switch_flush()
   -> dsa       - slave_flush()
   -> mv88e6xxx - mv88_flush()

So, when a bridge port is flushed via e.g. sysfs, mv88_flush() is 
eventually called. mv88_flush() uses MDIO calls to set the proper 
registers and flush the device. But because the MDIO driver on my 
platform uses wait_for_completion() while a spinlock is held (in this 
case in brport_store()), this does not go very well.

The only possible solutions that come to mind are:
  1) Let mv88_flush() schedule a work queue to take care of the flush
     later on.
  2) Change the MDIO implementation to use polling.
  3) Don't use a spinlock in the bridge code.

1) With this approach the atomicity is lost, i.e. the switch device 
isn't guaranteed to be flushed once the command has returned. And if an 
FDB entry is added (atomically) to the switch device immediately after 
the flush command, it is undefined whether the entry is added before or 
after the flush occurs. To solve this, all FDB operations must be added 
to a work queue to ensure that they are executed in the right order.

2) This will result in wasted CPU cycles.

3) Haven't looked into this, but it is probably a lot of work.
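
The ordering requirement in 1) can be sketched in plain userspace C. This
is only an illustration, not the DSA or mv88e6xxx code: struct fdb_queue,
struct hw_fdb, fdb_queue_push() and fdb_queue_drain() are hypothetical
names, the "hardware" is an in-memory array, and locking between producer
and worker is left out. The point is that both adds and flushes go through
one FIFO, so a deferred worker replays them in the order they were issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN       6
#define FDB_QUEUE_LEN 16
#define HW_FDB_LEN    16

enum fdb_op_type { FDB_OP_ADD, FDB_OP_FLUSH };

struct fdb_op {
    enum fdb_op_type type;
    unsigned char mac[MAC_LEN];   /* unused for FDB_OP_FLUSH */
};

/* Hypothetical FIFO shared by every FDB operation. */
struct fdb_queue {
    struct fdb_op ops[FDB_QUEUE_LEN];
    size_t head, count;
};

/* Stand-in for the switch's hardware table. */
struct hw_fdb {
    unsigned char mac[HW_FDB_LEN][MAC_LEN];
    size_t count;
};

/* Atomic-context side: only records the request, never touches MDIO. */
static bool fdb_queue_push(struct fdb_queue *q, enum fdb_op_type type,
                           const unsigned char *mac)
{
    struct fdb_op *op;

    assert(q != NULL);
    if (q->count == FDB_QUEUE_LEN)
        return false;             /* queue full, caller must handle it */

    op = &q->ops[(q->head + q->count) % FDB_QUEUE_LEN];
    op->type = type;
    memset(op->mac, 0, MAC_LEN);
    if (mac)
        memcpy(op->mac, mac, MAC_LEN);
    q->count++;
    return true;
}

/* Worker side: would run in process context, where sleeping is allowed. */
static void fdb_queue_drain(struct fdb_queue *q, struct hw_fdb *hw)
{
    assert(q != NULL && hw != NULL);
    while (q->count) {
        struct fdb_op *op = &q->ops[q->head];

        if (op->type == FDB_OP_FLUSH)
            hw->count = 0;
        else if (hw->count < HW_FDB_LEN)
            memcpy(hw->mac[hw->count++], op->mac, MAC_LEN);

        q->head = (q->head + 1) % FDB_QUEUE_LEN;
        q->count--;
    }
}
```

Pushing add(A), flush, add(B) and then draining leaves only B in the
table, which is the ordering the direct atomic calls would have given.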

Any ideas?


* Re: Using a waiting MDIO does not go well with a spinlocked bridge
  2015-03-20 12:22 Using a waiting MDIO does not go well with a spinlocked bridge Jonas Johansson
@ 2015-03-20 13:16 ` Andrew Lunn
  2015-03-21  6:37   ` Guenter Roeck
  2015-03-20 18:46 ` Scott Feldman
  2015-03-21  6:32 ` Guenter Roeck
  2 siblings, 1 reply; 8+ messages in thread
From: Andrew Lunn @ 2015-03-20 13:16 UTC (permalink / raw)
  To: Jonas Johansson; +Cc: netdev, stephen, f.fainelli, jiri, sfeldma

On Fri, Mar 20, 2015 at 01:22:46PM +0100, Jonas Johansson wrote:
> The bridge code will sometimes hold a spinlock and the code
> following must therefore be atomic. If using a MDIO call which uses
> a wait/sleep in this contex, the kernel will not be very happy.
> 
> I'm using a switch device and wants to flush its FDB when the linux
> bridge FDB is flushed. I've implemented some hooks for this task.
> In short:
>      bridge    - br_fdb_flush() & br_fdb_delete_by_port
>   -> switchdev - switch_flush()
>   -> dsa       - slave_flush()
>   -> mv88e6xxx - mv88_flush()

Hi Jonas

Have you seen the patches from Guenter Roeck implementing hardware
bridging? There should be a new version coming out soon.

> So, when a bridge port is flushed via e.g. sysfs, the mv88_flush()
> function will at the end be called. The mv88_flush() will use MDIO
> calls to set the proper registers and flush the device. But, due to
> that the MDIO on my platform uses wait_for_completion() and a
> spinlock is held (in this case in brport_store()) the process will
> not go very well.

Ah, not good. We have a number of mutexes in the mv88e6xxx code, one of
which is used with fdb operations.
 
> The only possible solutions that came into my mind is:
>  1) Let mv88_flush() schedule a work queue to take care of the flush
>     later on.
>  2) Change the MDIO implementation to use polling.

I don't think this is feasible. The MDIO bus could be a gpio
bit-banging interface. It is hard to guarantee that the GPIO code will
not sleep.

>  3) Dont use spinlock in bridge code.

This would be my preference, but I've no idea how much work it is.  We
should audit the bridge code and document in what context operations
on the switch are called.

   Andrew


* Re: Using a waiting MDIO does not go well with a spinlocked bridge
  2015-03-20 12:22 Using a waiting MDIO does not go well with a spinlocked bridge Jonas Johansson
  2015-03-20 13:16 ` Andrew Lunn
@ 2015-03-20 18:46 ` Scott Feldman
  2015-03-23  6:45   ` Jiri Pirko
  2015-03-23 15:42   ` Jonas Johansson
  2015-03-21  6:32 ` Guenter Roeck
  2 siblings, 2 replies; 8+ messages in thread
From: Scott Feldman @ 2015-03-20 18:46 UTC (permalink / raw)
  To: Jonas Johansson
  Cc: Netdev, stephen, Florian Fainelli, Jiří Pírko

On Fri, Mar 20, 2015 at 5:22 AM, Jonas Johansson <jonasj76@gmail.com> wrote:
> The bridge code will sometimes hold a spinlock and the code following must
> therefore be atomic. If using a MDIO call which uses a wait/sleep in this
> contex, the kernel will not be very happy.
>
> I'm using a switch device and wants to flush its FDB when the linux bridge
> FDB is flushed. I've implemented some hooks for this task.
> In short:
>      bridge    - br_fdb_flush() & br_fdb_delete_by_port
>   -> switchdev - switch_flush()
>   -> dsa       - slave_flush()
>   -> mv88e6xxx - mv88_flush()

I think we need to hook switchdev in fdb_delete(), then it'll get
called from flush and ageing out operations, rather than adding a new
switch_flush().  But, that's an aside for your main issue that the
bridge will hold a spinlock for most (all?) FDB delete operations.  I
don't see a way around relaxing that, on the bridge side, since it's
doing things like walking lists while deleting list elements.  So that
means the call into switchdev will be spinlocked, so switchdev driver
needs to deal with that.  Scheduling to work queue is one option, as
you mention, if FDB delete can't be done under the spinlock.


> So, when a bridge port is flushed via e.g. sysfs, the mv88_flush() function
> will at the end be called. The mv88_flush() will use MDIO calls to set the
> proper registers and flush the device. But, due to that the MDIO on my
> platform uses wait_for_completion() and a spinlock is held (in this case in
> brport_store()) the process will not go very well.
>
> The only possible solutions that came into my mind is:
>  1) Let mv88_flush() schedule a work queue to take care of the flush
>     later on.
>  2) Change the MDIO implementation to use polling.
>  3) Dont use spinlock in bridge code.
>
> 1) Using this approach the the atomic part is missed, i.e. the switch device
> isn't guaranteed to be flushed after the command has been issued. And, if a
> FDB entry is added (atomic) to the switch device immediately after the flush
> command, there will not be defined if the entry will be added before or
> after the flush occurs. To solve this, all (FDB) operations must be added to
> a work queue to assure that they are executed in the right order.

We would lose the FDB add results if added to work queue.  On add,
you could check work queue delete list for entry, and if there, remove
from work queue list.
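
A minimal userspace sketch of that check, under assumptions: struct
pending_deletes, pending_delete_queue() and pending_delete_cancel() are
hypothetical names, entries are keyed by MAC address only, and no locking
is shown. It reads the suggestion as "an add cancels a not-yet-executed
delete of the same entry", so the hardware entry simply stays in place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN     6
#define PENDING_MAX 8

/* Hypothetical list of deletes that the worker has not executed yet. */
struct pending_deletes {
    unsigned char mac[PENDING_MAX][MAC_LEN];
    size_t count;
};

/* Delete path (atomic context): remember the entry for the worker. */
static bool pending_delete_queue(struct pending_deletes *p,
                                 const unsigned char *mac)
{
    assert(p != NULL && mac != NULL);
    if (p->count == PENDING_MAX)
        return false;
    memcpy(p->mac[p->count++], mac, MAC_LEN);
    return true;
}

/*
 * Add path: if a delete for the same address is still pending, drop it.
 * Returns true when a pending delete was cancelled.
 */
static bool pending_delete_cancel(struct pending_deletes *p,
                                  const unsigned char *mac)
{
    size_t i;

    assert(p != NULL && mac != NULL);
    for (i = 0; i < p->count; i++) {
        if (memcmp(p->mac[i], mac, MAC_LEN) == 0) {
            /* Fill the hole with the last element; order is not kept. */
            p->count--;
            memmove(p->mac[i], p->mac[p->count], MAC_LEN);
            return true;
        }
    }
    return false;
}
```

If pending_delete_cancel() returns false, the add would still have to be
programmed into the device by whatever path the driver uses for adds.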

>
> 2) This will result in unsued CPU cycles.
>
> 3) Havent looked into this, but probably a lot of work.

Can of worms...wouldn't recommend that option.

> Any ideas?


* Re: Using a waiting MDIO does not go well with a spinlocked bridge
  2015-03-20 12:22 Using a waiting MDIO does not go well with a spinlocked bridge Jonas Johansson
  2015-03-20 13:16 ` Andrew Lunn
  2015-03-20 18:46 ` Scott Feldman
@ 2015-03-21  6:32 ` Guenter Roeck
  2 siblings, 0 replies; 8+ messages in thread
From: Guenter Roeck @ 2015-03-21  6:32 UTC (permalink / raw)
  To: Jonas Johansson; +Cc: netdev, stephen, f.fainelli, jiri, sfeldma

On Fri, Mar 20, 2015 at 01:22:46PM +0100, Jonas Johansson wrote:
> The bridge code will sometimes hold a spinlock and the code following must
> therefore be atomic. If using a MDIO call which uses a wait/sleep in this
> contex, the kernel will not be very happy.
> 
> I'm using a switch device and wants to flush its FDB when the linux bridge
> FDB is flushed. I've implemented some hooks for this task.
> In short:
>      bridge    - br_fdb_flush() & br_fdb_delete_by_port
>   -> switchdev - switch_flush()
>   -> dsa       - slave_flush()
>   -> mv88e6xxx - mv88_flush()
> 
> So, when a bridge port is flushed via e.g. sysfs, the mv88_flush() function
> will at the end be called. The mv88_flush() will use MDIO calls to set the
> proper registers and flush the device. But, due to that the MDIO on my
> platform uses wait_for_completion() and a spinlock is held (in this case in
> brport_store()) the process will not go very well.
> 
I happen to have similar code, though not (yet) submitted upstream.

> The only possible solutions that came into my mind is:
>  1) Let mv88_flush() schedule a work queue to take care of the flush
>     later on.

That is my implementation.

>  2) Change the MDIO implementation to use polling.
>  3) Dont use spinlock in bridge code.
> 
> 1) Using this approach the the atomic part is missed, i.e. the switch device
> isn't guaranteed to be flushed after the command has been issued. And, if a
> FDB entry is added (atomic) to the switch device immediately after the flush
> command, there will not be defined if the entry will be added before or
> after the flush occurs. To solve this, all (FDB) operations must be added to
> a work queue to assure that they are executed in the right order.
> 

In my code I did not bother about this. Which may be why I didn't submit it
upstream ;-). One possible simplification might be to reject fdb operations
while a flush operation is pending (EAGAIN or EBUSY ?), though I don't know
if that is feasible.
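
Roughly, that simplification could look like the following userspace
sketch. It is not the unsubmitted patch: struct switch_state and the three
functions are hypothetical, the FDB is reduced to a counter, and a real
driver would need to protect flush_pending against concurrent access.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-switch state. */
struct switch_state {
    bool flush_pending;        /* set in atomic context, cleared by worker */
    unsigned int fdb_entries;  /* stand-in for the hardware table */
};

/* Called under the bridge spinlock: only marks the flush as requested. */
static void switch_request_flush(struct switch_state *sw)
{
    assert(sw != NULL);
    sw->flush_pending = true;
}

/* FDB add: refused while a deferred flush has not run yet. */
static int switch_fdb_add(struct switch_state *sw)
{
    assert(sw != NULL);
    if (sw->flush_pending)
        return -EBUSY;
    sw->fdb_entries++;
    return 0;
}

/* Deferred worker: performs the flush, then accepts operations again. */
static void switch_flush_work(struct switch_state *sw)
{
    assert(sw != NULL);
    sw->fdb_entries = 0;
    sw->flush_pending = false;
}
```

Whether the bridge callers can actually cope with -EBUSY (or -EAGAIN) from
an add is the open feasibility question.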

> 2) This will result in unsued CPU cycles.
> 
> 3) Havent looked into this, but probably a lot of work.
> 
I don't think 2) or 3) are good solutions. Both want to change other
subsystems due to problems in the dsa code. If we start doing that we
would mess up the kernel all over the place. I think the solution has
to come from within DSA.

Thanks,
Guenter


* Re: Using a waiting MDIO does not go well with a spinlocked bridge
  2015-03-20 13:16 ` Andrew Lunn
@ 2015-03-21  6:37   ` Guenter Roeck
  0 siblings, 0 replies; 8+ messages in thread
From: Guenter Roeck @ 2015-03-21  6:37 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Jonas Johansson, netdev, stephen, f.fainelli, jiri, sfeldma

On Fri, Mar 20, 2015 at 02:16:22PM +0100, Andrew Lunn wrote:
> On Fri, Mar 20, 2015 at 01:22:46PM +0100, Jonas Johansson wrote:
> > The bridge code will sometimes hold a spinlock and the code
> > following must therefore be atomic. If using a MDIO call which uses
> > a wait/sleep in this contex, the kernel will not be very happy.
> > 
> > I'm using a switch device and wants to flush its FDB when the linux
> > bridge FDB is flushed. I've implemented some hooks for this task.
> > In short:
> >      bridge    - br_fdb_flush() & br_fdb_delete_by_port
> >   -> switchdev - switch_flush()
> >   -> dsa       - slave_flush()
> >   -> mv88e6xxx - mv88_flush()
> 
> Hi Jonas
> 
> Have you seen the patches from Guenter Roeck implementing hardware
> bridging? There should be a new version coming out soon.
> 
> > So, when a bridge port is flushed via e.g. sysfs, the mv88_flush()
> > function will at the end be called. The mv88_flush() will use MDIO
> > calls to set the proper registers and flush the device. But, due to
> > that the MDIO on my platform uses wait_for_completion() and a
> > spinlock is held (in this case in brport_store()) the process will
> > not go very well.
> 
> Ah, not good. We have a number of mutex in the mv88x6xxx code, one of
> which is used with fdb operations..
>  
The mutexes in the mv88e6xxx code don't matter, really. There is also the
mdio bus mutex which would kill us anyway.

Handling port state changes is already implemented with a workqueue
in my code because of the spinlock problem. I use that same workqueue
in my (not submitted) patch to flush the fdb.

Guenter

> > The only possible solutions that came into my mind is:
> >  1) Let mv88_flush() schedule a work queue to take care of the flush
> >     later on.
> >  2) Change the MDIO implementation to use polling.
> 
> I don't think these is feasible. The MDIO bus could be a gpio
> bit-banging interface. It is hard to guarantee that the GPIO code will
> not sleep.
> 
> >  3) Dont use spinlock in bridge code.
> 
> This would be my preference, but i've no idea how much work it is.  We
> should audit the bridge code and document in what context operations
> on the switch are called.
>


* Re: Using a waiting MDIO does not go well with a spinlocked bridge
  2015-03-20 18:46 ` Scott Feldman
@ 2015-03-23  6:45   ` Jiri Pirko
  2015-03-23 15:42   ` Jonas Johansson
  1 sibling, 0 replies; 8+ messages in thread
From: Jiri Pirko @ 2015-03-23  6:45 UTC (permalink / raw)
  To: Scott Feldman; +Cc: Jonas Johansson, Netdev, stephen, Florian Fainelli

Fri, Mar 20, 2015 at 07:46:47PM CET, sfeldma@gmail.com wrote:
>On Fri, Mar 20, 2015 at 5:22 AM, Jonas Johansson <jonasj76@gmail.com> wrote:
>> The bridge code will sometimes hold a spinlock and the code following must
>> therefore be atomic. If using a MDIO call which uses a wait/sleep in this
>> contex, the kernel will not be very happy.
>>
>> I'm using a switch device and wants to flush its FDB when the linux bridge
>> FDB is flushed. I've implemented some hooks for this task.
>> In short:
>>      bridge    - br_fdb_flush() & br_fdb_delete_by_port
>>   -> switchdev - switch_flush()
>>   -> dsa       - slave_flush()
>>   -> mv88e6xxx - mv88_flush()
>
>I think we need to hook switchdev in fdb_delete(), then it'll get
>called from flush and ageing out operations, rather than adding a new
>switch_flush().  But, that's an aside for your main issue that the
>bridge will hold a spinlock for most (all?) FDB delete operations.  I
>don't see a way around relaxing that, on the bridge side, since it's
>doing things like walking lists while deleting list elements.  So that
>means the call into switchdev will be spinlocked, so switchdev driver
>needs to deal with that.  Scheduling to work queue is one option, as
>you mention, if FDB delete can't be done under the spinlock.


I agree that removing/changing the spinlock in the bridge code is a no-go.
The driver itself should deal with its callback running in atomic context.

>
>
>> So, when a bridge port is flushed via e.g. sysfs, the mv88_flush() function
>> will at the end be called. The mv88_flush() will use MDIO calls to set the
>> proper registers and flush the device. But, due to that the MDIO on my
>> platform uses wait_for_completion() and a spinlock is held (in this case in
>> brport_store()) the process will not go very well.
>>
>> The only possible solutions that came into my mind is:
>>  1) Let mv88_flush() schedule a work queue to take care of the flush
>>     later on.
>>  2) Change the MDIO implementation to use polling.
>>  3) Dont use spinlock in bridge code.
>>
>> 1) Using this approach the the atomic part is missed, i.e. the switch device
>> isn't guaranteed to be flushed after the command has been issued. And, if a
>> FDB entry is added (atomic) to the switch device immediately after the flush
>> command, there will not be defined if the entry will be added before or
>> after the flush occurs. To solve this, all (FDB) operations must be added to
>> a work queue to assure that they are executed in the right order.
>
>We would loose the FDB add results if added to work queue.  On add,
>you could check work queue delete list for entry, and if there, remove
>from work queue list.
>
>>
>> 2) This will result in unsued CPU cycles.
>>
>> 3) Havent looked into this, but probably a lot of work.
>
>Can of worms...wouldn't recommend that option.
>
>> Any ideas?


* Re: Using a waiting MDIO does not go well with a spinlocked bridge
  2015-03-20 18:46 ` Scott Feldman
  2015-03-23  6:45   ` Jiri Pirko
@ 2015-03-23 15:42   ` Jonas Johansson
  2015-03-23 18:37     ` Scott Feldman
  1 sibling, 1 reply; 8+ messages in thread
From: Jonas Johansson @ 2015-03-23 15:42 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Jonas Johansson, Netdev, stephen, Florian Fainelli,
	Jiří Pírko



On Fri, 20 Mar 2015, Scott Feldman wrote:

> On Fri, Mar 20, 2015 at 5:22 AM, Jonas Johansson <jonasj76@gmail.com> wrote:
>> The bridge code will sometimes hold a spinlock and the code following must
>> therefore be atomic. If using a MDIO call which uses a wait/sleep in this
>> contex, the kernel will not be very happy.
>>
>> I'm using a switch device and wants to flush its FDB when the linux bridge
>> FDB is flushed. I've implemented some hooks for this task.
>> In short:
>>      bridge    - br_fdb_flush() & br_fdb_delete_by_port
>>   -> switchdev - switch_flush()
>>   -> dsa       - slave_flush()
>>   -> mv88e6xxx - mv88_flush()
>
> I think we need to hook switchdev in fdb_delete(), then it'll get
> called from flush and ageing out operations, rather than adding a new
> switch_flush().  But, that's an aside for your main issue that the
> bridge will hold a spinlock for most (all?) FDB delete operations.  I
> don't see a way around relaxing that, on the bridge side, since it's
> doing things like walking lists while deleting list elements.  So that
> means the call into switchdev will be spinlocked, so switchdev driver
> needs to deal with that.  Scheduling to work queue is one option, as
> you mention, if FDB delete can't be done under the spinlock.
>
>
Thanks for the input.
My idea of using a switch_flush() was to take advantage of the HW to flush 
all FDB entries in one single operation. If I'm not mistaken, using 
fdb_delete() will result in a call for each FDB entry, which will result 
in a lot of overhead.

>> So, when a bridge port is flushed via e.g. sysfs, the mv88_flush() function
>> will at the end be called. The mv88_flush() will use MDIO calls to set the
>> proper registers and flush the device. But, due to that the MDIO on my
>> platform uses wait_for_completion() and a spinlock is held (in this case in
>> brport_store()) the process will not go very well.
>>
>> The only possible solutions that came into my mind is:
>>  1) Let mv88_flush() schedule a work queue to take care of the flush
>>     later on.
>>  2) Change the MDIO implementation to use polling.
>>  3) Dont use spinlock in bridge code.
>>
>> 1) Using this approach the the atomic part is missed, i.e. the switch device
>> isn't guaranteed to be flushed after the command has been issued. And, if a
>> FDB entry is added (atomic) to the switch device immediately after the flush
>> command, there will not be defined if the entry will be added before or
>> after the flush occurs. To solve this, all (FDB) operations must be added to
>> a work queue to assure that they are executed in the right order.
>
> We would loose the FDB add results if added to work queue.  On add,
> you could check work queue delete list for entry, and if there, remove
> from work queue list.
>
>>
>> 2) This will result in unsued CPU cycles.
>>
>> 3) Havent looked into this, but probably a lot of work.
>
> Can of worms...wouldn't recommend that option.
>
>> Any ideas?
>


* Re: Using a waiting MDIO does not go well with a spinlocked bridge
  2015-03-23 15:42   ` Jonas Johansson
@ 2015-03-23 18:37     ` Scott Feldman
  0 siblings, 0 replies; 8+ messages in thread
From: Scott Feldman @ 2015-03-23 18:37 UTC (permalink / raw)
  To: Jonas Johansson
  Cc: Netdev, stephen, Florian Fainelli, Jiří Pírko

On Mon, Mar 23, 2015 at 8:42 AM, Jonas Johansson <jonasj76@gmail.com> wrote:
>
>
> On Fri, 20 Mar 2015, Scott Feldman wrote:
>
>> On Fri, Mar 20, 2015 at 5:22 AM, Jonas Johansson <jonasj76@gmail.com>
>> wrote:
[cut]
>> I think we need to hook switchdev in fdb_delete(), then it'll get
>> called from flush and ageing out operations, rather than adding a new
>> switch_flush().  But, that's an aside for your main issue that the
>> bridge will hold a spinlock for most (all?) FDB delete operations.  I
>> don't see a way around relaxing that, on the bridge side, since it's
>> doing things like walking lists while deleting list elements.  So that
>> means the call into switchdev will be spinlocked, so switchdev driver
>> needs to deal with that.  Scheduling to work queue is one option, as
>> you mention, if FDB delete can't be done under the spinlock.
>>
>>
> Thanks for the input.
> My idea of using a switch_flush() was to take advantage of the HW to flush
> all FDB entries in one single operation. If I'm not mistaken, using
> fdb_delete() will result in a call for each FDB entry, which will result in
> a lot of overhead.

Let's get the most general case working (delete of a single FDB entry)
and then add optimizations later, if needed.  I suspect, for drivers
doing deletes atomically, the overhead is negligible, and for drivers
doing deferred deletes, again the overhead would be negligible.

-scott


