On Tue, May 24, 2022 at 01:30:48PM +0200, David Jander wrote:

> > But that turned out be not working properly:
> >
> > | https://lore.kernel.org/all/f86eaebb-0359-13be-f4a2-4f2b8832252e@nvidia.com/

> Ah, thanks for sharing. Added Martin to CC here.

> I have been struggling with this too. There are definitely dragons somewhere.
> I have tried to do tear-down in the same context if possible, similar to this:

There's a potential issue there with ending up spending noticeable extra
time turning the controller on and off, and how costly that is might vary
from controller to controller.

> I have not yet discovered exactly why. In the occasions the code didn't hit
> the race, it seemed to have a notable performance impact though, so removing
> this context switch may be worth the effort.

Or come up with a mechanism for ensuring we only switch to do the cleanup
when we're not otherwise busy.

> From what I understand of this, bus_lock_mutex is used to serialize spi_sync
> calls for this bus, so there cannot be any concurrency from different threads
> doing spi_sync() calls to the same bus. This means, that if spi_sync was the
> only path in existence, bus_lock_mutex would suffice, and all the other

The bus lock is there because some devices have an unfortunate need to do
multiple SPI transfers with no other devices able to generate any traffic
on the bus in between.  It seems that, even more sadly, some of the users
are using it to protect against multiple calls into themselves, so we
can't just do the simple thing and turn the bus locks into noops if
there's only one client on the bus.

However it *is* quite rarely used, so I'm thinking that what we could do
is default to not having it and then arrange to create it on first use,
or just update the clients to do something during initialisation to cause
it to be created.  That way only controllers with an affected client
would pay the cost.  I don't *think* it's otherwise needed, but I'd need
to go through and verify that.

> spinlocks and mutexes could go. Big win. But the async path is what
> complicates everything. So I have been thinking... what if we could make the
> sync and the async case two separate paths, with two separate message queues?
> In fact the sync path doesn't even need a queue really, since it will just
> handle one message beginning to end, and not return until the message is done.
> It doesn't need the worker thread either AFAICS. Or am I missing something?
> In the end, both paths would converge at the io_mutex. I am tempted to try
> this route. Any thoughts?

The sync path, like you say, doesn't exactly need a queue itself.  It's
partly looking at the queue so it can fall back to pushing work through
the thread when the controller is busy (hopefully opening up
opportunities for us to minimise the latency between completing whatever
is already going on and starting the next message), and partly about
pushing the work of idling the hardware into the thread so that it's
deferred a bit and we're less likely to end up spending time bouncing the
controller on and off when there's a sequence of synchronous operations
going on.  That second bit doesn't need us to actually look at the queue
though; we just need to kick the thread so it gets run at some point and
sees that the queue is empty.
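To make that concrete (only a sketch, not even compile tested):
__spi_pump_sync_message() below is a made-up name for whatever ends up
driving the hardware directly from the caller's context, and the race
between the idle check and taking io_mutex is cheerfully ignored.
__spi_sync() would try this first and only fall back to the existing
queued path (and its wait_for_completion()) when it returns false:

static bool __spi_sync_direct(struct spi_device *spi, struct spi_message *msg)
{
	struct spi_controller *ctlr = spi->controller;
	unsigned long flags;
	bool idle;

	spin_lock_irqsave(&ctlr->queue_lock, flags);
	/* Only bypass the queue if nothing is queued or in flight */
	idle = list_empty(&ctlr->queue) && !ctlr->busy;
	spin_unlock_irqrestore(&ctlr->queue_lock, flags);

	if (!idle)
		return false;

	/* Drive the message to completion in the caller's context */
	mutex_lock(&ctlr->io_mutex);
	__spi_pump_sync_message(ctlr, msg);	/* hypothetical helper */
	mutex_unlock(&ctlr->io_mutex);

	/*
	 * Kick the worker so that idling the hardware is deferred to it:
	 * it will see an empty queue and power things down later rather
	 * than us bouncing the controller on and off between a run of
	 * sync messages.
	 */
	kthread_queue_work(ctlr->kworker, &ctlr->pump_messages);

	return true;
}

The point being that we only touch queue_lock once to decide whether we
can bypass, and the power management work still happens in the thread.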
Again I need to think this through properly but we can probably arrange
things so that

> --> __spi_sync()
>     --> bus_lock_spinlock
>         --> queue_lock
>             --> list_add_tail()
>     --> __spi_pump_messages() (also entered here from WQ)
>         --> queue_lock
>             --> list_first_entry()

the work we do under the first queue_lock here gets shuffled into
__spi_pump_messages() (possibly replace in_kthread with passing in a
message?  Would need comments.).  That'd mean we'd at least only be
taking the queue lock once; there's a sketch of the shape at the end of
this mail.

The other potential issue with eliminating the queue entirely would be
if there are clients which are mixing async and sync operations but
expecting them to complete in order (eg, start a bunch of async stuff
then do a sync operation at the end rather than have a custom
wait/completion).
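Sketch of that reshuffled __spi_pump_messages(), purely illustrative
(the empty-queue idling and all the error handling are hand-waved away);
the worker side would just pass a NULL message instead of in_kthread:

static void __spi_pump_messages(struct spi_controller *ctlr,
				struct spi_message *msg)
{
	unsigned long flags;

	spin_lock_irqsave(&ctlr->queue_lock, flags);

	if (msg)
		/* Sync caller: add its message while we hold the lock */
		list_add_tail(&msg->queue, &ctlr->queue);

	if (ctlr->cur_msg || list_empty(&ctlr->queue)) {
		spin_unlock_irqrestore(&ctlr->queue_lock, flags);
		return;
	}

	/* Extract the head of the queue, exactly as now */
	ctlr->cur_msg = list_first_entry(&ctlr->queue,
					 struct spi_message, queue);
	list_del_init(&ctlr->cur_msg->queue);

	spin_unlock_irqrestore(&ctlr->queue_lock, flags);

	/* ... push ctlr->cur_msg to the hardware under io_mutex as now ... */
}

One nice side effect of keeping the sync message on the one queue like
this is that ordering against any async messages already queued is
preserved, which should help with the mixing concern above.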