All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Logan Gunthorpe <logang@deltatee.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	linux-kernel@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>, Will Deacon <will@kernel.org>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Kurt Schwemmer <kurt.schwemmer@microsemi.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH 3/9] pci/switchtec: Don't abuse completion wait queue for poll
Date: Tue, 17 Mar 2020 01:17:27 +0100	[thread overview]
Message-ID: <87sgi7deco.fsf@nanos.tec.linutronix.de> (raw)
In-Reply-To: <39f2bd27-1a4a-f7ad-5d54-7fe133390cd0@deltatee.com>

Logan,

Logan Gunthorpe <logang@deltatee.com> writes:
> On 2020-03-16 1:34 p.m., Thomas Gleixner wrote:
>> Logan Gunthorpe <logang@deltatee.com> writes:
> I certainly would not agree that this qualifies as "seriously broken",

I concede that this is the wrong wording, but chasing things like this
and constantly mopping up code is not necessarily keeping my mood up all
the time.

> and I'm not even sure I'd agree that this actually violates the
> semantics of poll() seeing the man page clearly states that with
> EPOLLEXCLUSIVE set, "one or more" pollers will be woken up. So waking up
> all of them is still allowed. Ensuring fewer pollers wake up is just an
> optimization to avoid the thundering herd problem which users of this
> interface are very unlikely to ever have (I can confidently tell you
> that none have this problem now).

I can see the point, but in general we should just strive for keeping
the interfaces consistent and not subject to interpretation by
individual developers. Maybe in your case the probability that someone
wants to use it in that way is close to zero, but we've been surprised
often enough by the creativity of user space developers who came up with
use cases nobody ever expected.

> If we do want to say that all poll_wait() users *must* respect
> EPOLLEXCLUSIVE, we should at least have some documentation saying that
> combining poll_wait() with wake_up_all() (or similar) is not allowed. A
> *very* quick check finds there's at least a few drivers doing this:
>
>   drivers/char/ipmi/ipmb_dev_int.c
>   drivers/dma-buf/sync_file.c
>   drivers/gpu/vga/vgaarb.c

Right. There is surely quite some stuff in drivers which either predates
these things or slipped through review.

> Finally, since we seem to back to more reasonable discussion, I will
> make this point: it's fairly common for wait queue users to directly use
> the spinlock from within wait_queue_head_t without an interface (even
> completion.c does it).

completions and waitqueues are both core code. Core primitives build
obviously on other primitives so dependencies on the inner workings are
expected to some degree, especially for real and valuable optimization
reasons. Solving these dependencies when one primitive changes has
limited scope.

> How are developers supposed to know when an interface is required and
> when it's not? Sometimes using "implementation" details interface-free
> is standard practice, but other times it's "yuck" and will illicit ire
> from other developers? Is it valid to use completion.wait.lock?
> Where's the line?

That's a good question, but that question simply arises due to the fact
that C does not provide proper privatizing or if you want to work around
that it forces you to come up with really horrible constructs.

The waitqueue lock usage has a very long history. It goes back to 2002
when the semaphore implementation was optimized. That exposed some of
the waitqueue internals which got consequently used elsewhere as
well. But looking at the actual usage sites, that's a pretty limited
amount. Most of them are core infrastrucure. Of course there are drivers
as well which use it for questionable reasons.

In general I think that silently using implementation details just to
scratch an itch or "optimizing" code is pretty much bad practice.

Especially as this has a tendency to spread out in creative ways. And
this happens simply because developers often copy the next thing which
looks suitable and then expand on it. As this is not necessarily caught
in reviews, this can spread to the point where the core infrastructure
cannot be changed anymore.

That's not a made up nightmare scenario. This happened in reality and
caused me to mop up 50+ interrupt chip implementations in order to be
able to make an urgently needed 10 line change to the core interrupt
infrastructure. Guess what, the vast majority of instances fiddling with
the core internals were either voodoo programming or plain bugs. There
were a few legitimate issues, but they had been better solved in the
core code upfront.  Even after that cleanup a driver got merged which
had #include "../../../../kernel/irg/internal.h" inside just because the
code which was developed out of tree before this change had be made to
"work".

That's just not a workable solution with the ever expanding code base of
the kernel.

I really prefer when people come along and show the problem they want to
solve - I'm completely fine with the POC hack which uses internals for
that purpose - so that this can be discussed and eventually integrated
into core infrastructure in one way or the other or better suitable
solutions can be found.

There are other aspects to this:

 - Using existing interfaces allows a reviewer to do the quick check on
   init, run time usage and tear down instead of wrapping his head
   around the special case

 - Verification tooling of all sorts just works

 - Improvements to the core implementation including new features are
   just coming for free.

I hope that clarifies where I'm coming from.

This has nothing to do with you personally, you just triggered a few
sensible fuses while understandably defending your admittedly smart
solution.

Thanks,

        tglx

  reply	other threads:[~2020-03-17  0:18 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-13 17:46 [PATCH 0/9] Lock ordering documentation and annotation for lockdep Sebastian Andrzej Siewior
2020-03-13 17:46 ` [PATCH 1/9] Documentation: Add lock ordering and nesting documentation Sebastian Andrzej Siewior
2020-03-14 22:57   ` Randy Dunlap
2020-03-16 10:34     ` Sebastian Andrzej Siewior
2020-03-16 15:13       ` Randy Dunlap
2020-03-13 17:46 ` [PATCH 2/9] timekeeping: Split jiffies seqlock Sebastian Andrzej Siewior
2020-03-13 17:46 ` [PATCH 3/9] pci/switchtec: Don't abuse completion wait queue for poll Sebastian Andrzej Siewior
2020-03-13 18:11   ` Logan Gunthorpe
2020-03-14  0:23     ` Thomas Gleixner
2020-03-14  6:01       ` Logan Gunthorpe
2020-03-16 18:52         ` Thomas Gleixner
2020-03-16 19:24           ` Logan Gunthorpe
2020-03-16 19:34     ` Thomas Gleixner
2020-03-16 21:53       ` Logan Gunthorpe
2020-03-17  0:17         ` Thomas Gleixner [this message]
2020-03-17  1:15           ` Logan Gunthorpe
2020-03-13 19:31   ` Peter Zijlstra
2020-03-13 17:46 ` [PATCH 4/9] sched/swait: Prepare usage in completions Sebastian Andrzej Siewior
2020-03-14  0:26   ` Thomas Gleixner
2020-03-13 17:46 ` [PATCH 5/9] completion: Use simple wait queues Sebastian Andrzej Siewior
2020-03-14 15:40   ` Linus Torvalds
2020-03-16 18:55     ` Thomas Gleixner
2020-03-13 17:46 ` [PATCH 6/9] lockdep: Introduce wait-type checks Sebastian Andrzej Siewior
2020-03-31 13:25   ` Geert Uytterhoeven
2020-03-31 13:25     ` Geert Uytterhoeven
2020-03-31 13:42     ` Sebastian Andrzej Siewior
2020-03-31 13:42       ` Sebastian Andrzej Siewior
2020-03-31 14:55     ` Peter Zijlstra
2020-03-31 14:55       ` Peter Zijlstra
2020-03-31 15:28       ` Peter Zijlstra
2020-03-31 15:28         ` Peter Zijlstra
2020-03-31 15:37         ` Frederic Weisbecker
2020-03-31 15:37           ` Frederic Weisbecker
2020-03-13 17:46 ` [PATCH 7/9] lockdep: Add hrtimer context tracing bits Sebastian Andrzej Siewior
2020-03-13 17:47 ` [PATCH 8/9] lockdep: Annotate irq_work Sebastian Andrzej Siewior
2020-03-13 17:47 ` [PATCH 9/9] lockdep: Add posixtimer context tracing bits Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sgi7deco.fsf@nanos.tec.linutronix.de \
    --to=tglx@linutronix.de \
    --cc=bhelgaas@google.com \
    --cc=bigeasy@linutronix.de \
    --cc=joel@joelfernandes.org \
    --cc=kurt.schwemmer@microsemi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=mingo@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.