On Mon, Jan 17, 2022 at 02:08:19PM +0100, Geert Uytterhoeven wrote:
> Hi Uwe,
> 
> On Mon, Jan 17, 2022 at 12:49 PM Uwe Kleine-König
> <u.kleine-koenig@pengutronix.de> wrote:
> > On Mon, Jan 17, 2022 at 11:35:52AM +0100, Geert Uytterhoeven wrote:
> > > On Mon, Jan 17, 2022 at 10:24 AM Uwe Kleine-König
> > > <u.kleine-koenig@pengutronix.de> wrote:
> > > > On Mon, Jan 17, 2022 at 09:41:42AM +0100, Geert Uytterhoeven wrote:
> > > > > On Sat, Jan 15, 2022 at 9:22 PM Sergey Shtylyov <s.shtylyov@omp.ru> wrote:
> > > > > > On 1/14/22 11:22 PM, Uwe Kleine-König wrote:
> > > > > > > You have to understand that for clk (and regulator and gpiod) NULL is a
> > > > > > > valid descriptor that can actually be used, it just has no effect. So
> > > > > > > this is a convenience value for the case "If the clk/regulator/gpiod in
> > > > > > > question isn't available, there is nothing to do". This is what makes
> > > > > > > clk_get_optional() and the others really useful and justifies their
> > > > > > > existence. This doesn't apply to platform_get_irq_optional().
> > > > > >
> > > > > >    I do understand that. However, IRQs are a different beast with their
> > > > > > own justifications...
> > > > >
> > > > > > > clk_get_optional() is sane and sensible for cases where the clk might be
> > > > > > > absent and it helps you because you don't have to differentiate between
> > > > > > > "not found" and "there is an actual resource".
> > > > > > >
> > > > > > > The reason for platform_get_irq_optional()'s existence is just that
> > > > > > > platform_get_irq() emits an error message which is wrong or suboptimal
> > > > > >
> > > > > >    I think you are very wrong here. The real reason is to simplify the
> > > > > > callers.
> > > > >
> > > > > Indeed.
> > > >
> > > > The commit that introduced platform_get_irq_optional() said:
> > > >
> > > >         Introduce a new platform_get_irq_optional() that works much like
> > > >         platform_get_irq() but does not output an error on failure to
> > > >         find the interrupt.
> > > >
> > > > So the author of 8973ea47901c81a1912bd05f1577bed9b5b52506 failed to
> > > > mention the real reason? Or look at
> > > > 31a8d8fa84c51d3ab00bf059158d5de6178cf890:
> > > >
> > > >         [...] use platform_get_irq_optional() to get second/third IRQ
> > > >         which are optional to avoid below error message during probe:
> > > >         [...]
> > > >
> > > > Look through the output of
> > > >
> > > >         git log -Splatform_get_irq_optional
> > > >
> > > > to find several more of these.
> > >
> > > Commit 8973ea47901c81a1 ("driver core: platform: Introduce
> > > platform_get_irq_optional()") and the various fixups fixed the ugly
> > > printing of error messages that were not applicable.
> > > In hindsight, probably commit 7723f4c5ecdb8d83 ("driver core:
> > > platform: Add an error message to platform_get_irq*()") should have
> > > been reverted instead, until a platform_get_irq_optional() with proper
> > > semantics was introduced.
> >
> > ack.
> >
> > > But as we were all in a hurry to kill the non-applicable error
> > > message, we went for the quick and dirty fix.
> > >
> > > > Also I fail to see how a caller of (today's) platform_get_irq_optional()
> > > > is simpler than a caller of platform_get_irq() given that there is no
> > > > semantic difference between the two. Please show me a single
> > > > conversion from platform_get_irq to platform_get_irq_optional that
> > > > yielded a simplification.
> > >
> > > That's exactly why we want to change the latter to return 0 ;-)
> >
> > OK. So you agree to my statement "The reason for
> > platform_get_irq_optional()'s existence is just that platform_get_irq()
> > emits an error message [...]". Actually you don't want to oppose but
> > say: It's unfortunate that the silent variant of platform_get_irq() took
> > the obvious name of a function that could have an improved return code
> > semantic.
> >
> > So my suggestion to rename todays platform_get_irq_optional() to
> > platform_get_irq_silently() and then introducing
> > platform_get_irq_optional() with your suggested semantic seems
> > intriguing and straigt forward to me.
> 
> I don't really see the point of needing platform_get_irq_silently(),
> unless as an intermediary step, where it's going to be removed again
> once the conversion has completed.

We agree that one of the two functions is enough, just differ in which
of the two we want to have. :-)

If you think platform_get_irq_silently() is a good intermediate step for
your goal, then we agree to rename platform_get_irq_optional(). So I
suggest you ack my patch.

> Still, the rename would touch all users at once anyway.

It would be more easy to keep the conversion regression-free however. A
plain rename is simple to verify. And then converting to the new
platform_get_irq_optional() can be done individually and without the
need to do everything in a single step.

> > Another thought: platform_get_irq emits an error message for all
> > problems. Wouldn't it be consistent to let platform_get_irq_optional()
> > emit an error message for all problems but "not found"?
> > Alternatively remove the error printk from platform_get_irq().
> 
> Yes, all problems but not found are real errors.

If you want to make platform_get_irq and its optional variant more
similar to the others, dropping the error message is the way to go.

> > > > So you need some more effort to convince me of your POV.
> > > >
> > > > > Even for clocks, you cannot assume that you can always blindly use
> > > > > the returned dummy (actually a NULL pointer) to call into the clk
> > > > > API.  While this works fine for simple use cases, where you just
> > > > > want to enable/disable an optional clock (clk_prepare_enable() and
> > > > > clk_disable_unprepare()), it does not work for more complex use cases.
> > > >
> > > > Agreed. But for clks and gpiods and regulators the simple case is quite
> > > > usual. For irqs it isn't.
> > >
> > > It is for devices that can have either separate interrupts, or a single
> > > multiplexed interrupt.
> > >
> > > The logic in e.g. drivers/tty/serial/sh-sci.c and
> > > drivers/spi/spi-rspi.c could be simplified and improved (currently
> > > it doesn't handle deferred probe) if platform_get_irq_optional()
> > > would return 0 instead of -ENXIO.
> >
> > Looking at sh-sci.c the irq handling logic could be improved even
> > without a changed platform_get_irq_optional():
> >
> > diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
> > index 968967d722d4..c7dc9fb84844 100644
> > --- a/drivers/tty/serial/sh-sci.c
> > +++ b/drivers/tty/serial/sh-sci.c
> > @@ -2873,11 +2873,13 @@ static int sci_init_single(struct platform_device *dev,
> >          * interrupt ID numbers, or muxed together with another interrupt.
> >          */
> >         if (sci_port->irqs[0] < 0)
> > -               return -ENXIO;
> > +               return sci_port->irqs[0];
> >
> > -       if (sci_port->irqs[1] < 0)
> > +       if (sci_port->irqs[1] == -ENXIO)
> >                 for (i = 1; i < ARRAY_SIZE(sci_port->irqs); i++)
> >                         sci_port->irqs[i] = sci_port->irqs[0];
> > +       else if (sci_port->irqs[1] < 0)
> > +               return sci_port->irqs[1];
> >
> >         sci_port->params = sci_probe_regmap(p);
> >         if (unlikely(sci_port->params == NULL))
> >
> > And then the code flow is actively irritating. sci_init_single() copies
> > irqs[0] to all other irqs[i] and then sci_request_irq() loops over the
> > already requested irqs and checks for duplicates. A single place that
> > identifies the exact set of required irqs would already help a lot.
> 
> Yeah, it's ugly and convoluted, like the wide set of hardware the
> driver supports.
> 
> > Also for spi-rspi.c I don't see how platform_get_irq_byname_optional()
> > returning 0 instead of -ENXIO would help. Please talk in patches.
> 
> --- a/drivers/spi/spi-rspi.c
> +++ b/drivers/spi/spi-rspi.c
> @@ -1420,17 +1420,25 @@ static int rspi_probe(struct platform_device *pdev)
>         ctlr->max_native_cs = rspi->ops->num_hw_ss;
> 
>         ret = platform_get_irq_byname_optional(pdev, "rx");
> -       if (ret < 0) {
> +       if (ret < 0)
> +               goto error2;
> +
> +       if (!ret) {
>                 ret = platform_get_irq_byname_optional(pdev, "mux");
> -               if (ret < 0)
> +               if (!ret)
>                         ret = platform_get_irq(pdev, 0);
> +               if (ret < 0)
> +                       goto error2;
> +
>                 if (ret >= 0)
>                         rspi->rx_irq = rspi->tx_irq = ret;
>         } else {
>                 rspi->rx_irq = ret;
>                 ret = platform_get_irq_byname(pdev, "tx");
> -               if (ret >= 0)
> -                       rspi->tx_irq = ret;
> +               if (ret < 0)
> +                       goto error2;
> +
> +               rspi->tx_irq = ret;
>         }
> 
>         if (rspi->rx_irq == rspi->tx_irq) {

This is not a simplification, just looking at the line count and the
added gotos. That's because it also improves error handling and so the
effect isn't easily spotted.

> I like it when the "if (ret < ) ..." error handling is the first check to do.

That's a relevant difference between us.

> With -ENXIO, it becomes more convoluted. and looks less nice (IMHO).
> 
> > Preferably first simplify in-driver logic to make the conversion to the
> > new platform_get_irq_optional() actually reviewable.
> 
> So I have to choose between
> 
>     if (ret < 0 && ret != -ENXIO)
>             return ret;
> 
>     if (ret) {
>             ...
>     }
> 
> and
> 
>     if (ret == -ENXIO) {
>             ...
>     } else if (ret < 0)
>             return ret;
>     }

I would do the latter, then it's in the normal order for error handling

	handle some specific errors;
	forward unhandled errors up the stack;
	handle success;

but it seems you prefer to not call "not found" an error. Actually I
think it's an advantage that the driver has to mention -ENXIO, feels
like proper error handling to me. I guess we won't agree about that
though.

What about the following idea (in pythonic pseudo code for simplicity):

	# the rspi device either has two irqs, one for rx and one for
	# tx, or a single one for both together.

	def muxed_hander(irq):
		status = readl(STATUS)
		if status & IF_RX:
			rx_handler()
		if status & IF_TX:
			tx_handler()

	def probe_muxed_irq():
		irq = platform_get_irq_by_name("mux")
		if irq < 0:
			return irq;

		request_irq(irq, muxed_handler)

	def probe_separate_irqs():
		txirq = platform_get_irq_by_name("tx")
		if txirq < 0:
			return txirq

		rxirq = platform_get_irq_by_name("rx")
		if rxirq < 0:
			return rxirq

		request_irq(txirq, tx_handler)
		request_irq(rxirq, rx_handler)

	def probe():
		ret = probe_separate_irqs()
		if ret == -ENXIO:
			ret = probe_muxed_irq()

		if ret < 0:
			return ret

looks clean (to me that is) and allows to skip the demuxing in
tx_handler and rx_handler (which might or might not yield improved
runtime behaviour). Maybe a bit more verbose, but simpler to grasp for a
human, isn't it?

> with the final target being
> 
>     if (ret < 0)
>             return ret;
> 
>     if (ret) {
>             ...
>     }
> 
> So the first option means the final change is smaller, but it looks less
> nice than the second option (IMHO).
> But the second option means more churn.
> 
> > > So there are three reasons: because the absence of an optional IRQ
> > > is not an error, and thus that should not cause (a) an error code
> > > to be returned, and (b) an error message to be printed, and (c)
> > > because it can simplify the logic in device drivers.
> >
> > I don't agree to (a). If the value signaling not-found is -ENXIO or 0
> > (or -ENODEV) doesn't matter much. I wouldn't deviate from the return
> > code semantics of platform_get_irq() just for having to check against 0
> > instead of -ENXIO. Zero is then just another magic value.
> 
> Zero is a natural magic value (also for pointers).
> Errors are always negative.
> Positive values are cookies (or pointers) associated with success.

Yeah, the issue where we don't agree is if "not-found" is special enough
to deserve the natural magic value. For me -ENXIO is magic enough to
handle the absence of an irq line. I consider it even the better magic
value.

> > (c) still has to be proven, see above.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |