Re: [RFC PATCH] of: device: Support 2nd sources of probeable but undiscoverable devices

From: Jeff LaBundy <jeff@labundy.com>
To: Doug Anderson <dianders@chromium.org>
Cc: Rob Herring <robh+dt@kernel.org>,
	Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>,
	Conor Dooley <conor+dt@kernel.org>,
	devicetree@vger.kernel.org,
	Benjamin Tissoires <benjamin.tissoires@redhat.com>,
	Chen-Yu Tsai <wenst@chromium.org>,
	linux-input@vger.kernel.org, Jiri Kosina <jikos@kernel.org>,
	Hsin-Yi Wang <hsinyi@chromium.org>,
	linux-gpio@vger.kernel.org, linus.walleij@linaro.org,
	Dmitry Torokhov <dmitry.torokhov@gmail.com>,
	Johan Hovold <johan+linaro@kernel.org>,
	andriy.shevchenko@linux.intel.com, broonie@kernel.org,
	frowand.list@gmail.com, gregkh@linuxfoundation.org,
	hdegoede@redhat.com, james.clark@arm.com, james@equiv.tech,
	keescook@chromium.org, linux-kernel@vger.kernel.org,
	rafael@kernel.org, tglx@linutronix.de
Subject: Re: [RFC PATCH] of: device: Support 2nd sources of probeable but undiscoverable devices
Date: Tue, 26 Sep 2023 21:36:56 -0500	[thread overview]
Message-ID: <ZROVSAoKF9bimnSP@nixie71> (raw)
In-Reply-To: <CAD=FV=UgFzT0TW2WEV0Wmk05EXUad2EYhN2DcckAxE_Lw5gV1Q@mail.gmail.com>

Hi Doug,

On Fri, Sep 22, 2023 at 05:11:10PM -0700, Doug Anderson wrote:
> Hi,
> 
> On Fri, Sep 22, 2023 at 12:08 PM Rob Herring <robh+dt@kernel.org> wrote:
> >
> > > > This seems like overkill to me. Do we really need groups and a mutex
> > > > for each group? Worst case is what? 2-3 groups of 2-3 devices?
> > > > Instead, what about extending "status" with another value
> > > > ("fail-needs-probe"? (fail-xxx is a documented value)). Currently, the
> > > > kernel would just ignore nodes with that status. Then we can process
> > > > those nodes separately 1-by-1.
> > >
> > > My worry here is that this has the potential to impact boot speed in a
> > > non-trivial way. While trackpads and touchscreens _are_ probable,
> > > their probe routines are often quite slow. This is even mentioned in
> > > Dmitry's initial patches adding async probe to the kernel. See commit
> > > 765230b5f084 ("driver-core: add asynchronous probing support for
> > > drivers") where he specifically brings up input devices as examples.

Ideally, all but one driver in a group should bail out of probe quickly if
the device is not populated. If not, I would consider that to be a bug or at
least room for improvement in that driver.

The reason input devices can take a while to probe is because they may be
loading FW over I2C or performing some sort of calibration procedure; only
one driver in the group should get that far.

> >
> > Perhaps then this should be solved in userspace where it can learn
> > which device is actually present and save that information for
> > subsequent boots.
> 
> Yeah, the thought occurred to me as well. I think there are a few
> problems, though:
> 
> a) Userspace can't itself probe these devices effectively. While
> userspace could turn on GPIOs manually and query the i2c bus manually,
> it can't (I believe) turn on regulators nor can it turn on clocks, if
> they are needed. About the best userspace could do would be to blindly
> try binding an existing kernel driver, and in that case why did we
> need userspace involved anyway?
> 
> b) While deferring to userspace can work for solutions like ChromeOS
> or Android where it's easy to ensure the userspace bits are there,
> it's less appealing as a general solution. I think in Johan's case
> he's taking a laptop that initially ran Windows and then is trying to
> run a generic Linux distro on it. For anyone in a similar situation,
> they'd either need to pick a Linux distro that has the magic userspace
> bits that are needed or they need to know that, on their laptop, they
> need to manually install some software. While requiring special
> userspace might make sense if you've got a special peripheral, like an
> LTE modem, it makes less sense to need special userspace just to get
> the right devices bound...

I recommend against spilling the solution into user space; it's simply not
practical for many downstream use-cases where platform engineers can tightly
control their own bootloader and kernel, but have limited maintainership of
user space which is likely to be shared by many other products.

> 
> 
> > > It wouldn't be absurd to have a system that has multiple sources for
> > > both the trackpad and the touchscreen. If we have to probe each of
> > > these one at a time then it could be slow. It would be quicker to be
> > > able to probe the trackpads (one at a time) at the same time we're
> > > probing the touchscreens (one at a time). Using the "fail-needs-probe"
> > > doesn't provide information needed to know which devices conflict with
> > > each other.
> >
> > I would guess most of the time that's pretty evident. They are going
> > to be on the same bus/link. If unrelated devices are on the same bus,
> > then that's going to get serialized anyways (if bus accesses are what
> > make things slow).

Agreed with Rob here; in the case of a touchscreen, we're almost always
dealing with a module that connects to the main board by way of a flex
connector. Rarely do the bus and GPIO assignments change across alternates.

> >
> > We could add information on the class of device. touchscreen and
> > touchpad aliases or something.
> 
> Ah, I see. So something like "fail-needs-probe-<class>". The
> touchscreens could have "fail-needs-probe-touchscreen" and the
> trackpads could have "fail-needs-probe-trackpad" ? That could work. In
> theory that could fall back to the same solution of grabbing a mutex
> based on the group ID...
> 
> Also: if having the mutex in the "struct device" is seen as a bad
> idea, it would also be easy to remove. __driver_probe_device() could
> just make a call like "of_device_probe_start()" at the beginning that
> locks the mutex and then "of_device_probe_end()" that unlocks it. Both
> of those calls could easily lookup the mutex in a list, which would
> get rid of the need to store it in the "struct device".
> 
> 
> > > That would lead me to suggest this:
> > >
> > >   &i2c_bus {
> > >     trackpad-prober {
> > >       compatible = "mt8173-elm-hana-trackpad-prober";
> > >
> > >       tp1: trackpad@10 {
> > >         compatible = "hid-over-i2c";
> > >         reg = <0x10>;
> > >         ...
> > >         post-power-on-delay-ms = <200>;
> > >       };
> > >       tp2: trackpad@20 {
> > >         compatible = "hid-over-i2c";
> > >         reg = <0x20>;
> > >         ...
> > >         post-power-on-delay-ms = <200>;
> > >       };
> > >     };
> > >   };
> > >
> > > ...but I suspect that would be insta-NAKed because it's creating a
> > > completely virtual device ("mt8173-elm-hana-trackpad-prober") in the
> > > device tree. I don't know if there's something that's functionally
> > > similar that would be OK?

This solution seems a bit confusing to me, and would require more edits
to the dts each time a second source is added. It also means one would
have to write a small platform driver for each group of devices, correct?
I like the idea of a new "status" string; just my $.02.

> >
> > Why do you need the intermediate node other than a convenient way to
> > instantiate a driver? You just need a flag in each node which needs
> > this special handling. Again, "status" could work well here since it
> > keeps the normal probe from happening. But I'm not saying you can't
> > have some board specific code. Sometimes you just need code to deal
> > with this stuff. Don't try to parameterize everything to DT
> > properties.
> 
> I think I'd have an easier time understanding if I knew where you
> envisioned the board-specific code living. Do you have an example of
> board specific code running at boot time in the kernel on DT systems?
> 
> 
> > Note that the above only works with "generic" compatibles with
> > "generic" power sequencing properties (I won't repeat my dislike
> > again).
> 
> I don't think so? I was imagining that we'd have some board specific
> code that ran that knew all the possible combinations of devices,
> could probe them, and then could instantiate the correct driver.
> 
> Imagine that instead of the hated "hid-over-i2c" compatible we were
> using two other devices. Imagine that a given board could have a
> "elan,ekth6915" and a "goodix,gt7375p". Both of these devices have
> specific timing requirements on how to sequence their supplies and
> reset GPIOs. For Elan we power on the supplies, wait at least 1 ms,
> deassert reset, wait at least 300 ms, and then can talk i2c. For
> Goodix we power on the supply, wait at least 10 ms, deassert reset,
> wait at least 180 ms, and then can talk i2c. If we had a
> board-specific probing driver then it would power on the supplies,
> wait at least 10 ms (the max of the two), deassert reset, wait at
> least 300 ms (the max of the two), and then see which device talked.
> Then it would instantiate whichever of the two drivers. This could be
> done for any two devices that EEs have determined have "compatible"
> probing sequences.
> 
> Ideally in the above situation we'd be able to avoid turning the
> device off and on again between the board-specific probe code and the
> normal driver. That optimization might need special code per-driver
> but it feels doable by passing some sort of hint to the child driver
> when it's instantiated.
> 
> 
> > If only the driver knows how to handle the device, then you
> > still just have to have the driver probe. If you *only* wanted to
> > solve the above case, I'd just make "hid-over-i2c" take a 2nd (and
> > 3rd) I2C address in reg and have those as fallbacks.
> 
> Yeah, it did occur to me that having "hid-over-i2c" take more than one
> register (and I guess more than one "hid-descr-addr") would work in my
> earlier example and this might actually be a good solution for Johan.
> I'm hoping for a better generic solution, though.
> 
> 
> > You could always make the driver probe smarter where if your supply
> > was already powered on, then don't delay. Then something else could
> > ensure that the supply is enabled. I'm not sure if regulators have the
> > same issue as clocks where the clock might be on from the bootloader,
> > then a failed probe which gets then puts the clock turns it off.
> 
> I'm not sure it's that simple. Even if the supply didn't turn off by
> itself in some cases, we wouldn't know how long the supply was on.
> 
> -Doug

Thanks for charting this path; I'm really excited to see a solution to
this common problem.

Kind regards,
Jeff LaBundy