All of lore.kernel.org
 help / color / mirror / Atom feed
* gpiolib, gpio removal while a gpio in-use
@ 2015-03-17  2:59 Frkuska, Joshua
  2015-03-23  7:00 ` Alexandre Courbot
  2015-04-22 16:29 ` Johan Hovold
  0 siblings, 2 replies; 5+ messages in thread
From: Frkuska, Joshua @ 2015-03-17  2:59 UTC (permalink / raw)
  To: linux-gpio

Hello,

In 3.18, the upstream commit e1db1706c8 made it acceptable to remove gpio chips that were in use in exchange for a critical message. The reasoning from what I gathered from the mailing list was to avoid handling errors in the device remove handler. The justification was that the error seemed unused by most drivers and there was a compiler warning when ignoring the return value.

At a higher level, this is a gpio handling policy shift. Previously the behavior was to result in an error and disallow the gpio device to be removed. Then starting at 3.18, it becomes ok to free a gpio device regardless of whatever the (possibly critical) gpio may be doing as long as a critical message is displayed.

Prior to this change in 3.18, I ran across a problem in 3.8 and in 3.14, where I have a gpio expander chip on an i2c bus controlling some external power regulators. The regulator reserves the gpio and puts them in use. Then if for any reason, the i2c bus adapter/driver is destroyed/unloaded, the i2c client gets destroyed causing a dangling pointer to be left in the gpio descriptor gpio list. Since the regulator knows the gpio id any subsequent access from the regulator core of the gpiolib causes a dereference of this pointer as follows:

1. gpio_set_value_cansleep() <in my case it was due to a regulator that used the gpio being put/switched>
2. __gpio_get_value()
3. gpiod_get_raw_value()
4. _gpiod_get_raw_value()
5. chip->get()
6. pca953x_gpio_get_value() <any bus based gpio expander in this case>
7. pca953x_read_single()
8. i2c_smbus_read_byte_data()
9. dereferencing an element of the i2c client pointer causes invalid memory access.

This can be prevented a number of ways but all seem unclean (e.g. i2c bus driver from being unloaded/hot unplugged if there is a non-zero reference count). I would like to know if there is a preferred approach. From my understanding this use case is not supported as the comment in gpiolib suggests.

/* gpio_lock prevents conflicts during gpio_desc[] table updates.
* While any GPIO is requested, its gpio_chip is not removable;
* each GPIO's "requested" flag serves as a lock and refcount.
*/

Should a hotplug gpio chip be considered? If the answer is yes, then I would like to ask if there has been any discussion regarding this topic and if so, what the outcome was. 

Fast forward to 3.18+, this issue goes away because a gpio in use can now be removed. Doing so cleans up the gpio data structures and eliminates the problem above. The consequence of this is that the gpio may be put into an unintended state when freed while in-use in the system. An example of this could be the gpio cleaned up/turned off whilst controlling a regulator that is being used by some critical hardware function and it wouldn't be so obvious to a root user that this occurred just because he unloaded the bus driver. In 3.18+, where should the logic for handling the removal of a critical gpio in-use go? Does it go in the actual gpio device remove/free function or somewhere else? In this case perhaps only the element using the gpio knows what to do or at least should be informed in or
 der to keep things safe.

I realize that putting a critical gpio on a removable bus is a risk itself and can be mitigated at design time but for the sake of the argument, it may be unavoidable due to hardware constraints. 

Thank you for your time 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: gpiolib, gpio removal while a gpio in-use
  2015-03-17  2:59 gpiolib, gpio removal while a gpio in-use Frkuska, Joshua
@ 2015-03-23  7:00 ` Alexandre Courbot
  2015-03-31  4:23   ` Frkuska, Joshua
  2015-04-22 16:29 ` Johan Hovold
  1 sibling, 1 reply; 5+ messages in thread
From: Alexandre Courbot @ 2015-03-23  7:00 UTC (permalink / raw)
  To: Frkuska, Joshua; +Cc: linux-gpio

On Tue, Mar 17, 2015 at 11:59 AM, Frkuska, Joshua
<Joshua_Frkuska@mentor.com> wrote:
> Hello,
>
> In 3.18, the upstream commit e1db1706c8 made it acceptable to remove gpio chips that were in use in exchange for a critical message. The reasoning from what I gathered from the mailing list was to avoid handling errors in the device remove handler. The justification was that the error seemed unused by most drivers and there was a compiler warning when ignoring the return value.
>
> At a higher level, this is a gpio handling policy shift. Previously the behavior was to result in an error and disallow the gpio device to be removed. Then starting at 3.18, it becomes ok to free a gpio device regardless of whatever the (possibly critical) gpio may be doing as long as a critical message is displayed.
>
> Prior to this change in 3.18, I ran across a problem in 3.8 and in 3.14, where I have a gpio expander chip on an i2c bus controlling some external power regulators. The regulator reserves the gpio and puts them in use. Then if for any reason, the i2c bus adapter/driver is destroyed/unloaded, the i2c client gets destroyed causing a dangling pointer to be left in the gpio descriptor gpio list. Since the regulator knows the gpio id any subsequent access from the regulator core of the gpiolib causes a dereference of this pointer as follows:
>
> 1. gpio_set_value_cansleep() <in my case it was due to a regulator that used the gpio being put/switched>
> 2. __gpio_get_value()
> 3. gpiod_get_raw_value()
> 4. _gpiod_get_raw_value()
> 5. chip->get()
> 6. pca953x_gpio_get_value() <any bus based gpio expander in this case>
> 7. pca953x_read_single()
> 8. i2c_smbus_read_byte_data()
> 9. dereferencing an element of the i2c client pointer causes invalid memory access.
>
> This can be prevented a number of ways but all seem unclean (e.g. i2c bus driver from being unloaded/hot unplugged if there is a non-zero reference count). I would like to know if there is a preferred approach. From my understanding this use case is not supported as the comment in gpiolib suggests.
>
> /* gpio_lock prevents conflicts during gpio_desc[] table updates.
> * While any GPIO is requested, its gpio_chip is not removable;
> * each GPIO's "requested" flag serves as a lock and refcount.
> */
>
> Should a hotplug gpio chip be considered? If the answer is yes, then I would like to ask if there has been any discussion regarding this topic and if so, what the outcome was.
>
> Fast forward to 3.18+, this issue goes away because a gpio in use can now be removed. Doing so cleans up the gpio data structures and eliminates the problem above. The consequence of this is that the gpio may be put into an unintended state when freed while in-use in the system. An example of this could be the gpio cleaned up/turned off whilst controlling a regulator that is being used by some critical hardware function and it wouldn't be so obvious to a root user that this occurred just because he unloaded the bus driver. In 3.18+, where should the logic for handling the removal of a critical gpio in-use go? Does it go in the actual gpio device remove/free function or somewhere else? In this case perhaps only the element using the gpio knows what to do or at least should be informed in 
 order to keep things safe.
>
> I realize that putting a critical gpio on a removable bus is a risk itself and can be mitigated at design time but for the sake of the argument, it may be unavoidable due to hardware constraints.

As I see it, the correct behavior for a GPIO whose chip is removed at
runtime should be to return an error the next time that GPIO is used.
Problem is, gpiod_set_value() returns void, so it is not clear to me
where such error signaling should take place. At least, an error
should be printed in the kernel log, both when a GPIO chips whose
GPIOs are in use it removed, and when said GPIOs are used
post-removal. Is it what happens with 3.18+?

Having critical GPIOs on a i2c bus is a rather common thing (think
PMIC), so yeah, we should certainly define the state of the hardware
whenever GPIO chips are removed. I suppose the most common practice is
to leave the GPIOs in their last state, but maybe we should codify
this somewhere so GPIO drivers can follow that rule.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: gpiolib, gpio removal while a gpio in-use
  2015-03-23  7:00 ` Alexandre Courbot
@ 2015-03-31  4:23   ` Frkuska, Joshua
  0 siblings, 0 replies; 5+ messages in thread
From: Frkuska, Joshua @ 2015-03-31  4:23 UTC (permalink / raw)
  To: Alexandre Courbot; +Cc: linux-gpio

Thank you for your feedback

> On Tue, Mar 17, 2015 at 11:59 AM, Frkuska, Joshua
> <Joshua_Frkuska@mentor.com> wrote:
> > Hello,
> >
> > In 3.18, the upstream commit e1db1706c8 made it acceptable to remove
> gpio chips that were in use in exchange for a critical message. The reasoning
> from what I gathered from the mailing list was to avoid handling errors in
> the device remove handler. The justification was that the error seemed
> unused by most drivers and there was a compiler warning when ignoring the
> return value.
> >
> > At a higher level, this is a gpio handling policy shift. Previously the
> behavior was to result in an error and disallow the gpio device to be
> removed. Then starting at 3.18, it becomes ok to free a gpio device
> regardless of whatever the (possibly critical) gpio may be doing as long as a
> critical message is displayed.
> >
> > Prior to this change in 3.18, I ran across a problem in 3.8 and in 3.14, where
> I have a gpio expander chip on an i2c bus controlling some external power
> regulators. The regulator reserves the gpio and puts them in use. Then if for
> any reason, the i2c bus adapter/driver is destroyed/unloaded, the i2c client
> gets destroyed causing a dangling pointer to be left in the gpio descriptor
> gpio list. Since the regulator knows the gpio id any subsequent access from
> the regulator core of the gpiolib causes a dereference of this pointer as
> follows:
> >
> > 1. gpio_set_value_cansleep() <in my case it was due to a regulator
> > that used the gpio being put/switched> 2. __gpio_get_value() 3.
> > gpiod_get_raw_value() 4. _gpiod_get_raw_value() 5. chip->get() 6.
> > pca953x_gpio_get_value() <any bus based gpio expander in this case> 7.
> > pca953x_read_single() 8. i2c_smbus_read_byte_data() 9. dereferencing
> > an element of the i2c client pointer causes invalid memory access.
> >
> > This can be prevented a number of ways but all seem unclean (e.g. i2c bus
> driver from being unloaded/hot unplugged if there is a non-zero reference
> count). I would like to know if there is a preferred approach. From my
> understanding this use case is not supported as the comment in gpiolib
> suggests.
> >
> > /* gpio_lock prevents conflicts during gpio_desc[] table updates.
> > * While any GPIO is requested, its gpio_chip is not removable;
> > * each GPIO's "requested" flag serves as a lock and refcount.
> > */
> >
> > Should a hotplug gpio chip be considered? If the answer is yes, then I
> would like to ask if there has been any discussion regarding this topic and if
> so, what the outcome was.
> >
> > Fast forward to 3.18+, this issue goes away because a gpio in use can now
> be removed. Doing so cleans up the gpio data structures and eliminates the
> problem above. The consequence of this is that the gpio may be put into an
> unintended state when freed while in-use in the system. An example of this
> could be the gpio cleaned up/turned off whilst controlling a regulator that is
> being used by some critical hardware function and it wouldn't be so obvious
> to a root user that this occurred just because he unloaded the bus driver. In
> 3.18+, where should the logic for handling the removal of a critical gpio in-
> use go? Does it go in the actual gpio device remove/free function or
> somewhere else? In this case perhaps only the element using the gpio knows
> what to do or at least should be informed in order to keep things safe.
> >
> > I realize that putting a critical gpio on a removable bus is a risk itself and
> can be mitigated at design time but for the sake of the argument, it may be
> unavoidable due to hardware constraints.
> 
> As I see it, the correct behavior for a GPIO whose chip is removed at runtime
> should be to return an error the next time that GPIO is used.
> Problem is, gpiod_set_value() returns void, so it is not clear to me where
> such error signaling should take place. At least, an error should be printed in
> the kernel log, both when a GPIO chips whose GPIOs are in use it removed,
> and when said GPIOs are used post-removal. Is it what happens with 3.18+?
> 
Going forward gpiod_set_value() could be modified to return an error as well as other possible API calls but if the descriptor in the gpio table is cleared as is in 3.18+, any evidence that the gpio was in use prior to this will be lost. In 3.18+, AFAIK the only evidence that such a failure occurred is a dev_crit message.

> Having critical GPIOs on a i2c bus is a rather common thing (think PMIC), so
> yeah, we should certainly define the state of the hardware whenever GPIO
> chips are removed. I suppose the most common practice is to leave the
> GPIOs in their last state, but maybe we should codify this somewhere so
> GPIO drivers can follow that rule.

Before 3.18, the state of the gpio descriptor was maintained and the gpio device was disallowed for removal. This had the side effect of creating the dangling i2c client pointer in the gpio descriptor which I still think can be handled properly, however something such as:

1. the patch introduced in 3.18 would need to be reverted/modified to allow the descriptor to stay.
2. devise a clean way to let an i2c bus/client hotplug from the the gpio core. (In other words, if the i2c bus is not available anymore, the gpio exists but any attempts to change its status would result in immediate failure until the bus becomes available again or all related parties using the gpio release it from service. Requires #3 below)
3. possibly modify locking in the gpio core to prevent race conditions from occurring during a hotplug situation. E.g. Currently the regulator core calls into the gpio core using the gpio implicit locking scheme and always assumes availability. If a flag is checked, it would need to be checked under a locking mechanism to prevent racing.

Perhaps there is a simpler approach. Thoughts?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: gpiolib, gpio removal while a gpio in-use
  2015-03-17  2:59 gpiolib, gpio removal while a gpio in-use Frkuska, Joshua
  2015-03-23  7:00 ` Alexandre Courbot
@ 2015-04-22 16:29 ` Johan Hovold
  2015-04-23  7:15   ` Johan Hovold
  1 sibling, 1 reply; 5+ messages in thread
From: Johan Hovold @ 2015-04-22 16:29 UTC (permalink / raw)
  To: Frkuska, Joshua; +Cc: linux-gpio

On Tue, Mar 17, 2015 at 02:59:08AM +0000, Frkuska, Joshua wrote:
> Hello,
> 
> In 3.18, the upstream commit e1db1706c8 made it acceptable to remove
> gpio chips that were in use in exchange for a critical message. The
> reasoning from what I gathered from the mailing list was to avoid
> handling errors in the device remove handler. The justification was
> that the error seemed unused by most drivers and there was a compiler
> warning when ignoring the return value.
>
> At a higher level, this is a gpio handling policy shift. Previously
> the behavior was to result in an error and disallow the gpio device to
> be removed. Then starting at 3.18, it becomes ok to free a gpio device
> regardless of whatever the (possibly critical) gpio may be doing as
> long as a critical message is displayed.

It still isn't acceptable to remove a gpio chip with requested gpios as
it will lead to memory leaks (even though the most pre-3.18 crashes have
been fixed).

The main reason for not allowing gpiochip_remove to fail is truly
hot-pluggable buses such as USB, for which deregistration must not fail.
The device is gone -- gpiolib must deal with it.

Unfortunately, the change was made without implementing the "deal with
it" part.

> Prior to this change in 3.18, I ran across a problem in 3.8 and in
> 3.14, where I have a gpio expander chip on an i2c bus controlling some
> external power regulators. The regulator reserves the gpio and puts
> them in use. Then if for any reason, the i2c bus adapter/driver is
> destroyed/unloaded, the i2c client gets destroyed causing a dangling
> pointer to be left in the gpio descriptor gpio list. Since the
> regulator knows the gpio id any subsequent access from the regulator
> core of the gpiolib causes a dereference of this pointer as follows:

Here the answer appears straight-forward: don't unload drivers for buses
that are in use.

Again, the problem is real hot-pluggable buses such as USB or Greybus.

> 1. gpio_set_value_cansleep() <in my case it was due to a regulator
> that used the gpio being put/switched> 2. __gpio_get_value() 3.
> gpiod_get_raw_value() 4. _gpiod_get_raw_value() 5. chip->get() 6.
> pca953x_gpio_get_value() <any bus based gpio expander in this case> 7.
> pca953x_read_single() 8. i2c_smbus_read_byte_data() 9. dereferencing
> an element of the i2c client pointer causes invalid memory access.
> 
> This can be prevented a number of ways but all seem unclean (e.g. i2c
> bus driver from being unloaded/hot unplugged if there is a non-zero
> reference count). I would like to know if there is a preferred
> approach. From my understanding this use case is not supported as the
> comment in gpiolib suggests.

This sounds like it should be fixed in i2c with controller reference
counting.

> /* gpio_lock prevents conflicts during gpio_desc[] table updates.  *
> While any GPIO is requested, its gpio_chip is not removable; * each
> GPIO's "requested" flag serves as a lock and refcount.  */
> 
> Should a hotplug gpio chip be considered? If the answer is yes, then I
> would like to ask if there has been any discussion regarding this
> topic and if so, what the outcome was. 

Yes, I started looking into this a while back, but got side-tracked with
other work.

The most critical use case, the sysfs interface, where gpios could stay
requested indefinitely is now handled (patches posted for review).

I also have some working test code for dealing with some use cases with
generic hot-pluggable buses, but it is not at all trivial to support.

> Fast forward to 3.18+, this issue goes away because a gpio in use can
> now be removed. Doing so cleans up the gpio data structures and
> eliminates the problem above.

Not really, we are still leaking memory.

> The consequence of this is that the gpio may be put into an unintended
> state when freed while in-use in the system. An example of this could
> be the gpio cleaned up/turned off whilst controlling a regulator that
> is being used by some critical hardware function and it wouldn't be so
> obvious to a root user that this occurred just because he unloaded the
> bus driver.

We need not worry about what a careless root user does.

But for hot-pluggable buses, this is one of the issues -- we need to
inform the consumer (driver) that the gpio operation failed (and that
the chip is gone). With the current interface this is not even possible
as gpio_get does not return errors and gpio_set is declared void. [ It's
not an option to return a "default" value for gpio_get. ]

We also have no generic hotplug support in driver core and most
subsystems are not prepared for dealing with it either.

> In 3.18+, where should the logic for handling the removal of a
> critical gpio in-use go? Does it go in the actual gpio device
> remove/free function or somewhere else? In this case perhaps only the
> element using the gpio knows what to do or at least should be informed
> in order to keep things safe.

Precisely.

> I realize that putting a critical gpio on a removable bus is a risk
> itself and can be mitigated at design time but for the sake of the
> argument, it may be unavoidable due to hardware constraints.

Yes, that is part of the solution. Don't put critical gpios on a
removable bus (i2c controllers are generally not removable). And don't
go around unloading drivers in such a system. ;)

Johan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: gpiolib, gpio removal while a gpio in-use
  2015-04-22 16:29 ` Johan Hovold
@ 2015-04-23  7:15   ` Johan Hovold
  0 siblings, 0 replies; 5+ messages in thread
From: Johan Hovold @ 2015-04-23  7:15 UTC (permalink / raw)
  To: Frkuska, Joshua; +Cc: linux-gpio

On Wed, Apr 22, 2015 at 06:29:33PM +0200, Johan Hovold wrote:
> On Tue, Mar 17, 2015 at 02:59:08AM +0000, Frkuska, Joshua wrote:
> > Hello,
> > 
> > In 3.18, the upstream commit e1db1706c8 made it acceptable to remove
> > gpio chips that were in use in exchange for a critical message. The
> > reasoning from what I gathered from the mailing list was to avoid
> > handling errors in the device remove handler. The justification was
> > that the error seemed unused by most drivers and there was a compiler
> > warning when ignoring the return value.
> >
> > At a higher level, this is a gpio handling policy shift. Previously
> > the behavior was to result in an error and disallow the gpio device to
> > be removed. Then starting at 3.18, it becomes ok to free a gpio device
> > regardless of whatever the (possibly critical) gpio may be doing as
> > long as a critical message is displayed.
> 
> It still isn't acceptable to remove a gpio chip with requested gpios as
> it will lead to memory leaks (even though the most pre-3.18 crashes have
> been fixed).

To clarify: the memory leaks are primarily related to sysfs.

The major issue is of course the crashes when attempting to call into
chip drivers whose device state is likely deallocated with descriptors
and chip structures that also have or could have been freed (e.g. as in
your i2c example). This only affects in-kernel consumers using the
gpiod-interface.

So to summarise: Using a requested gpio post chip-removal is not
supported.

Johan

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-04-23  7:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-17  2:59 gpiolib, gpio removal while a gpio in-use Frkuska, Joshua
2015-03-23  7:00 ` Alexandre Courbot
2015-03-31  4:23   ` Frkuska, Joshua
2015-04-22 16:29 ` Johan Hovold
2015-04-23  7:15   ` Johan Hovold

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.