All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hans de Goede <hdegoede@redhat.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	David Airlie <airlied@linux.ie>,
	Jani Nikula <jani.nikula@linux.intel.com>,
	Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Guenter Roeck <linux@roeck-us.net>,
	Heikki Krogerus <heikki.krogerus@linux.intel.com>,
	intel-gfx <intel-gfx@lists.freedesktop.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Platform Driver <platform-driver-x86@vger.kernel.org>,
	USB list <linux-usb@vger.kernel.org>
Subject: Re: [PATCH 1/9] drm/connector: Make the drm_sysfs connector->kdev device hold a reference to the connector
Date: Fri, 30 Apr 2021 15:32:39 +0200	[thread overview]
Message-ID: <5a6fc5d6-a218-8566-6b19-b4ae7d763210@redhat.com> (raw)
In-Reply-To: <CAKMK7uFf8n6QfRdSXeB6J+L7NPGbeEyJKhx1Vu7x8env=_7tkA@mail.gmail.com>

Hi,

On 4/30/21 1:38 PM, Daniel Vetter wrote:
> On Fri, Apr 30, 2021 at 1:28 PM Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Hi,
>>
>> On 4/29/21 9:09 PM, Daniel Vetter wrote:
>>> On Thu, Apr 29, 2021 at 02:33:17PM +0200, Hans de Goede wrote:
>>>> Hi,
>>>>
>>>> On 4/29/21 2:04 PM, Daniel Vetter wrote:
>>>>> On Thu, Apr 29, 2021 at 01:54:46PM +0200, Greg Kroah-Hartman wrote:
>>>>>> On Thu, Apr 29, 2021 at 01:40:28PM +0200, Daniel Vetter wrote:
>>>>>>> On Wed, Apr 28, 2021 at 11:52:49PM +0200, Hans de Goede wrote:
>>>>>>>> Userspace could hold open a reference to the connector->kdev device,
>>>>>>>> through e.g. holding a sysfs-atrtribute open after
>>>>>>>> drm_sysfs_connector_remove() has been called. In this case the connector
>>>>>>>> could be free-ed while the connector->kdev device's drvdata is still
>>>>>>>> pointing to it.
>>>>>>>>
>>>>>>>> Give drm_connector devices there own device type, which allows
>>>>>>>> us to specify our own release function and make drm_sysfs_connector_add()
>>>>>>>> take a reference on the connector object, and have the new release
>>>>>>>> function put the reference when the device is released.
>>>>>>>>
>>>>>>>> Giving drm_connector devices there own device type, will also allow
>>>>>>>> checking if a device is a drm_connector device with a
>>>>>>>> "if (device->type == &drm_sysfs_device_connector)" check.
>>>>>>>>
>>>>>>>> Note that the setting of the name member of the device_type struct will
>>>>>>>> cause udev events for drm_connector-s to now contain DEVTYPE=drm_connector
>>>>>>>> as extra info. So this extends the uevent part of the userspace API.
>>>>>>>>
>>>>>>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>>>>>>
>>>>>>> Are you sure? I thought sysfs is supposed to flush out any pending
>>>>>>> operations (they complete fast) and handle open fd internally?
>>>>>>
>>>>>> Yes, it "should" :)
>>>>>
>>>>> Thanks for confirming my vague memories :-)
>>>>>
>>>>> Hans, pls drop this one.
>>>>
>>>> Please see my earlier reply to your review of this patch, it is
>>>> still needed but for a different reason:
>>>>
>>>> """
>>>> We still need this change though to make sure that the
>>>> "drm/connector: Add drm_connector_find_by_fwnode() function"
>>>> does not end up following a dangling drvdat pointer from one
>>>> if the drm_connector kdev-s.
>>>>
>>>> The class_dev_iter_init() in drm_connector_find_by_fwnode() gets
>>>> a reference on all devices and between getting that reference
>>>> and it calling drm_connector_get() - drm_connector_unregister()
>>>> may run and drop the possibly last reference to the
>>>> drm_connector object, freeing it and leaving the kdev's
>>>> drvdata as a dangling pointer.
>>>> """
>>>>
>>>> This is actually why I added it initially, and while adding it
>>>> I came up with this wrong theory of why it was necessary independently
>>>> of the drm_connector_find_by_fwnode() addition, sorry about that.
>>>
>>> Generally that's handled by a kref_get_unless_zero under the protection of
>>> the lock which protects the weak reference. Which I think is the right
>>> model here (at a glance at least) since this is a lookup function.
>>
>> I'm afraid that things are a bit more complicated here. The idea here
>> is that we have a subsystem outside of the DRM subsystem which received
>> a hotplug event for a drm-connector.  The only info which this subsystem
>> has is a reference on the fwnode level (either through device-tree or
>> to platform-code instantiating software-fwnode-s + links for this).
>>
>> So in order to deliver the hotplug event to the connector we need
>> to lookup the connector by fwnode.
>>
>> I've chosen to implement this by iterating over all drm_class
>> devices with a dev_type of drm_connector using class_dev_iter_init()
>> and friends. This makes sure that we either get a reference to
>> the device, or that we skip the device if it is being deleted.
>>
>> But this just gives us a reference to the connector->kdev, not
>> to the connector itself. A pointer to the connector itself is stored
>> as drvdata inside the device, but without taking a reference as
>> this patch does, there is no guarantee that that pointer does not
>> point to possibly free-ed mem.
>>
>> We could set drvdata to 0 from drm_sysfs_connector_remove()
>> Before calling device_unregister(connector->kdev) and then do
>> something like this inside drm_connector_find_by_fwnode():
>>
>> /*
>>  * Lock the device to ensure we either see the drvdata == NULL
>>  * set by drm_sysfs_connector_remove(); or we block the removal
>>  * from continuing until we are done with the device.
>>  */
>> device_lock(dev);
>> connector = dev_get_drvdata(dev);
>> if (connector && connector->fwnode == fwnode) {
>>         drm_connector_get(connector);
>>         found = connector;
>> }
>> device_unlock(dev);
> 
> Yes this is what I mean. Except not a drm_connector_get, but a
> kref_get_unless_zero. The connector might already be on it's way out,
> but the drvdata not yet cleared.

The function we race with is drm_sysfs_connector_remove() and either:

1. The lookup wins the race in which case drm_sysfs_connector_remove()
   can only complete after the drm_connector_get(); and the connector
   kref won't drop to 0 before drm_sysfs_connector_remove() completes; or
2. drm_sysfs_connector_remove() wins the race in which case drvdata will
   be 0.

So using kref_get_unless_zero here will not make a difference and
requires poking inside the drm_connector internals.

Note I will probably go with your suggestion below, so whether or
not to use kref_get_unless_zero here is likely no longer relevant.

>> With the device_lock() synchronizing against the device_lock()
>> in device_unregister(connector->kdev). So that we either see
>> drvdata == NULL if we race with unregistering; or we get
>> a reference on the drm_connector obj before its ref-count can
>> drop to 0.
> 
> The trouble is that most connectors aren't full drivers on their kdev.
> So this isn't the right lock. We need another lock which protects the
> drvdata pointer appropriately for drm connectors.
> 
>> There might be places though where we call code take the device_lock
>> while holding a lock necessary for the drm_connector_get() , so
>> this approach might lead to an AB BA deadlock. As such I think
>> my original approach is better (also see below).
>>
>>> Lookup tables holding full references tends to lead to all kinds of bad
>>> side effects.
>>
>> The proposed reference is not part of a lookup list, it is a
>> reference from the kdev on the drm_connector object which gets
>> dropped as soon as the kdev's refcount hits 0, which normally
>> happens directly after drm_connector_unregister() has run.
> 
> Yeah but the way you use it is for lookup purposes. What we're
> implementing is the "get me the drm_connector for this fwnode"
> functionality, and that _is_ a lookup.

Ack.

> How its implemented is an
> internal detail really, and somehow using full references for lookup
> functionality isn't great.

Ok, note that the caller of this only needs the reference for a
short while, what the caller does is:

        connector = drm_connector_find_by_fwnode(dp->connector_fwnode);
        if (connector) {
                drm_connector_oob_hotplug_event(connector, &data);
                drm_connector_put(connector);
        }

As a result of out discussion I have been thinking about enforcing this
short-lifetime of the reference by changing:

void drm_connector_oob_hotplug_event(struct drm_connector *connector,
                                     struct drm_connector_oob_hotplug_event_data *data);

to:

void drm_connector_oob_hotplug_event(struct fwnode_handle connector_fwnode,
                                     struct drm_connector_oob_hotplug_event_data *data);

And making that do the lookup (+ almost immediate put) internally, making
the connector-lookup a purely drm-subsys internal thing and enforcing code
outside of the drm-subsys not holding a long-time reference to the connector
this way.

Please let me know if you prefer the variant where the connector lookup
details are hidden from the callers ?

Then I can change this for for v2 of this patch/series.

> I'm also not sure why we have to use the kdev stuff here. For other
> random objects we need to look up we're building that functionality on
> that object. It means you need to keep another list_head around for
> that lookup, but that's really not a big cost. E.g. drm_bridge/panel
> work like that.

Using class_for_each_dev seemed like a good way to iterate over all
the connectors. But given the discussion this has caused, just adding
a new static list + mutex for this to drivers/gpu/drm/drm_connector.c
sounds like it might be a better approach indeed.

So shall I change thing over to this approach for v2 of this patch/series?

Regards,

Hans


WARNING: multiple messages have this Message-ID (diff)
From: Hans de Goede <hdegoede@redhat.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	David Airlie <airlied@linux.ie>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	intel-gfx <intel-gfx@lists.freedesktop.org>,
	Platform Driver <platform-driver-x86@vger.kernel.org>,
	USB list <linux-usb@vger.kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Guenter Roeck <linux@roeck-us.net>
Subject: Re: [PATCH 1/9] drm/connector: Make the drm_sysfs connector->kdev device hold a reference to the connector
Date: Fri, 30 Apr 2021 15:32:39 +0200	[thread overview]
Message-ID: <5a6fc5d6-a218-8566-6b19-b4ae7d763210@redhat.com> (raw)
In-Reply-To: <CAKMK7uFf8n6QfRdSXeB6J+L7NPGbeEyJKhx1Vu7x8env=_7tkA@mail.gmail.com>

Hi,

On 4/30/21 1:38 PM, Daniel Vetter wrote:
> On Fri, Apr 30, 2021 at 1:28 PM Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Hi,
>>
>> On 4/29/21 9:09 PM, Daniel Vetter wrote:
>>> On Thu, Apr 29, 2021 at 02:33:17PM +0200, Hans de Goede wrote:
>>>> Hi,
>>>>
>>>> On 4/29/21 2:04 PM, Daniel Vetter wrote:
>>>>> On Thu, Apr 29, 2021 at 01:54:46PM +0200, Greg Kroah-Hartman wrote:
>>>>>> On Thu, Apr 29, 2021 at 01:40:28PM +0200, Daniel Vetter wrote:
>>>>>>> On Wed, Apr 28, 2021 at 11:52:49PM +0200, Hans de Goede wrote:
>>>>>>>> Userspace could hold open a reference to the connector->kdev device,
>>>>>>>> through e.g. holding a sysfs-atrtribute open after
>>>>>>>> drm_sysfs_connector_remove() has been called. In this case the connector
>>>>>>>> could be free-ed while the connector->kdev device's drvdata is still
>>>>>>>> pointing to it.
>>>>>>>>
>>>>>>>> Give drm_connector devices there own device type, which allows
>>>>>>>> us to specify our own release function and make drm_sysfs_connector_add()
>>>>>>>> take a reference on the connector object, and have the new release
>>>>>>>> function put the reference when the device is released.
>>>>>>>>
>>>>>>>> Giving drm_connector devices there own device type, will also allow
>>>>>>>> checking if a device is a drm_connector device with a
>>>>>>>> "if (device->type == &drm_sysfs_device_connector)" check.
>>>>>>>>
>>>>>>>> Note that the setting of the name member of the device_type struct will
>>>>>>>> cause udev events for drm_connector-s to now contain DEVTYPE=drm_connector
>>>>>>>> as extra info. So this extends the uevent part of the userspace API.
>>>>>>>>
>>>>>>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>>>>>>
>>>>>>> Are you sure? I thought sysfs is supposed to flush out any pending
>>>>>>> operations (they complete fast) and handle open fd internally?
>>>>>>
>>>>>> Yes, it "should" :)
>>>>>
>>>>> Thanks for confirming my vague memories :-)
>>>>>
>>>>> Hans, pls drop this one.
>>>>
>>>> Please see my earlier reply to your review of this patch, it is
>>>> still needed but for a different reason:
>>>>
>>>> """
>>>> We still need this change though to make sure that the
>>>> "drm/connector: Add drm_connector_find_by_fwnode() function"
>>>> does not end up following a dangling drvdat pointer from one
>>>> if the drm_connector kdev-s.
>>>>
>>>> The class_dev_iter_init() in drm_connector_find_by_fwnode() gets
>>>> a reference on all devices and between getting that reference
>>>> and it calling drm_connector_get() - drm_connector_unregister()
>>>> may run and drop the possibly last reference to the
>>>> drm_connector object, freeing it and leaving the kdev's
>>>> drvdata as a dangling pointer.
>>>> """
>>>>
>>>> This is actually why I added it initially, and while adding it
>>>> I came up with this wrong theory of why it was necessary independently
>>>> of the drm_connector_find_by_fwnode() addition, sorry about that.
>>>
>>> Generally that's handled by a kref_get_unless_zero under the protection of
>>> the lock which protects the weak reference. Which I think is the right
>>> model here (at a glance at least) since this is a lookup function.
>>
>> I'm afraid that things are a bit more complicated here. The idea here
>> is that we have a subsystem outside of the DRM subsystem which received
>> a hotplug event for a drm-connector.  The only info which this subsystem
>> has is a reference on the fwnode level (either through device-tree or
>> to platform-code instantiating software-fwnode-s + links for this).
>>
>> So in order to deliver the hotplug event to the connector we need
>> to lookup the connector by fwnode.
>>
>> I've chosen to implement this by iterating over all drm_class
>> devices with a dev_type of drm_connector using class_dev_iter_init()
>> and friends. This makes sure that we either get a reference to
>> the device, or that we skip the device if it is being deleted.
>>
>> But this just gives us a reference to the connector->kdev, not
>> to the connector itself. A pointer to the connector itself is stored
>> as drvdata inside the device, but without taking a reference as
>> this patch does, there is no guarantee that that pointer does not
>> point to possibly free-ed mem.
>>
>> We could set drvdata to 0 from drm_sysfs_connector_remove()
>> Before calling device_unregister(connector->kdev) and then do
>> something like this inside drm_connector_find_by_fwnode():
>>
>> /*
>>  * Lock the device to ensure we either see the drvdata == NULL
>>  * set by drm_sysfs_connector_remove(); or we block the removal
>>  * from continuing until we are done with the device.
>>  */
>> device_lock(dev);
>> connector = dev_get_drvdata(dev);
>> if (connector && connector->fwnode == fwnode) {
>>         drm_connector_get(connector);
>>         found = connector;
>> }
>> device_unlock(dev);
> 
> Yes this is what I mean. Except not a drm_connector_get, but a
> kref_get_unless_zero. The connector might already be on it's way out,
> but the drvdata not yet cleared.

The function we race with is drm_sysfs_connector_remove() and either:

1. The lookup wins the race in which case drm_sysfs_connector_remove()
   can only complete after the drm_connector_get(); and the connector
   kref won't drop to 0 before drm_sysfs_connector_remove() completes; or
2. drm_sysfs_connector_remove() wins the race in which case drvdata will
   be 0.

So using kref_get_unless_zero here will not make a difference and
requires poking inside the drm_connector internals.

Note I will probably go with your suggestion below, so whether or
not to use kref_get_unless_zero here is likely no longer relevant.

>> With the device_lock() synchronizing against the device_lock()
>> in device_unregister(connector->kdev). So that we either see
>> drvdata == NULL if we race with unregistering; or we get
>> a reference on the drm_connector obj before its ref-count can
>> drop to 0.
> 
> The trouble is that most connectors aren't full drivers on their kdev.
> So this isn't the right lock. We need another lock which protects the
> drvdata pointer appropriately for drm connectors.
> 
>> There might be places though where we call code take the device_lock
>> while holding a lock necessary for the drm_connector_get() , so
>> this approach might lead to an AB BA deadlock. As such I think
>> my original approach is better (also see below).
>>
>>> Lookup tables holding full references tends to lead to all kinds of bad
>>> side effects.
>>
>> The proposed reference is not part of a lookup list, it is a
>> reference from the kdev on the drm_connector object which gets
>> dropped as soon as the kdev's refcount hits 0, which normally
>> happens directly after drm_connector_unregister() has run.
> 
> Yeah but the way you use it is for lookup purposes. What we're
> implementing is the "get me the drm_connector for this fwnode"
> functionality, and that _is_ a lookup.

Ack.

> How its implemented is an
> internal detail really, and somehow using full references for lookup
> functionality isn't great.

Ok, note that the caller of this only needs the reference for a
short while, what the caller does is:

        connector = drm_connector_find_by_fwnode(dp->connector_fwnode);
        if (connector) {
                drm_connector_oob_hotplug_event(connector, &data);
                drm_connector_put(connector);
        }

As a result of out discussion I have been thinking about enforcing this
short-lifetime of the reference by changing:

void drm_connector_oob_hotplug_event(struct drm_connector *connector,
                                     struct drm_connector_oob_hotplug_event_data *data);

to:

void drm_connector_oob_hotplug_event(struct fwnode_handle connector_fwnode,
                                     struct drm_connector_oob_hotplug_event_data *data);

And making that do the lookup (+ almost immediate put) internally, making
the connector-lookup a purely drm-subsys internal thing and enforcing code
outside of the drm-subsys not holding a long-time reference to the connector
this way.

Please let me know if you prefer the variant where the connector lookup
details are hidden from the callers ?

Then I can change this for for v2 of this patch/series.

> I'm also not sure why we have to use the kdev stuff here. For other
> random objects we need to look up we're building that functionality on
> that object. It means you need to keep another list_head around for
> that lookup, but that's really not a big cost. E.g. drm_bridge/panel
> work like that.

Using class_for_each_dev seemed like a good way to iterate over all
the connectors. But given the discussion this has caused, just adding
a new static list + mutex for this to drivers/gpu/drm/drm_connector.c
sounds like it might be a better approach indeed.

So shall I change thing over to this approach for v2 of this patch/series?

Regards,

Hans

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

WARNING: multiple messages have this Message-ID (diff)
From: Hans de Goede <hdegoede@redhat.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	David Airlie <airlied@linux.ie>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	intel-gfx <intel-gfx@lists.freedesktop.org>,
	Platform Driver <platform-driver-x86@vger.kernel.org>,
	USB list <linux-usb@vger.kernel.org>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	Guenter Roeck <linux@roeck-us.net>
Subject: Re: [Intel-gfx] [PATCH 1/9] drm/connector: Make the drm_sysfs connector->kdev device hold a reference to the connector
Date: Fri, 30 Apr 2021 15:32:39 +0200	[thread overview]
Message-ID: <5a6fc5d6-a218-8566-6b19-b4ae7d763210@redhat.com> (raw)
In-Reply-To: <CAKMK7uFf8n6QfRdSXeB6J+L7NPGbeEyJKhx1Vu7x8env=_7tkA@mail.gmail.com>

Hi,

On 4/30/21 1:38 PM, Daniel Vetter wrote:
> On Fri, Apr 30, 2021 at 1:28 PM Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Hi,
>>
>> On 4/29/21 9:09 PM, Daniel Vetter wrote:
>>> On Thu, Apr 29, 2021 at 02:33:17PM +0200, Hans de Goede wrote:
>>>> Hi,
>>>>
>>>> On 4/29/21 2:04 PM, Daniel Vetter wrote:
>>>>> On Thu, Apr 29, 2021 at 01:54:46PM +0200, Greg Kroah-Hartman wrote:
>>>>>> On Thu, Apr 29, 2021 at 01:40:28PM +0200, Daniel Vetter wrote:
>>>>>>> On Wed, Apr 28, 2021 at 11:52:49PM +0200, Hans de Goede wrote:
>>>>>>>> Userspace could hold open a reference to the connector->kdev device,
>>>>>>>> through e.g. holding a sysfs-atrtribute open after
>>>>>>>> drm_sysfs_connector_remove() has been called. In this case the connector
>>>>>>>> could be free-ed while the connector->kdev device's drvdata is still
>>>>>>>> pointing to it.
>>>>>>>>
>>>>>>>> Give drm_connector devices there own device type, which allows
>>>>>>>> us to specify our own release function and make drm_sysfs_connector_add()
>>>>>>>> take a reference on the connector object, and have the new release
>>>>>>>> function put the reference when the device is released.
>>>>>>>>
>>>>>>>> Giving drm_connector devices there own device type, will also allow
>>>>>>>> checking if a device is a drm_connector device with a
>>>>>>>> "if (device->type == &drm_sysfs_device_connector)" check.
>>>>>>>>
>>>>>>>> Note that the setting of the name member of the device_type struct will
>>>>>>>> cause udev events for drm_connector-s to now contain DEVTYPE=drm_connector
>>>>>>>> as extra info. So this extends the uevent part of the userspace API.
>>>>>>>>
>>>>>>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>>>>>>
>>>>>>> Are you sure? I thought sysfs is supposed to flush out any pending
>>>>>>> operations (they complete fast) and handle open fd internally?
>>>>>>
>>>>>> Yes, it "should" :)
>>>>>
>>>>> Thanks for confirming my vague memories :-)
>>>>>
>>>>> Hans, pls drop this one.
>>>>
>>>> Please see my earlier reply to your review of this patch, it is
>>>> still needed but for a different reason:
>>>>
>>>> """
>>>> We still need this change though to make sure that the
>>>> "drm/connector: Add drm_connector_find_by_fwnode() function"
>>>> does not end up following a dangling drvdat pointer from one
>>>> if the drm_connector kdev-s.
>>>>
>>>> The class_dev_iter_init() in drm_connector_find_by_fwnode() gets
>>>> a reference on all devices and between getting that reference
>>>> and it calling drm_connector_get() - drm_connector_unregister()
>>>> may run and drop the possibly last reference to the
>>>> drm_connector object, freeing it and leaving the kdev's
>>>> drvdata as a dangling pointer.
>>>> """
>>>>
>>>> This is actually why I added it initially, and while adding it
>>>> I came up with this wrong theory of why it was necessary independently
>>>> of the drm_connector_find_by_fwnode() addition, sorry about that.
>>>
>>> Generally that's handled by a kref_get_unless_zero under the protection of
>>> the lock which protects the weak reference. Which I think is the right
>>> model here (at a glance at least) since this is a lookup function.
>>
>> I'm afraid that things are a bit more complicated here. The idea here
>> is that we have a subsystem outside of the DRM subsystem which received
>> a hotplug event for a drm-connector.  The only info which this subsystem
>> has is a reference on the fwnode level (either through device-tree or
>> to platform-code instantiating software-fwnode-s + links for this).
>>
>> So in order to deliver the hotplug event to the connector we need
>> to lookup the connector by fwnode.
>>
>> I've chosen to implement this by iterating over all drm_class
>> devices with a dev_type of drm_connector using class_dev_iter_init()
>> and friends. This makes sure that we either get a reference to
>> the device, or that we skip the device if it is being deleted.
>>
>> But this just gives us a reference to the connector->kdev, not
>> to the connector itself. A pointer to the connector itself is stored
>> as drvdata inside the device, but without taking a reference as
>> this patch does, there is no guarantee that that pointer does not
>> point to possibly free-ed mem.
>>
>> We could set drvdata to 0 from drm_sysfs_connector_remove()
>> Before calling device_unregister(connector->kdev) and then do
>> something like this inside drm_connector_find_by_fwnode():
>>
>> /*
>>  * Lock the device to ensure we either see the drvdata == NULL
>>  * set by drm_sysfs_connector_remove(); or we block the removal
>>  * from continuing until we are done with the device.
>>  */
>> device_lock(dev);
>> connector = dev_get_drvdata(dev);
>> if (connector && connector->fwnode == fwnode) {
>>         drm_connector_get(connector);
>>         found = connector;
>> }
>> device_unlock(dev);
> 
> Yes this is what I mean. Except not a drm_connector_get, but a
> kref_get_unless_zero. The connector might already be on it's way out,
> but the drvdata not yet cleared.

The function we race with is drm_sysfs_connector_remove() and either:

1. The lookup wins the race in which case drm_sysfs_connector_remove()
   can only complete after the drm_connector_get(); and the connector
   kref won't drop to 0 before drm_sysfs_connector_remove() completes; or
2. drm_sysfs_connector_remove() wins the race in which case drvdata will
   be 0.

So using kref_get_unless_zero here will not make a difference and
requires poking inside the drm_connector internals.

Note I will probably go with your suggestion below, so whether or
not to use kref_get_unless_zero here is likely no longer relevant.

>> With the device_lock() synchronizing against the device_lock()
>> in device_unregister(connector->kdev). So that we either see
>> drvdata == NULL if we race with unregistering; or we get
>> a reference on the drm_connector obj before its ref-count can
>> drop to 0.
> 
> The trouble is that most connectors aren't full drivers on their kdev.
> So this isn't the right lock. We need another lock which protects the
> drvdata pointer appropriately for drm connectors.
> 
>> There might be places though where we call code take the device_lock
>> while holding a lock necessary for the drm_connector_get() , so
>> this approach might lead to an AB BA deadlock. As such I think
>> my original approach is better (also see below).
>>
>>> Lookup tables holding full references tends to lead to all kinds of bad
>>> side effects.
>>
>> The proposed reference is not part of a lookup list, it is a
>> reference from the kdev on the drm_connector object which gets
>> dropped as soon as the kdev's refcount hits 0, which normally
>> happens directly after drm_connector_unregister() has run.
> 
> Yeah but the way you use it is for lookup purposes. What we're
> implementing is the "get me the drm_connector for this fwnode"
> functionality, and that _is_ a lookup.

Ack.

> How its implemented is an
> internal detail really, and somehow using full references for lookup
> functionality isn't great.

Ok, note that the caller of this only needs the reference for a
short while, what the caller does is:

        connector = drm_connector_find_by_fwnode(dp->connector_fwnode);
        if (connector) {
                drm_connector_oob_hotplug_event(connector, &data);
                drm_connector_put(connector);
        }

As a result of out discussion I have been thinking about enforcing this
short-lifetime of the reference by changing:

void drm_connector_oob_hotplug_event(struct drm_connector *connector,
                                     struct drm_connector_oob_hotplug_event_data *data);

to:

void drm_connector_oob_hotplug_event(struct fwnode_handle connector_fwnode,
                                     struct drm_connector_oob_hotplug_event_data *data);

And making that do the lookup (+ almost immediate put) internally, making
the connector-lookup a purely drm-subsys internal thing and enforcing code
outside of the drm-subsys not holding a long-time reference to the connector
this way.

Please let me know if you prefer the variant where the connector lookup
details are hidden from the callers ?

Then I can change this for for v2 of this patch/series.

> I'm also not sure why we have to use the kdev stuff here. For other
> random objects we need to look up we're building that functionality on
> that object. It means you need to keep another list_head around for
> that lookup, but that's really not a big cost. E.g. drm_bridge/panel
> work like that.

Using class_for_each_dev seemed like a good way to iterate over all
the connectors. But given the discussion this has caused, just adding
a new static list + mutex for this to drivers/gpu/drm/drm_connector.c
sounds like it might be a better approach indeed.

So shall I change thing over to this approach for v2 of this patch/series?

Regards,

Hans

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2021-04-30 13:32 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-28 21:52 [PATCH 0/9] drm + usb-type-c: Add support for out-of-band hotplug notification Hans de Goede
2021-04-28 21:52 ` [Intel-gfx] " Hans de Goede
2021-04-28 21:52 ` Hans de Goede
2021-04-28 21:52 ` [PATCH 1/9] drm/connector: Make the drm_sysfs connector->kdev device hold a reference to the connector Hans de Goede
2021-04-28 21:52   ` [Intel-gfx] " Hans de Goede
2021-04-28 21:52   ` Hans de Goede
2021-04-29 11:40   ` Daniel Vetter
2021-04-29 11:40     ` [Intel-gfx] " Daniel Vetter
2021-04-29 11:40     ` Daniel Vetter
2021-04-29 11:54     ` Greg Kroah-Hartman
2021-04-29 11:54       ` [Intel-gfx] " Greg Kroah-Hartman
2021-04-29 11:54       ` Greg Kroah-Hartman
2021-04-29 12:04       ` Daniel Vetter
2021-04-29 12:04         ` [Intel-gfx] " Daniel Vetter
2021-04-29 12:04         ` Daniel Vetter
2021-04-29 12:33         ` Hans de Goede
2021-04-29 12:33           ` [Intel-gfx] " Hans de Goede
2021-04-29 12:33           ` Hans de Goede
2021-04-29 19:09           ` Daniel Vetter
2021-04-29 19:09             ` [Intel-gfx] " Daniel Vetter
2021-04-29 19:09             ` Daniel Vetter
2021-04-30 11:28             ` Hans de Goede
2021-04-30 11:28               ` [Intel-gfx] " Hans de Goede
2021-04-30 11:28               ` Hans de Goede
2021-04-30 11:38               ` Daniel Vetter
2021-04-30 11:38                 ` [Intel-gfx] " Daniel Vetter
2021-04-30 11:38                 ` Daniel Vetter
2021-04-30 13:32                 ` Hans de Goede [this message]
2021-04-30 13:32                   ` [Intel-gfx] " Hans de Goede
2021-04-30 13:32                   ` Hans de Goede
2021-04-30 14:30                   ` Daniel Vetter
2021-04-30 14:30                     ` [Intel-gfx] " Daniel Vetter
2021-04-30 14:30                     ` Daniel Vetter
2021-04-30 13:45                 ` Hans de Goede
2021-04-30 13:45                   ` [Intel-gfx] " Hans de Goede
2021-04-30 13:45                   ` Hans de Goede
2021-04-29 12:30     ` Hans de Goede
2021-04-29 12:30       ` [Intel-gfx] " Hans de Goede
2021-04-29 12:30       ` Hans de Goede
2021-04-28 21:52 ` [PATCH 2/9] drm/connector: Add a fwnode pointer to drm_connector and register with ACPI Hans de Goede
2021-04-28 21:52   ` [Intel-gfx] " Hans de Goede
2021-04-28 21:52   ` Hans de Goede
2021-04-28 21:52 ` [PATCH 3/9] drm/connector: Add drm_connector_find_by_fwnode() function Hans de Goede
2021-04-28 21:52   ` [Intel-gfx] " Hans de Goede
2021-04-28 21:52   ` Hans de Goede
2021-04-28 21:52 ` [PATCH 4/9] drm/connector: Add support for out-of-band hotplug notification Hans de Goede
2021-04-28 21:52   ` [Intel-gfx] " Hans de Goede
2021-04-28 21:52   ` Hans de Goede
2021-05-03  8:00   ` Heikki Krogerus
2021-05-03  8:00     ` [Intel-gfx] " Heikki Krogerus
2021-05-03  8:00     ` Heikki Krogerus
2021-05-03 14:35     ` Hans de Goede
2021-05-03 14:35       ` [Intel-gfx] " Hans de Goede
2021-05-03 14:35       ` Hans de Goede
2021-05-04 14:52       ` Heikki Krogerus
2021-05-04 14:52         ` [Intel-gfx] " Heikki Krogerus
2021-05-04 14:52         ` Heikki Krogerus
2021-05-04 15:34         ` Hans de Goede
2021-05-04 15:34           ` [Intel-gfx] " Hans de Goede
2021-05-04 15:34           ` Hans de Goede
2021-05-04 12:53     ` Imre Deak
2021-05-04 12:53       ` [Intel-gfx] " Imre Deak
2021-05-04 12:53       ` Imre Deak
2021-04-28 21:52 ` [PATCH 5/9] drm/i915: Associate ACPI connector nodes with connector entries Hans de Goede
2021-04-28 21:52   ` [Intel-gfx] " Hans de Goede
2021-04-28 21:52   ` Hans de Goede
2021-04-28 21:52 ` [PATCH 6/9] drm/i915/dp: Add support for out-of-bound hotplug events Hans de Goede
2021-04-28 21:52   ` [Intel-gfx] " Hans de Goede
2021-04-28 21:52   ` Hans de Goede
2021-04-28 21:52 ` [PATCH 7/9] usb: typec: altmodes/displayport: Make dp_altmode_notify() more generic Hans de Goede
2021-04-28 21:52   ` [Intel-gfx] " Hans de Goede
2021-04-28 21:52   ` Hans de Goede
2021-04-28 21:52 ` [PATCH 8/9] usb: typec: altmodes/displayport: Notify drm subsys of hotplug events Hans de Goede
2021-04-28 21:52   ` [Intel-gfx] " Hans de Goede
2021-04-28 21:52   ` Hans de Goede
2021-04-29  2:08   ` kernel test robot
2021-04-29 14:16     ` Hans de Goede
2021-04-29  4:40   ` kernel test robot
2021-04-28 21:52 ` [PATCH 9/9] platform/x86/intel_cht_int33fe: Correct "displayport" fwnode reference Hans de Goede
2021-04-28 21:52   ` [Intel-gfx] " Hans de Goede
2021-04-28 21:52   ` Hans de Goede
2021-04-28 21:57 ` [PATCH 0/9] drm + usb-type-c: Add support for out-of-band hotplug notification Hans de Goede
2021-04-28 21:57   ` [Intel-gfx] " Hans de Goede
2021-04-28 21:57   ` Hans de Goede
2021-04-28 22:18 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2021-04-28 22:21 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2021-04-28 22:49 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-04-29  0:26 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5a6fc5d6-a218-8566-6b19-b4ae7d763210@redhat.com \
    --to=hdegoede@redhat.com \
    --cc=airlied@linux.ie \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=heikki.krogerus@linux.intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=jani.nikula@linux.intel.com \
    --cc=joonas.lahtinen@linux.intel.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=mripard@kernel.org \
    --cc=platform-driver-x86@vger.kernel.org \
    --cc=rodrigo.vivi@intel.com \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.