* [PATCH 0/1] [RFC] DRM locking issues during early open
From: Andy Whitcroft @ 2012-04-19 16:22 UTC
To: Andy Whitcroft, David Airlie, dri-devel
Cc: Jesse Barnes, Bryce Harrington, linux-kernel

We have been carrying a (rather poor) patch for an issue we identified in
the DRM driver. The issue is triggered when a DRM device is initialising
and userspace attempts to open it, typically in response to the sysfs
device-added event. Basically, we allocate the minor numbers, which makes
the device available, and only then call the DRM load callback. Until
that completes the device is not really ready, and these early opens
typically lead to oopses.

We have been using the following patch to avoid this by marking the minors
as in error until the load method has completed. This avoids the early
open by simply failing it with EAGAIN. Obviously we should instead be
delaying the open until the load method completes.

I include the existing patch for completeness (it is not really ready for
merging) to illustrate the issue. I think it is logical that the open
should simply be delayed until the load has completed. I am proposing
to add a wait queue associated with the idr cache for the DRM minors,
which open callers can wait_event_interruptible() on.
I'll be putting together a prototype shortly and will follow up with it.

Thoughts?

-apw

Andy Whitcroft (1):
  drm -- stop early access to drm devices

 drivers/gpu/drm/drm_fops.c     |    8 ++++++--
 drivers/gpu/drm/drm_pci.c      |    4 ++++
 drivers/gpu/drm/drm_platform.c |    4 ++++
 drivers/gpu/drm/drm_stub.c     |    2 +-
 4 files changed, 15 insertions(+), 3 deletions(-)

--
1.7.9.5
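The wait-queue proposal above can be sketched in userspace to show the intended behaviour. This is only a model under our own assumptions: a pthread mutex plus condition variable stands in for the proposed wait queue and wait_event_interruptible(), and `struct fake_minor`, `fake_minor_open()`, `fake_minor_load_finished()` and `demo_early_open()` are hypothetical names, not DRM API. Signal interruption (-ERESTARTSYS in the kernel) is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a DRM minor; not kernel code. */
struct fake_minor {
	pthread_mutex_t lock;
	pthread_cond_t  load_done; /* plays the role of the proposed wait queue */
	bool            loaded;    /* true once the driver load callback has run */
	int             load_err;  /* 0 on success, negative errno on failure */
};

static void fake_minor_init(struct fake_minor *m)
{
	pthread_mutex_init(&m->lock, NULL);
	pthread_cond_init(&m->load_done, NULL);
	m->loaded = false;
	m->load_err = 0;
}

static void fake_minor_destroy(struct fake_minor *m)
{
	pthread_cond_destroy(&m->load_done);
	pthread_mutex_destroy(&m->lock);
}

/* Loader side: record the load result and wake every waiting opener. */
static void fake_minor_load_finished(struct fake_minor *m, int err)
{
	pthread_mutex_lock(&m->lock);
	m->loaded = true;
	m->load_err = err;
	pthread_cond_broadcast(&m->load_done);
	pthread_mutex_unlock(&m->lock);
}

/* Opener side: sleep until the load is done instead of failing with -EAGAIN. */
static int fake_minor_open(struct fake_minor *m)
{
	int ret;

	pthread_mutex_lock(&m->lock);
	while (!m->loaded) /* re-check: condition waits can wake spuriously */
		pthread_cond_wait(&m->load_done, &m->lock);
	ret = m->load_err;
	pthread_mutex_unlock(&m->lock);
	return ret;
}

static void *opener_thread(void *arg)
{
	return (void *)(intptr_t)fake_minor_open(arg);
}

/* Start an opener that may run before the load completes, finish the
 * load, and return the result the opener saw. */
static int demo_early_open(int load_result)
{
	struct fake_minor m;
	pthread_t t;
	void *ret;
	int rc;

	fake_minor_init(&m);
	rc = pthread_create(&t, NULL, opener_thread, &m);
	assert(rc == 0);
	fake_minor_load_finished(&m, load_result);
	rc = pthread_join(t, &ret);
	assert(rc == 0);
	(void)rc;
	fake_minor_destroy(&m);
	return (int)(intptr_t)ret;
}
```

`demo_early_open(0)` returns 0 whether the opener starts before or after the load finishes; passing a negative errno models a failed load, which a waiting opener then reports instead of touching a half-initialised device.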
* [PATCH 1/1] drm -- stop early access to drm devices
From: Andy Whitcroft @ 2012-04-19 16:22 UTC
To: Andy Whitcroft, David Airlie, dri-devel
Cc: Jesse Barnes, Bryce Harrington, linux-kernel

When a DRM driver is initialised we first allocate and initialise the
DRM minor numbers, including creating the sysfs files, and only then
call the driver's load method. Creating the sysfs files triggers the
uevent, so udev may start programs which open /dev/dri/card0 and other
interfaces. This can occur before the load method has even started, and
thus before the driver has fully initialised its data structures.

In the case of plymouthd this leads to it opening and closing (in
disgust) the interface, which in turn leads to a kernel panic because
the mutexes have yet to be initialised.

This patch delays linking up the DRM device's minor numbers until the
driver is fully initialised. As it is still possible for consumers of
these interfaces to reach them before they are fully initialised, we
arrange for opens of these devices to return EAGAIN until the device is
fully initialised.
Signed-off-by: Andy Whitcroft <apw@canonical.com>
---
 drivers/gpu/drm/drm_fops.c     |    8 ++++++--
 drivers/gpu/drm/drm_pci.c      |    4 ++++
 drivers/gpu/drm/drm_platform.c |    4 ++++
 drivers/gpu/drm/drm_stub.c     |    2 +-
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index cdfbf27..b415ef0 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -129,7 +129,8 @@ int drm_open(struct inode *inode, struct file *filp)
 	minor = idr_find(&drm_minors_idr, minor_id);
 	if (!minor)
 		return -ENODEV;
-
+	if (IS_ERR(minor))
+		return PTR_ERR(minor);
 	if (!(dev = minor->dev))
 		return -ENODEV;
 
@@ -180,7 +181,10 @@ int drm_stub_open(struct inode *inode, struct file *filp)
 	minor = idr_find(&drm_minors_idr, minor_id);
 	if (!minor)
 		goto out;
-
+	if (IS_ERR(minor)) {
+		err = PTR_ERR(minor);
+		goto out;
+	}
 	if (!(dev = minor->dev))
 		goto out;
 
diff --git a/drivers/gpu/drm/drm_pci.c b/drivers/gpu/drm/drm_pci.c
index 13f3d93..b321672 100644
--- a/drivers/gpu/drm/drm_pci.c
+++ b/drivers/gpu/drm/drm_pci.c
@@ -367,6 +367,10 @@ int drm_get_pci_dev(struct pci_dev *pdev, const struct pci_device_id *ent,
 
 	list_add_tail(&dev->driver_item, &driver->device_list);
 
+	if (drm_core_check_feature(dev, DRIVER_MODESET))
+		idr_replace(&drm_minors_idr, dev->control, dev->control->index);
+	idr_replace(&drm_minors_idr, dev->primary, dev->primary->index);
+
 	DRM_INFO("Initialized %s %d.%d.%d %s for %s on minor %d\n",
 		 driver->name, driver->major, driver->minor, driver->patchlevel,
 		 driver->date, pci_name(pdev), dev->primary->index);
diff --git a/drivers/gpu/drm/drm_platform.c b/drivers/gpu/drm/drm_platform.c
index 82431dc..d749389 100644
--- a/drivers/gpu/drm/drm_platform.c
+++ b/drivers/gpu/drm/drm_platform.c
@@ -90,6 +90,10 @@ int drm_get_platform_dev(struct platform_device *platdev,
 
 	list_add_tail(&dev->driver_item, &driver->device_list);
 
+	if (drm_core_check_feature(dev, DRIVER_MODESET))
+		idr_replace(&drm_minors_idr, dev->control, dev->control->index);
+	idr_replace(&drm_minors_idr, dev->primary, dev->primary->index);
+
 	mutex_unlock(&drm_global_mutex);
 
 	DRM_INFO("Initialized %s %d.%d.%d %s on minor %d\n",
diff --git a/drivers/gpu/drm/drm_stub.c b/drivers/gpu/drm/drm_stub.c
index aa454f8..6c32781 100644
--- a/drivers/gpu/drm/drm_stub.c
+++ b/drivers/gpu/drm/drm_stub.c
@@ -357,7 +357,7 @@ int drm_get_minor(struct drm_device *dev, struct drm_minor **minor, int type)
 	new_minor->index = minor_id;
 	INIT_LIST_HEAD(&new_minor->master_list);
 
-	idr_replace(&drm_minors_idr, new_minor, minor_id);
+	idr_replace(&drm_minors_idr, ERR_PTR(-EAGAIN), minor_id);
 
 	if (type == DRM_MINOR_LEGACY) {
 		ret = drm_proc_init(new_minor, minor_id, drm_proc_root);
--
1.7.9.5
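The patch's effect on lookups can be modelled outside the kernel. In this sketch `err_ptr()`, `is_err()` and `ptr_err()` are hand-written userspace imitations of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() (assuming the usual encoding of errors in the top 4095 pointer values), a fixed-size array stands in for drm_minors_idr, and `reserve_minor()`, `publish_minor()` and `try_open()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace imitations of ERR_PTR()/IS_ERR()/PTR_ERR(): small negative
 * errno values are encoded in the top 4095 pointer values. */
#define MAX_ERRNO 4095

static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct minor { int index; };

#define NSLOTS 4
static void *minors_idr[NSLOTS];           /* stand-in for drm_minors_idr */
static struct minor minor_storage[NSLOTS];

/* Like the patched drm_get_minor(): the id is taken but marked "not yet". */
static void reserve_minor(int id)
{
	assert(id >= 0 && id < NSLOTS);
	minor_storage[id].index = id;
	minors_idr[id] = err_ptr(-EAGAIN);
}

/* Like the idr_replace() the patch adds after the load method has run. */
static void publish_minor(int id)
{
	assert(id >= 0 && id < NSLOTS);
	minors_idr[id] = &minor_storage[id];
}

/* Mirrors the checks in the patched drm_open(). */
static int try_open(int id)
{
	void *p;
	struct minor *m;

	assert(id >= 0 && id < NSLOTS);
	p = minors_idr[id];
	if (!p)
		return -ENODEV;          /* nothing allocated at this id */
	if (is_err(p))
		return (int)ptr_err(p);  /* allocated but still loading */
	m = p;
	assert(m->index == id);
	(void)m;
	return 0;
}
```

An empty slot gives -ENODEV, a reserved slot gives -EAGAIN, and only the published pointer lets the open proceed, matching the three cases the patched drm_open() distinguishes.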
* Re: [PATCH 0/1] [RFC] DRM locking issues during early open
From: Dave Airlie @ 2012-04-19 16:30 UTC
To: Andy Whitcroft
Cc: David Airlie, dri-devel, Jesse Barnes, Bryce Harrington, linux-kernel

On Thu, Apr 19, 2012 at 5:22 PM, Andy Whitcroft <apw@canonical.com> wrote:
> We have been carrying a (rather poor) patch for an issue we identified in
> the DRM driver. [...]
> I am proposing
> to include a wait queue associated with the idr cache for the drm minors
> which we can use to allow open callers to wait_event_interruptible() on.
> I'll be putting together a prototype shortly and will follow up with it.
>
> Thoughts?

Couldn't we just delay registering things until the driver is ready to
accept an open?

Granted, the midlayer of drm doesn't make that easy. Thanks for sending
this out, it keeps falling off my radar. I don't think I've ever seen this
reported on RHEL/Fedora, which makes me wonder what we are doing that
makes us lucky.

Dave.
* Re: [PATCH 0/1] [RFC] DRM locking issues during early open
From: Andy Whitcroft @ 2012-04-19 16:41 UTC
To: Dave Airlie
Cc: David Airlie, dri-devel, Jesse Barnes, Bryce Harrington, linux-kernel

On Thu, Apr 19, 2012 at 05:30:03PM +0100, Dave Airlie wrote:
> On Thu, Apr 19, 2012 at 5:22 PM, Andy Whitcroft <apw@canonical.com> wrote:
> > [...]
> > Thoughts?
>
> Couldn't we just delay registering things until the driver is ready to
> accept an open?
>
> Granted the midlayer of drm doesn't make that easy,

It seems that, as things are done right now, we need the DRI minor
allocated before we hit the load function.

> thanks for sending this out, it keeps falling off my radar, I don't
> think I've ever seen this reported on RHEL/Fedora, which makes me
> wonder what we are doing that makes us lucky.

We never hit it until we started doing things earlier and quicker. I first
found it during the prettification of boot, where we were keen to get
plymouth running as soon as possible. That led to random panics and to me
finding this bug. The window is tiny as far as I know, and it tends to be
specific machines and specific package combinations which trigger it
reliably.

I suspect that a proper fix would delay the registration as you
suggest, but in the interim a wait would at least avoid the issues we are
seeing. I will see how awful it looks.

-apw
* Re: [PATCH 0/1] [RFC] DRM locking issues during early open
From: Dave Airlie @ 2012-04-19 16:47 UTC
To: Andy Whitcroft
Cc: David Airlie, dri-devel, Jesse Barnes, Bryce Harrington, linux-kernel

On Thu, Apr 19, 2012 at 5:41 PM, Andy Whitcroft <apw@canonical.com> wrote:
> On Thu, Apr 19, 2012 at 05:30:03PM +0100, Dave Airlie wrote:
> [...]
> I suspect that a proper fix would allow delaying the registration as you
> suggest but in the interim a wait would at least avoid the issues we are
> seeing. I will see how awful it looks.

Just to confirm: it's drm_sysfs_device_add() that causes the race we care
about. It needs to happen after the driver is happy, since it calls
device_register(), and that is what triggers the udev magic that starts
userspace.

If you have a userspace app banging on a static device node, that might
need another set of fun fixes.

Dave.
* Re: [PATCH 0/1] [RFC] DRM locking issues during early open
From: Dave Airlie @ 2012-04-19 16:52 UTC
To: Andy Whitcroft
Cc: David Airlie, dri-devel, Jesse Barnes, Bryce Harrington, linux-kernel

On Thu, Apr 19, 2012 at 5:47 PM, Dave Airlie <airlied@gmail.com> wrote:
> [...]
> Just to confirm its the drm_sysfs_device_add that causes the race we care about.
>
> it needs to happen after the driver is happy. Since it calls
> device_register and that is what triggers udev magic to load the
> userspace.
>
> If you have a userspace app banging on a static device node that might
> need another set of fun fixes.

Okay, the sysfs add and the idr_replace are the things we need to delay.

Dave.
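The ordering described above can be illustrated with a deterministic, single-threaded model. Everything here is hypothetical: `fake_sysfs_add()` stands in for drm_sysfs_device_add() and immediately performs the open that a udev-launched program such as plymouthd might do, the `published` flag stands in for the idr_replace(), and -EIO is an arbitrary marker for "used uninitialised state" (the real failure was an oops, not an error code).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one DRM device during registration. */
struct fake_dev {
	bool published;    /* stand-in for the minor being found by idr_find() */
	bool loaded;       /* stand-in for driver->load() having completed */
	int  udev_open_rc; /* what the udev-launched opener saw */
};

static int fake_open(const struct fake_dev *d)
{
	if (!d->published)
		return -ENODEV; /* lookup fails cleanly */
	if (!d->loaded)
		return -EIO;    /* marker for "used uninitialised state" */
	return 0;
}

/* Stand-in for drm_sysfs_device_add(): the uevent it triggers may make
 * userspace open the node straight away. */
static void fake_sysfs_add(struct fake_dev *d)
{
	d->udev_open_rc = fake_open(d);
}

static void fake_load(struct fake_dev *d)
{
	d->loaded = true;
}

/* Ordering before the fix: publish and announce, then load. */
static int register_old_order(void)
{
	struct fake_dev d = { false, false, 0 };

	d.published = true;
	fake_sysfs_add(&d);
	fake_load(&d);
	return d.udev_open_rc;
}

/* Ordering proposed in the thread: load, then publish, then announce. */
static int register_new_order(void)
{
	struct fake_dev d = { false, false, 0 };

	fake_load(&d);
	assert(d.loaded); /* only expose a fully loaded device */
	d.published = true;
	fake_sysfs_add(&d);
	return d.udev_open_rc;
}
```

With the old ordering the simulated udev opener sees a device that is visible but not loaded; moving both the publish and the announce after the load means any opener either fails the lookup cleanly or finds a fully loaded device.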
* Re: [PATCH 0/1] [RFC] DRM locking issues during early open
From: Jesse Barnes @ 2012-04-19 16:55 UTC
To: Dave Airlie
Cc: Andy Whitcroft, David Airlie, dri-devel, Bryce Harrington, linux-kernel

On Thu, 19 Apr 2012 17:52:39 +0100
Dave Airlie <airlied@gmail.com> wrote:

> On Thu, Apr 19, 2012 at 5:47 PM, Dave Airlie <airlied@gmail.com> wrote:
> [...]
> > If you have a userspace app banging on a static device node that might
> > need another set of fun fixes.
>
> Okay the sysfs add and the idr_replace are the things we need to delay.

Since you can still get at things with a static node, it seems like
locking is the real issue here? Is there no mutex we can take across
init to block any openers until we're done?

--
Jesse Barnes, Intel Open Source Technology Center
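Jesse's suggestion, holding one lock across init so openers block until it is released, can be modelled with a pthread mutex. This is a userspace sketch, not the DRM locking scheme: `init_lock`, `dev_ready`, `init_begin()`, `init_end()`, `open_blocking()` and `open_nonblocking()` are hypothetical names. A blocking opener sleeps until init finishes, while a non-blocking one (using pthread_mutex_trylock()) reports -EAGAIN, much like the interim patch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one mutex is held for the whole of device init. */
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static bool dev_ready;

static void init_begin(void)
{
	pthread_mutex_lock(&init_lock);
	dev_ready = false;
}

static void init_end(void)
{
	dev_ready = true;                 /* finish setting up state... */
	pthread_mutex_unlock(&init_lock); /* ...before letting openers in */
}

/* Blocking open: sleeps on the lock until init_end() releases it. */
static int open_blocking(void)
{
	int ret;

	pthread_mutex_lock(&init_lock);
	ret = dev_ready ? 0 : -ENODEV;
	pthread_mutex_unlock(&init_lock);
	return ret;
}

/* Non-blocking open (think O_NONBLOCK): report -EAGAIN instead of sleeping. */
static int open_nonblocking(void)
{
	int ret;

	if (pthread_mutex_trylock(&init_lock) != 0)
		return -EAGAIN;
	ret = dev_ready ? 0 : -ENODEV;
	pthread_mutex_unlock(&init_lock);
	return ret;
}

static void *blocking_opener(void *arg)
{
	(void)arg;
	return (void *)(intptr_t)open_blocking();
}

/* An open attempted while init still holds the lock. */
static int demo_nonblocking_during_init(void)
{
	int ret;

	init_begin();
	ret = open_nonblocking();
	init_end();
	return ret;
}

/* A blocking opener started mid-init only proceeds once init is complete. */
static int demo_blocking_across_init(void)
{
	pthread_t t;
	void *ret;
	int rc;

	init_begin();
	rc = pthread_create(&t, NULL, blocking_opener, NULL);
	assert(rc == 0);
	/* ... the driver load work would run here ... */
	init_end();
	rc = pthread_join(t, &ret);
	assert(rc == 0);
	(void)rc;
	return (int)(intptr_t)ret;
}
```

The blocking opener can only acquire the lock after `dev_ready` has been set, so it always sees a ready device. The cost of this design is that every opener serialises behind the whole load, which may take a while.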
* Re: [PATCH 0/1] [RFC] DRM locking issues during early open
From: Dave Airlie @ 2012-04-19 16:56 UTC
To: Jesse Barnes
Cc: Andy Whitcroft, David Airlie, dri-devel, Bryce Harrington, linux-kernel

On Thu, Apr 19, 2012 at 5:55 PM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> On Thu, 19 Apr 2012 17:52:39 +0100
> Dave Airlie <airlied@gmail.com> wrote:
>
>> [...]
>> Okay the sysfs add and the idr_replace are the things we need to delay.
>
> Since you can still get at things with a static node, it seems like
> locking is the real issue here? Is there no mutex we can take across
> init to block any openers until we're done?

Well, the idr_replace should be the thing that matters: before it,
openers get -ENODEV; after it, they succeed. We may need a lock around
that once we fix the logic.

Dave.
* Re: [PATCH 0/1] [RFC] DRM locking issues during early open
From: Dave Airlie @ 2012-04-19 17:00 UTC
To: Jesse Barnes
Cc: Andy Whitcroft, David Airlie, dri-devel, Bryce Harrington, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3830 bytes --]

>> [...]
>>> Okay the sysfs add and the idr_replace are the things we need to delay.
>>
>> Since you can still get at things with a static node, it seems like
>> locking is the real issue here? Is there no mutex we can take across
>> init to block any openers until we're done?
>
> well the idr replace should be the thing that matters, since before
> that openers get -ENODEV, after it they end up success.
> we may need a lock around that once we fix the logic.

Here's my predinner hack, contains random rtl change as well, plz ignore.

now for dinner.

Dave.

[-- Attachment #2: myhack --]
[-- Type: application/octet-stream, Size: 2821 bytes --]

diff --git a/drivers/gpu/drm/drm_pci.c b/drivers/gpu/drm/drm_pci.c
index 13f3d93..23b472b 100644
--- a/drivers/gpu/drm/drm_pci.c
+++ b/drivers/gpu/drm/drm_pci.c
@@ -367,6 +367,11 @@ int drm_get_pci_dev(struct pci_dev *pdev, const struct pci_device_id *ent,
 
 	list_add_tail(&dev->driver_item, &driver->device_list);
 
+	if (drm_core_check_feature(dev, DRIVER_MODESET)) {
+		drm_activate_minor(&dev->control);
+	}
+	drm_activate_minor(&dev->primary);
+
 	DRM_INFO("Initialized %s %d.%d.%d %s for %s on minor %d\n",
 		 driver->name, driver->major, driver->minor, driver->patchlevel,
 		 driver->date, pci_name(pdev), dev->primary->index);
diff --git a/drivers/gpu/drm/drm_stub.c b/drivers/gpu/drm/drm_stub.c
index aa454f8..703c05a 100644
--- a/drivers/gpu/drm/drm_stub.c
+++ b/drivers/gpu/drm/drm_stub.c
@@ -357,7 +357,7 @@ int drm_get_minor(struct drm_device *dev, struct drm_minor **minor, int type)
 	new_minor->index = minor_id;
 	INIT_LIST_HEAD(&new_minor->master_list);
 
-	idr_replace(&drm_minors_idr, new_minor, minor_id);
+	//idr_replace(&drm_minors_idr, new_minor, minor_id);
 
 	if (type == DRM_MINOR_LEGACY) {
 		ret = drm_proc_init(new_minor, minor_id, drm_proc_root);
@@ -375,13 +375,14 @@ int drm_get_minor(struct drm_device *dev, struct drm_minor **minor, int type)
 		goto err_g2;
 	}
 #endif
-
+#if 0
 	ret = drm_sysfs_device_add(new_minor);
 	if (ret) {
 		printk(KERN_ERR "DRM: Error sysfs_device_add.\n");
 		goto err_g2;
 	}
+#endif
 
 	*minor = new_minor;
 
 	DRM_DEBUG("new minor assigned %d\n", minor_id);
@@ -400,6 +401,18 @@ err_idr:
 }
 EXPORT_SYMBOL(drm_get_minor);
 
+int drm_activate_minor(struct drm_minor *minor)
+{
+	int ret;
+	idr_replace(&drm_minors_idr, minor, minor->index);
+	ret = drm_sysfs_device_add(minor);
+	if (ret) {
+		printk(KERN_ERR "DRM: Error sysfs_device_add.\n");
+	}
+	return ret;
+}
+EXPORT_SYMBOL(drm_activate_minor);
+
 /**
  * Put a secondary minor number.
  *
diff --git a/drivers/net/wireless/rtlwifi/pci.c b/drivers/net/wireless/rtlwifi/pci.c
index 288b035..cc15fdb 100644
--- a/drivers/net/wireless/rtlwifi/pci.c
+++ b/drivers/net/wireless/rtlwifi/pci.c
@@ -1941,6 +1941,7 @@ void rtl_pci_disconnect(struct pci_dev *pdev)
 		rtl_deinit_deferred_work(hw);
 		rtlpriv->intf_ops->adapter_stop(hw);
 	}
+	rtlpriv->cfg->ops->disable_interrupt(hw);
 
 	/*deinit rfkill */
 	rtl_deinit_rfkill(hw);
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index dd73104..3a5606f 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -1506,6 +1506,7 @@ extern void drm_master_put(struct drm_master **master);
 
 extern void drm_put_dev(struct drm_device *dev);
 extern int drm_put_minor(struct drm_minor **minor);
+extern int drm_activate_minor(struct drm_minor *minor);
 extern void drm_unplug_dev(struct drm_device *dev);
 extern unsigned int drm_debug;
 
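The ordering change in the hack above (allocate the minor number early, but only publish it to openers once load has finished) can be modelled in plain userspace C. This is a minimal sketch, assuming a fixed array stands in for `drm_minors_idr`; `fake_minor`, `reserve_minor()`, `activate_minor()` and `open_minor()` are hypothetical names, not DRM API. The sketch is single-threaded, so it deliberately ignores the lock around the lookup that the thread says may still be needed. Note also that, as posted, the hack passes `&dev->primary` (a `struct drm_minor **`, given that `dev->primary->index` is used just below it) to `drm_activate_minor()`, which takes a `struct drm_minor *`; the model passes the minor itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_MINORS 4

/* Hypothetical minor: just an index and a "load finished" flag. */
struct fake_minor {
	int index;
	int loaded;
};

/* Stand-in for drm_minors_idr: reserved[] tracks allocated numbers,
 * table[] holds the pointer an opener can actually find. */
static int reserved[MAX_MINORS];
static struct fake_minor *table[MAX_MINORS];

/* Step 1 (early): allocate a number but leave the slot empty. */
static int reserve_minor(void)
{
	for (int i = 0; i < MAX_MINORS; i++) {
		if (!reserved[i]) {
			reserved[i] = 1;
			table[i] = NULL;
			return i;
		}
	}
	return -ENOSPC;
}

/* Step 2 (after load): publish the minor, the role idr_replace()
 * plays inside drm_activate_minor() in the hack. */
static void activate_minor(struct fake_minor *m)
{
	assert(m->loaded);             /* never publish a half-initialised minor */
	table[m->index] = m;
}

/* Open path: an empty slot means -ENODEV rather than a half-built device. */
static int open_minor(int index, struct fake_minor **out)
{
	if (index < 0 || index >= MAX_MINORS || table[index] == NULL)
		return -ENODEV;
	*out = table[index];
	return 0;
}
```

In the hack, the sysfs/uevent registration (`drm_sysfs_device_add()`) moves into the same activate step, since that is what lets udev start openers in the first place.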
* Re: [PATCH 0/1] [RFC] DRM locking issues during early open
  2012-04-19 16:30 ` [PATCH 0/1] [RFC] DRM locking issues during early open Dave Airlie
  2012-04-19 16:41 ` Andy Whitcroft
@ 2012-04-19 16:41 ` Daniel Vetter
  1 sibling, 0 replies; 16+ messages in thread
From: Daniel Vetter @ 2012-04-19 16:41 UTC
To: Dave Airlie; +Cc: Andy Whitcroft, dri-devel, linux-kernel

On Thu, Apr 19, 2012 at 05:30:03PM +0100, Dave Airlie wrote:
> On Thu, Apr 19, 2012 at 5:22 PM, Andy Whitcroft <apw@canonical.com> wrote:
> > We have been carrying a (rather poor) patch for an issue we identified in
> > the DRM driver. This issue is triggered when a DRM device is initialising
> > and userspace attempts to open it, typically in response to the sysfs
> > device added event. Basically we allocate the minor numbers making
> > the device available, and then call the drm load callback. Until this
> > completes the device is really not ready and these early opens typically
> > lead to oopses.
> >
> > We have been using the following patch to avoid this by marking the minors
> > as in error until the load method has completed. This avoids the early
> > open by simply erroring out the opens with EAGAIN. Obviously we should
> > be delaying the open until the load method completes.
> >
> > I include the existing patch for completeness (it is not really ready for
> > merging) to illustrate the issue. I think it is logical that the wait
> > should simply be delayed until the load has completed. I am proposing
> > to include a wait queue associated with the idr cache for the drm minors
> > which we can use to allow open callers to wait_event_interruptible() on.
> > I'll be putting together a prototype shortly and will follow up with it.
> >
> > Thoughts?
>
> Couldn't we just delay registering things until the driver is ready to
> accept an open?

It's somewhere on my eternal&epic todo list.

> Granted the midlayer of drm doesn't make that easy,

... after fixing this one ;-)

> thanks for sending this out, it keeps falling off my radar, I don't
> think I've ever seen this reported on RHEL/Fedora, which makes me
> wonder what we are doing that makes us lucky.

I think it's just a matter of races: if you load the drm module early
enough (like Fedora already does in the initrd) and ensure that nothing
pokes drm devices for a few seconds, you'll be fine. IIRC Ubuntu's powerd
stuff is really good at bringing everything down.

Also, not loading the module with udev but loading it with X resulted in
nice fireworks the last time I tried that (radeon ums was trying to set up
the card while the kms code was doing the same, hilarity ensued).
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
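Andy's proposal in the quoted cover letter (let openers `wait_event_interruptible()` on a wait queue tied to the drm minors until load completes, instead of failing with EAGAIN) can be approximated in userspace with a condition variable. This is a sketch under stated assumptions: `minors_lock`, `minors_wq`, `minor_loaded`, `waiting_open()`, `finish_load()` and `open_during_load()` are hypothetical names, and `pthread_cond_wait()` is not interruptible, whereas the kernel's `wait_event_interruptible()` returns -ERESTARTSYS when a signal arrives.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one lock and one wait queue shared by the minors. */
static pthread_mutex_t minors_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t minors_wq = PTHREAD_COND_INITIALIZER;
static bool minor_loaded;

/* Open path: instead of failing with EAGAIN, sleep until load completes. */
static void *waiting_open(void *arg)
{
	bool *saw_loaded = arg;

	pthread_mutex_lock(&minors_lock);
	while (!minor_loaded)          /* re-check after every (possibly spurious) wakeup */
		pthread_cond_wait(&minors_wq, &minors_lock);
	*saw_loaded = minor_loaded;
	pthread_mutex_unlock(&minors_lock);
	return NULL;
}

/* Load path: mark the minor ready and wake every sleeping opener. */
static void finish_load(void)
{
	pthread_mutex_lock(&minors_lock);
	minor_loaded = true;
	pthread_cond_broadcast(&minors_wq);
	pthread_mutex_unlock(&minors_lock);
}

/* Start an opener while load is still in progress (it may or may not
 * get to sleep first) and report what it observed once it got in. */
static bool open_during_load(void)
{
	pthread_t t;
	bool saw_loaded = false;
	int rc;

	rc = pthread_create(&t, NULL, waiting_open, &saw_loaded);
	assert(rc == 0);
	finish_load();
	rc = pthread_join(t, NULL);
	assert(rc == 0);
	return saw_loaded;
}
```

Whichever thread runs first, the opener only returns after `minor_loaded` is set; the `while` loop mirrors the way `wait_event_interruptible()` re-tests its condition after each wakeup.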
* Re: [PATCH 0/1] [RFC] DRM locking issues during early open
  2012-04-19 16:22 [PATCH 0/1] [RFC] DRM locking issues during early open Andy Whitcroft
  2012-04-19 16:22 ` [PATCH 1/1] drm -- stop early access to drm devices Andy Whitcroft
  2012-04-19 16:30 ` [PATCH 0/1] [RFC] DRM locking issues during early open Dave Airlie
@ 2012-04-20 9:40 ` Dave Airlie
  2012-04-20 10:31 ` Andy Whitcroft
  2 siblings, 1 reply; 16+ messages in thread
From: Dave Airlie @ 2012-04-20 9:40 UTC
To: Andy Whitcroft
Cc: David Airlie, dri-devel, Jesse Barnes, Bryce Harrington, linux-kernel

On Thu, Apr 19, 2012 at 5:22 PM, Andy Whitcroft <apw@canonical.com> wrote:
> We have been carrying a (rather poor) patch for an issue we identified in
> the DRM driver. This issue is triggered when a DRM device is initialising
> and userspace attempts to open it, typically in response to the sysfs
> device added event. Basically we allocate the minor numbers making
> the device available, and then call the drm load callback. Until this
> completes the device is really not ready and these early opens typically
> lead to oopses.
>
> We have been using the following patch to avoid this by marking the minors
> as in error until the load method has completed. This avoids the early
> open by simply erroring out the opens with EAGAIN. Obviously we should
> be delaying the open until the load method completes.
>
> I include the existing patch for completeness (it is not really ready for
> merging) to illustrate the issue. I think it is logical that the wait
> should simply be delayed until the load has completed. I am proposing
> to include a wait queue associated with the idr cache for the drm minors
> which we can use to allow open callers to wait_event_interruptible() on.
> I'll be putting together a prototype shortly and will follow up with it.
>
> Thoughts?

I've just revisited this, maybe I'm going insane but why does
drm_global_mutex not stop this?

drm_get_pci_dev takes drm_global_mutex before calling drm_fill_in_dev
and drm_get_minor.

Now the fops should be pointing at stub_open at this point, as we
won't have switched to the per device fops yet,
and one of the first things drm_stub_open does is take the
drm_global_mutex before doing the idr lookup.

So is the problem opening some sysfs or proc file early?

Dave.

>
> -apw
>
> Andy Whitcroft (1):
>   drm -- stop early access to drm devices
>
>  drivers/gpu/drm/drm_fops.c     |    8 ++++++--
>  drivers/gpu/drm/drm_pci.c      |    4 ++++
>  drivers/gpu/drm/drm_platform.c |    4 ++++
>  drivers/gpu/drm/drm_stub.c     |    2 +-
>  4 files changed, 15 insertions(+), 3 deletions(-)
>
> --
> 1.7.9.5
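Dave's argument is that if `drm_get_pci_dev` holds `drm_global_mutex` across `drm_fill_in_dev` and `drm_get_minor`, and `drm_stub_open` takes the same mutex before its idr lookup, an opener can never observe a half-initialised device: it either runs before registration and gets -ENODEV, or it blocks until loading is done. A minimal pthreads sketch of that reasoning follows; `global_mutex`, `minor_registered`, `device_ready` and the three functions are hypothetical stand-ins, not DRM code, and using -EIO to flag a "half-built" device is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for drm_global_mutex and one device's state. */
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
static int minor_registered;   /* the minor can be found by an opener */
static int device_ready;       /* the driver's load step has finished */

/* Loader body; the caller must already hold global_mutex. */
static void load_device_locked(void)
{
	minor_registered = 1;      /* findable from here on ...            */
	device_ready = 1;          /* ... but no opener can look in between */
}

/* Opener, modelled on the stub open path: lock first, then look up. */
static void *open_thread(void *arg)
{
	int *result = arg;

	pthread_mutex_lock(&global_mutex);
	if (!minor_registered)
		*result = -ENODEV;                  /* too early: no device yet */
	else
		*result = device_ready ? 0 : -EIO;  /* -EIO would mean half-built */
	pthread_mutex_unlock(&global_mutex);
	return NULL;
}

/* The loader takes the lock first, then an opener races in during load. */
static int race_open_against_load(void)
{
	pthread_t t;
	int result = 1;
	int rc;

	pthread_mutex_lock(&global_mutex);
	rc = pthread_create(&t, NULL, open_thread, &result);
	assert(rc == 0);
	load_device_locked();          /* the opener cannot get past global_mutex yet */
	pthread_mutex_unlock(&global_mutex);
	rc = pthread_join(t, NULL);
	assert(rc == 0);
	return result;
}
```

The guarantee only holds if the loader keeps the mutex for the whole load and every open path takes it before the lookup; a lock that is released whenever its holder sleeps, as the BKL was, would not provide it.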
* Re: [PATCH 0/1] [RFC] DRM locking issues during early open
  2012-04-20 9:40 ` Dave Airlie
@ 2012-04-20 10:31 ` Andy Whitcroft
  2012-04-20 10:34 ` Dave Airlie
  0 siblings, 1 reply; 16+ messages in thread
From: Andy Whitcroft @ 2012-04-20 10:31 UTC
To: Dave Airlie
Cc: David Airlie, dri-devel, Jesse Barnes, Bryce Harrington, linux-kernel

On Fri, Apr 20, 2012 at 10:40:35AM +0100, Dave Airlie wrote:

> I've just revisited this, maybe I'm going insane but why does
> drm_global_mutex not stop this?
>
> drm_get_pci_dev takes drm_global_mutex before calling drm_fill_in_dev
> and drm_get_minor.
>
> Now the fops should be pointing at stub_open at this point, as we
> won't have switched to the per device fops yet,
> and one of the first things drm_stub_open does is take the
> drm_global_mutex before doing the idr lookup.
>
> So is the problem opening some sysfs or proc file early?

I may be reading things wrong but the initialisation does indeed hold
drm_global_mutex, but back when this first occurred we would have
been using kernel_lock() which was at least partially reentrant, right?

Anyhow, I will go back to the reporter and try to get a proper
reproducer; there is no point in fixing something which turns out to
be something else.

-apw
* Re: [PATCH 0/1] [RFC] DRM locking issues during early open
  2012-04-20 10:31 ` Andy Whitcroft
@ 2012-04-20 10:34 ` Dave Airlie
  2012-04-20 17:25 ` Andy Whitcroft
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Airlie @ 2012-04-20 10:34 UTC
To: Andy Whitcroft
Cc: David Airlie, dri-devel, Jesse Barnes, Bryce Harrington, linux-kernel

>
> I may be reading things wrong but the initialisation does indeed hold
> drm_global_mutex, but back when this first occurred we would have
> been using kernel_lock() which was at least partially reentrant, right?

Yup if we slept with the BKL held we'd have allowed others to get past it,
but also I introduced the global mutex in pci a while back

commit b64c115eb22516ecd187c74ad6de3f1693f1dc7b
Author: Dave Airlie <airlied@redhat.com>
Date:   Tue Sep 14 20:14:38 2010 +1000

    drm: fix race between driver loading and userspace open.

    Not 100% sure this is due to BKL removal, its most likely a combination
    of that + userspace timing changes in udev/plymouth.

    The drm adds the sysfs device before the driver has completed internal
    loading, this causes udev to make the node and plymouth to open it before
    we've completed loading.

    The proper solution is to delay the sysfs manipulation until later in
    loading however this causes knock on issues with sysfs connector nodes,
    so we can use the global mutex to serialise loading and userspace opens.

    Reported-by: Toni Spets (hifi on #radeon)
    Signed-off-by: Dave Airlie <airlied@redhat.com>

by a while I mean nearly 1.5 yrs ago, with the intent of fixing it this way.

Dave.
* Re: [PATCH 0/1] [RFC] DRM locking issues during early open
  2012-04-20 10:34 ` Dave Airlie
@ 2012-04-20 17:25 ` Andy Whitcroft
  0 siblings, 0 replies; 16+ messages in thread
From: Andy Whitcroft @ 2012-04-20 17:25 UTC
To: Dave Airlie
Cc: David Airlie, dri-devel, Jesse Barnes, Bryce Harrington, linux-kernel

On Fri, Apr 20, 2012 at 11:34:43AM +0100, Dave Airlie wrote:
> >
> > I may be reading things wrong but the initialisation does indeed hold
> > drm_global_mutex, but back when this first occurred we would have
> > been using kernel_lock() which was at least partially reentrant, right?
>
> Yup if we slept with the BKL held we'd have allowed others to get past it,
> but also I introduced the global mutex in pci a while back

Yeah, I have managed to get access to more details on the bug, and we
are actually opening the drm device successfully; we then attempt a
DRM_SETVERSION ioctl on it, and it is that which appears to fail, for
both 1.4 and 1.1.

It is somewhat perplexing how that is possible. I will note that the
stub f_ops do not contain an ioctl op, but I cannot see any mechanism
by which we might return a valid open file without putting the
driver-specific ops in it.

-apw
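The dispatch Andy is puzzling over can be sketched as a table swap: the device node starts out with stub operations that only implement open, and a successful stub open replaces them with the driver's table, so an ioctl can only land on the stub table if an open succeeded without that swap. This is a hypothetical model, not the DRM code: `fake_file`, `fake_fops`, `stub_open()`, `driver_ioctl()`, `do_ioctl()` and `driver_registered` are illustrative names, treating command 1 as a SETVERSION-like request is made up, and returning -ENOTTY for a missing ioctl hook is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct fake_file;

/* Simplified stand-in for a file_operations table. */
struct fake_fops {
	int  (*open)(struct fake_file *f);
	long (*ioctl)(struct fake_file *f, unsigned int cmd);
};

struct fake_file {
	const struct fake_fops *f_op;
};

/* Driver ioctl: pretend command 1 is a SETVERSION-like request. */
static long driver_ioctl(struct fake_file *f, unsigned int cmd)
{
	(void)f;
	return cmd == 1 ? 0 : -EINVAL;
}

/* Driver table: no open hook of its own, but it does handle ioctls. */
static const struct fake_fops driver_fops = { .open = NULL, .ioctl = driver_ioctl };

/* Stands in for the idr lookup in the stub open path succeeding. */
static int driver_registered;

/* Stub open: look the device up, then switch the file to the driver's table. */
static int stub_open(struct fake_file *f)
{
	const struct fake_fops *old = f->f_op;

	if (!driver_registered)
		return -ENODEV;
	f->f_op = &driver_fops;
	if (f->f_op->open) {
		int err = f->f_op->open(f);
		if (err) {
			f->f_op = old;     /* failed open: put the stub table back */
			return err;
		}
	}
	return 0;
}

/* Stub table: open only, no ioctl hook. */
static const struct fake_fops stub_fops = { .open = stub_open, .ioctl = NULL };

/* Dispatch an ioctl; a table without an ioctl hook yields -ENOTTY here. */
static long do_ioctl(struct fake_file *f, unsigned int cmd)
{
	if (f->f_op->ioctl == NULL)
		return -ENOTTY;
	return f->f_op->ioctl(f, cmd);
}
```

In the model, a file whose open failed keeps the stub table and would reject the ioctl, but such a file would never have been handed back to userspace, which is exactly why the reported DRM_SETVERSION failure on a successfully opened device is puzzling.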
Thread overview: 16+ messages
2012-04-19 16:22 [PATCH 0/1] [RFC] DRM locking issues during early open Andy Whitcroft
2012-04-19 16:22 ` [PATCH 1/1] drm -- stop early access to drm devices Andy Whitcroft
2012-04-19 16:30 ` [PATCH 0/1] [RFC] DRM locking issues during early open Dave Airlie
2012-04-19 16:41 ` Andy Whitcroft
2012-04-19 16:47 ` Dave Airlie
2012-04-19 16:47 ` Dave Airlie
2012-04-19 16:52 ` Dave Airlie
2012-04-19 16:55 ` Jesse Barnes
2012-04-19 16:56 ` Dave Airlie
2012-04-19 17:00 ` Dave Airlie
2012-04-19 16:41 ` Daniel Vetter
2012-04-20 9:40 ` Dave Airlie
2012-04-20 10:31 ` Andy Whitcroft
2012-04-20 10:34 ` Dave Airlie
2012-04-20 17:25 ` Andy Whitcroft
2012-04-20 17:25 ` Andy Whitcroft