All of lore.kernel.org
 help / color / mirror / Atom feed
From: linux@arm.linux.org.uk (Russell King - ARM Linux)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH 0/8] component helper improvements
Date: Wed, 2 Jul 2014 12:12:06 +0100	[thread overview]
Message-ID: <20140702111206.GA21252@n2100.arm.linux.org.uk> (raw)
In-Reply-To: <20140514184212.GA14232@ulmo>

On Wed, May 14, 2014 at 08:42:17PM +0200, Thierry Reding wrote:
> I've been looking at converting the Tegra DRM driver to the component
> helpers for a while now and had to make some changes to make it work for
> that particular use-case. While updating the imx-drm and msm DRM drivers
> for those changes I noticed an oddity. Both of the existing drivers use
> the following pattern:
> 
> 	static int driver_component_bind(struct device *dev,
> 					 struct device *master,
> 					 void *data)
> 	{
> 		allocate memory
> 		request resources
> 		...
> 		hook up to subsystem
> 		...
> 		enable hardware
> 	}
> 
> 	static const struct component_ops driver_component_ops = {
> 		.bind = driver_component_bind,
> 	};
> 
> 	static int driver_probe(struct platform_device *pdev)
> 	{
> 		return component_add(&pdev->dev, &driver_component_ops);
> 	}
> 
> While converting Tegra DRM, what I intuitively did (I didn't actually
> look at the other drivers for inspiration) was something more along the
> lines of the following:
> 
> 	static int driver_component_bind(struct device *dev,
> 					 struct device *master,
> 					 void *data)
> 	{
> 		hook up to subsystem
> 		...
> 		enable hardware
> 	}
> 
> 	static const struct component_ops driver_component_ops = {
> 		.bind = driver_component_bind,
> 	};
> 
> 	static int driver_probe(struct platform_device *pdev)
> 	{
> 		allocate memory
> 		request resources
> 		...
> 		return component_add(&pdev->dev, &driver_component_ops);
> 	}
> 
> Since usually deferred probing is caused by resource allocations failing
> this has the side-effect of handling deferred probing before the master
> device is even bound (the component_add() happens as the very last step)
> and therefore there is less risk for component_bind_all() to fail. I've
> actually never seen it fail at all. Failure at that point is almost
> certainly irrecoverable anyway.

It isn't irrecoverable - that case is handled.

I really don't like two-stage driver initialisation - it increases the
chances of bugs creeping in.  Take for example this code:

probe()
{
	priv = devm_kzalloc(dev, whatever);
	priv->mem = devm_ioremap_resource(dev, res);
	dev_set_drvdata(dev, priv);
	return component_add(dev, &ops);
}

So far so good, not much can go wrong at that point - we know exactly
what state the 'priv' structure is at the point where the component_add
call is made.

Now, when the ops' bind method is called, we retrieve the private data.
At this point, we can no longer rely on the initialisation state of
many of the members.  We can't assume that they were zero when we're
called, because we can have this sequence of events:

- driver is probed
- component is bound
- component is unbound
- component is bound

At this point, the private data will be dirty.  This actually makes the
use of devm_kzalloc() a joke in the probe function - although it does
initialise all members to zero, we can't rely on that at all when the
component is bound.

While the driver itself may be coded for this to be safe, can we say the
same for any structures which are embedded into the private data, which
may be private to other subsystems?

By way of illustration, ASoC can also have this two stage approach.  I'll
draw your attention to SGTL5000, and the recent patch I submitted (which
I don't think will be taken.)  This driver suffers badly if the ASoC
"card" is bound, then unbound, and an attempt to rebind it again.  That's
because the driver gets some managed resources in both the first stage and
the second stage, and expects them to be automatically released in when
the second stage is torn down.  This bug has existed for a very long
time, and has gone unnoticed (it will be unnoticed until you try to debug
by removing modules and trying to load replacements, which is how I found
it.)

That exact bug can't happen with the component helpers, because I
explicitly thought about the handling of managed resources, and added
the necessary support to deal with these correctly.  However, it serves
as an example that, despite comments from people saying that my fear
is unlikely to happen, we already have code which suffers from issues
with two-stage initialisation.

The unfortunate thing is that validation testing for Linux tends not to
venture much past "does it boot", "are my devices present" and "can I run
some programs".  It doesn't cover system shutdown/reboot very often
(we've had bugs which have been present for ages there - my test farm
explicitly does a power off after boot testing now) and it hardly ever
covers drivers being unbound or module removal.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

WARNING: multiple messages have this Message-ID (diff)
From: Russell King - ARM Linux <linux@arm.linux.org.uk>
To: Thierry Reding <thierry.reding@gmail.com>
Cc: devel@driverdev.osuosl.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	dri-devel@lists.freedesktop.org,
	Laurent Pinchart <laurent.pinchart@ideasonboard.com>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC PATCH 0/8] component helper improvements
Date: Wed, 2 Jul 2014 12:12:06 +0100	[thread overview]
Message-ID: <20140702111206.GA21252@n2100.arm.linux.org.uk> (raw)
In-Reply-To: <20140514184212.GA14232@ulmo>

On Wed, May 14, 2014 at 08:42:17PM +0200, Thierry Reding wrote:
> I've been looking at converting the Tegra DRM driver to the component
> helpers for a while now and had to make some changes to make it work for
> that particular use-case. While updating the imx-drm and msm DRM drivers
> for those changes I noticed an oddity. Both of the existing drivers use
> the following pattern:
> 
> 	static int driver_component_bind(struct device *dev,
> 					 struct device *master,
> 					 void *data)
> 	{
> 		allocate memory
> 		request resources
> 		...
> 		hook up to subsystem
> 		...
> 		enable hardware
> 	}
> 
> 	static const struct component_ops driver_component_ops = {
> 		.bind = driver_component_bind,
> 	};
> 
> 	static int driver_probe(struct platform_device *pdev)
> 	{
> 		return component_add(&pdev->dev, &driver_component_ops);
> 	}
> 
> While converting Tegra DRM, what I intuitively did (I didn't actually
> look at the other drivers for inspiration) was something more along the
> lines of the following:
> 
> 	static int driver_component_bind(struct device *dev,
> 					 struct device *master,
> 					 void *data)
> 	{
> 		hook up to subsystem
> 		...
> 		enable hardware
> 	}
> 
> 	static const struct component_ops driver_component_ops = {
> 		.bind = driver_component_bind,
> 	};
> 
> 	static int driver_probe(struct platform_device *pdev)
> 	{
> 		allocate memory
> 		request resources
> 		...
> 		return component_add(&pdev->dev, &driver_component_ops);
> 	}
> 
> Since usually deferred probing is caused by resource allocations failing
> this has the side-effect of handling deferred probing before the master
> device is even bound (the component_add() happens as the very last step)
> and therefore there is less risk for component_bind_all() to fail. I've
> actually never seen it fail at all. Failure at that point is almost
> certainly irrecoverable anyway.

It isn't irrecoverable - that case is handled.

I really don't like two-stage driver initialisation - it increases the
chances of bugs creeping in.  Take for example this code:

probe()
{
	priv = devm_kzalloc(dev, whatever);
	priv->mem = devm_ioremap_resource(dev, res);
	dev_set_drvdata(dev, priv);
	return component_add(dev, &ops);
}

So far so good, not much can go wrong at that point - we know exactly
what state the 'priv' structure is at the point where the component_add
call is made.

Now, when the ops' bind method is called, we retrieve the private data.
At this point, we can no longer rely on the initialisation state of
many of the members.  We can't assume that they were zero when we're
called, because we can have this sequence of events:

- driver is probed
- component is bound
- component is unbound
- component is bound

At this point, the private data will be dirty.  This actually makes the
use of devm_kzalloc() a joke in the probe function - although it does
initialise all members to zero, we can't rely on that at all when the
component is bound.

While the driver itself may be coded for this to be safe, can we say the
same for any structures which are embedded into the private data, which
may be private to other subsystems?

By way of illustration, ASoC can also have this two stage approach.  I'll
draw your attention to SGTL5000, and the recent patch I submitted (which
I don't think will be taken.)  This driver suffers badly if the ASoC
"card" is bound, then unbound, and an attempt to rebind it again.  That's
because the driver gets some managed resources in both the first stage and
the second stage, and expects them to be automatically released in when
the second stage is torn down.  This bug has existed for a very long
time, and has gone unnoticed (it will be unnoticed until you try to debug
by removing modules and trying to load replacements, which is how I found
it.)

That exact bug can't happen with the component helpers, because I
explicitly thought about the handling of managed resources, and added
the necessary support to deal with these correctly.  However, it serves
as an example that, despite comments from people saying that my fear
is unlikely to happen, we already have code which suffers from issues
with two-stage initialisation.

The unfortunate thing is that validation testing for Linux tends not to
venture much past "does it boot", "are my devices present" and "can I run
some programs".  It doesn't cover system shutdown/reboot very often
(we've had bugs which have been present for ages there - my test farm
explicitly does a power off after boot testing now) and it hardly ever
covers drivers being unbound or module removal.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

  reply	other threads:[~2014-07-02 11:12 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-26 23:00 [RFC PATCH 0/8] component helper improvements Russell King - ARM Linux
2014-04-26 23:00 ` Russell King - ARM Linux
2014-04-26 23:01 ` [PATCH RFC 1/8] component: fix missed cleanup in case of devres failure Russell King
2014-04-26 23:01 ` [PATCH RFC 2/8] component: ignore multiple additions of the same component Russell King
2014-04-26 23:01 ` [PATCH RFC 3/8] component: add support for component match array Russell King
2014-04-28  9:21   ` Thierry Reding
2014-06-24 19:08     ` Russell King - ARM Linux
2014-04-26 23:02 ` [PATCH RFC 4/8] drm: msm: update to use component match support Russell King
2014-04-26 23:02   ` Russell King
2014-04-27 15:49   ` Rob Clark
2014-04-27 15:49     ` Rob Clark
2014-04-26 23:02 ` [PATCH RFC 5/8] imx-drm: " Russell King
2014-04-26 23:02 ` [PATCH RFC 6/8] component: remove old add_components method Russell King
2014-04-28  7:07   ` Thierry Reding
2014-04-28 10:28     ` Russell King - ARM Linux
2014-04-28 10:52       ` Thierry Reding
2014-04-26 23:02 ` [PATCH RFC 7/8] component: move check for unbound master into try_to_bring_up_masters() Russell King
2014-04-28  7:10   ` Thierry Reding
2014-04-26 23:02 ` [PATCH RFC 8/8] component: track components via array rather than list Russell King
2014-04-27  9:43 ` [PATCH RFC 1/8] component: fix missed cleanup in case of devres failure Russell King
2014-04-27 12:51 ` [RFC PATCH 0/8] component helper improvements Daniel Vetter
2014-04-27 12:51   ` Daniel Vetter
2014-04-27 13:32   ` Russell King - ARM Linux
2014-04-27 13:32     ` Russell King - ARM Linux
2014-05-14 18:42 ` Thierry Reding
2014-05-14 18:42   ` Thierry Reding
2014-07-02 11:12   ` Russell King - ARM Linux [this message]
2014-07-02 11:12     ` Russell King - ARM Linux

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140702111206.GA21252@n2100.arm.linux.org.uk \
    --to=linux@arm.linux.org.uk \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.