linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 0/4] Change return values from queue_work et al.
@ 2006-08-28 20:37 Alan Stern
  0 siblings, 0 replies; 5+ messages in thread
From: Alan Stern @ 2006-08-28 20:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kernel development list

On 28 August 2006, Alan Stern wrote:

> Andrew and Ingo:
> 
> As discussed earlier, the following patches change the representation used 
> for the return values from queue_work(), schedule_work(), and related 
> routines.

Sorry, I messed up the Subject line.  This series contains only 3 patches, 
not 4.

Alan Stern


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/4] Change return values from queue_work et al.
  2006-08-29 14:40   ` Alan Stern
@ 2006-08-29 16:06     ` Stefan Richter
  0 siblings, 0 replies; 5+ messages in thread
From: Stefan Richter @ 2006-08-29 16:06 UTC (permalink / raw)
  To: Alan Stern
  Cc: Andrew Morton, Ingo Molnar, Kernel development list,
	SCSI development list, Jens Axboe

Alan Stern wrote:
...
> Note that the change falls within the bounds of the documented
> behavior, in the sense that any code which was originally written
> correctly (i.e., in accordance with the documentation) will continue to
> work correctly without generating any warnings.
...

You are right that there is no comment (or better yet, kerneldoc
comment) about what happens if an instance of work_struct is enqueued
twice. However, /a/ there is the source and /b/ Corbet, Rubini,
Kroah-Hartman: LDD3 describes in detail in an easily understood section
how workqueues are to be used. (Workqueues in Linux 2.6.10, that is.)

...
> If the
> usage is correct then there is no harm in leaving the WARN_ON call where
> it is.  If the usage is wrong then the call needs to be fixed, and the
> maintainer for the subsystem containing the call will soon find out about
> it, thanks to the WARN_ON.
...

Acceptable on second thought, particularly in light of your new
replacement functions with improved semantics of their return value.
Although there are cases where the WARN_ON might not go off during a
long time, or where an update won't happen in many months despite
hundreds of reports at dozens of mailing lists and bugzillas.
-- 
Stefan Richter
-=====-=-==- =--- ===-=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/4] Change return values from queue_work et al.
  2006-08-28 22:31 ` Stefan Richter
@ 2006-08-29 14:40   ` Alan Stern
  2006-08-29 16:06     ` Stefan Richter
  0 siblings, 1 reply; 5+ messages in thread
From: Alan Stern @ 2006-08-29 14:40 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Andrew Morton, Ingo Molnar, Kernel development list,
	SCSI development list, Jens Axboe

On Tue, 29 Aug 2006, Stefan Richter wrote:

> Alan Stern wrote:
> [...]
> > It turned out these functions were used in ~800 places, and in ~90% of
> > them the return value was ignored!  This is perhaps understandble, because
> > the only way these functions can fail is if their work_struct argument is
> > uninitialized or already in use.  (Whether it's robust for callers to
> > depend on this behavior remaining unchanged into the indefinite future is
> > more questionable.)
> 
> You are changing this behavior right now...

Indeed yes.  As stated in portions of the patch description that you have
omitted.  Note that the change falls within the bounds of the documented
behavior, in the sense that any code which was originally written
correctly (i.e., in accordance with the documentation) will continue to
work correctly without generating any warnings.

> > So I took a short cut which allowed most of the usages to remain as they
> > are.  queue_work(), schedule_work(), and their friends still exist and do
> > what they did before, but now they return void.  In addition, they call
> > WARN_ON if the submission fails; this seems safer than letting the failure
> > go silently unnoticed.  
> [...]
> 
> ...by adding a WARN_ON even though "work not enqueued because it is 
> already in queue" may not be a "failure" at all.

Again yes, as mentioned in the patch description.  The underlying problem 
is that the API does not provide any way to distinguish between failure 
because the request is already in a queue and other types of failure, 
such as forgetting to initialize the data structure.

In fact, to be _really_ robust about it would require walking through the
list of structures attached to the work queue, to verify that the
submitted request actually was attached to that queue already and not to
some other one (or to none).  Of course, any individual caller is probably
able to guarantee this, so it's not worthwhile to add such a check.

> It _is_ robust for callers to depend on the old behavior. This is 
> because /we/ who use these functions will remind /you/ who alters these 
> functions to first research what the actual usage is. So please check 
> every caller which ignores the return code for the actual intent of the 
> caller.

Thank you, but I do not intend to spend the next five years of my life
working on this issue.  The fact that the callers ignore the return code
clearly indicates that they do not care whether the functions fail or why
they fail.  This may be due to inside knowledge ("the only reason it would
fail is if the request is already queued" -- which is certainly not robust
because it is not documented and in the future there may be other reasons
for failure) or because of sheer laziness.  In the absence of comments
it's impossible or extremely difficult to tell.

However, if you can point to specific examples of places where callers 
rely on undocumented behavior, I will be happy to fix them as I did for 
the call in drivers/char/vt.c.  (I will admit, that particular change was 
not ideal.  It really should check for errors other than -EBUSY -- but at 
least now the functions are _documented_ as returning no errors other than 
-EBUSY.)

> Do not add WARN_ON to queue_work() etc.. Instead add WARN_ON or BUG_ON 
> or an actual failure handling to callers which _incorrectly_ expect they 
> could add the same instance of work_struct to queues more than once 
> before the work was executed.

No.  I cannot go through hundreds of usages, learning them and their
contexts in sufficient detail to understand the reasoning behind them.  
(If _you_ are capable of doing this... then my hat's off to you.)  If the
usage is correct then there is no harm in leaving the WARN_ON call where
it is.  If the usage is wrong then the call needs to be fixed, and the
maintainer for the subsystem containing the call will soon find out about
it, thanks to the WARN_ON.

By the way, you left out a couple of possible _incorrect_ usages.  
Callers may _incorrectly_ expect they can add an instance of work_struct
to a queue without having completely initialized it.  Or they may
_potentially_ _incorrectly_ expect that even though the structure is
properly initialized and not in use, the submission is certain to succeed.

> Furthermore, if you already change the type of widely-used exported 
> functions (i.e. you change the workqueue API), why don't you delete 
> these functions right away?

This question was answered in a section you quoted above ("So I took a 
short cut...").  Leaving the existing calls as they are is an opportunity 
for finding about errors that otherwise would be silently ignored.

Alan Stern


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/4] Change return values from queue_work et al.
  2006-08-28 20:26 Alan Stern
@ 2006-08-28 22:31 ` Stefan Richter
  2006-08-29 14:40   ` Alan Stern
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Richter @ 2006-08-28 22:31 UTC (permalink / raw)
  To: Alan Stern
  Cc: Andrew Morton, Ingo Molnar, Kernel development list,
	SCSI development list, Jens Axboe

Alan Stern wrote:
[...]
> It turned out these functions were used in ~800 places, and in ~90% of
> them the return value was ignored!  This is perhaps understandble, because
> the only way these functions can fail is if their work_struct argument is
> uninitialized or already in use.  (Whether it's robust for callers to
> depend on this behavior remaining unchanged into the indefinite future is
> more questionable.)

You are changing this behavior right now...

> So I took a short cut which allowed most of the usages to remain as they
> are.  queue_work(), schedule_work(), and their friends still exist and do
> what they did before, but now they return void.  In addition, they call
> WARN_ON if the submission fails; this seems safer than letting the failure
> go silently unnoticed.  
[...]

...by adding a WARN_ON even though "work not enqueued because it is 
already in queue" may not be a "failure" at all.

It _is_ robust for callers to depend on the old behavior. This is 
because /we/ who use these functions will remind /you/ who alters these 
functions to first research what the actual usage is. So please check 
every caller which ignores the return code for the actual intent of the 
caller.

Do not add WARN_ON to queue_work() etc.. Instead add WARN_ON or BUG_ON 
or an actual failure handling to callers which _incorrectly_ expect they 
could add the same instance of work_struct to queues more than once 
before the work was executed.

Furthermore, if you already change the type of widely-used exported 
functions (i.e. you change the workqueue API), why don't you delete 
these functions right away?
-- 
Stefan Richter
-=====-=-==- =--- ===-=
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 0/4] Change return values from queue_work et al.
@ 2006-08-28 20:26 Alan Stern
  2006-08-28 22:31 ` Stefan Richter
  0 siblings, 1 reply; 5+ messages in thread
From: Alan Stern @ 2006-08-28 20:26 UTC (permalink / raw)
  To: Andrew Morton, Ingo Molnar
  Cc: Kernel development list, SCSI development list, Jens Axboe

Andrew and Ingo:

As discussed earlier, the following patches change the representation used 
for the return values from queue_work(), schedule_work(), and related 
routines.

It turned out these functions were used in ~800 places, and in ~90% of
them the return value was ignored!  This is perhaps understandble, because
the only way these functions can fail is if their work_struct argument is
uninitialized or already in use.  (Whether it's robust for callers to
depend on this behavior remaining unchanged into the indefinite future is
more questionable.)

So I took a short cut which allowed most of the usages to remain as they
are.  queue_work(), schedule_work(), and their friends still exist and do
what they did before, but now they return void.  In addition, they call
WARN_ON if the submission fails; this seems safer than letting the failure
go silently unnoticed.  

However there is a risk to using WARN_ON: In at least one place
(drivers/char/vt.c) a call to schedule_work() is _expected_ to fail!  The
driver doesn't care whether or not the work_struct is already queued, so
long as the callback occurs eventually.  This amounts to relying on
undocumented behavior; I added an explanatory comment but it's not clear
what the right approach is.  The same thing may happen in other places.  
Perhaps the WARN_ON lines shouldn't be added.

Keeping the old routine names may complicate the changeover slightly, but 
it isn't dangerous.  The compiler will flag as an error any attempt to 
call the old routine names expecting a non-void return value.

In place of the original routines are add_work_to_q(), add_work(), and so
on.  They return int values, but now the values are 0 for success and
-EBUSY for failure instead of 1 for success and 0 for failure.  The
relatively small number of places that did check the return values have
been updated to use the new routines.  (Incidentally, rpc_make_runnable()  
in net/sunrpc/sched.c checked the return value in the wrong way -- which
is now the correct way.  So the patch _does_ qualify as a bug fix.)

There were four functions acting as wrappers around queue_work or 
queue_delayed_work:

	fc_queue_work() and fc_queue_devloss_work() in 
	drivers/scsi/scsi_transport_fc.c,

	scsi_queue_work() in drivers/scsi/hosts.c,

	and kblockd_schedule_work() in block/ll_rw_blk.c.

These functions passed the return value from queue_work back to their
callers, which might have made things awkward -- except that none of the
callers check the return value!  So I changed all these routines to
return void instead of int.

Finally, included is a patch which adds a new chapter to 
Documentation/CodingStyle.  It explains how to decide between returning a 
1/0 code vs. a 0/-Exxx code for success/failure, and it speaks for itself.

Alan Stern


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2006-08-29 16:10 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-08-28 20:37 [PATCH 0/4] Change return values from queue_work et al Alan Stern
  -- strict thread matches above, loose matches on Subject: below --
2006-08-28 20:26 Alan Stern
2006-08-28 22:31 ` Stefan Richter
2006-08-29 14:40   ` Alan Stern
2006-08-29 16:06     ` Stefan Richter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).