All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] firmware: dmi_scan: Drop dmi_initialized
@ 2017-09-18  8:05 Jean Delvare
  2017-09-23 10:50 ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Jean Delvare @ 2017-09-18  8:05 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar

I don't think it makes sense to check for a possible bad
initialization order at run time on every system when it is all
decided at build time.

A more efficient way to make sure developers do not introduce new
calls to dmi_check_system() too early in the initialization sequence
is to simply document the expected call order. That way, developers
have a chance to get it right immediately, without having to
test-boot their kernel, wonder why it does not work, and parse the
kernel logs for a warning message. And we get rid of the run-time
performance penalty as a nice side effect.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Ingo Molnar <mingo@kernel.org>
---
 drivers/firmware/dmi_scan.c |   21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

--- linux-4.14-rc0.orig/drivers/firmware/dmi_scan.c	2017-09-15 11:58:10.005977920 +0200
+++ linux-4.14-rc0/drivers/firmware/dmi_scan.c	2017-09-18 09:00:01.506399879 +0200
@@ -26,11 +26,6 @@ static u16 dmi_num;
 static u8 smbios_entry_point[32];
 static int smbios_entry_point_size;
 
-/*
- * Catch too early calls to dmi_check_system():
- */
-static int dmi_initialized;
-
 /* DMI system identification string used during boot */
 static char dmi_ids_string[128] __initdata;
 
@@ -633,7 +628,7 @@ void __init dmi_scan_machine(void)
 
 			if (!dmi_smbios3_present(buf)) {
 				dmi_available = 1;
-				goto out;
+				return;
 			}
 		}
 		if (efi.smbios == EFI_INVALID_TABLE_ADDR)
@@ -651,7 +646,7 @@ void __init dmi_scan_machine(void)
 
 		if (!dmi_present(buf)) {
 			dmi_available = 1;
-			goto out;
+			return;
 		}
 	} else if (IS_ENABLED(CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK)) {
 		p = dmi_early_remap(0xF0000, 0x10000);
@@ -668,7 +663,7 @@ void __init dmi_scan_machine(void)
 			if (!dmi_smbios3_present(buf)) {
 				dmi_available = 1;
 				dmi_early_unmap(p, 0x10000);
-				goto out;
+				return;
 			}
 			memcpy(buf, buf + 16, 16);
 		}
@@ -686,7 +681,7 @@ void __init dmi_scan_machine(void)
 			if (!dmi_present(buf)) {
 				dmi_available = 1;
 				dmi_early_unmap(p, 0x10000);
-				goto out;
+				return;
 			}
 			memcpy(buf, buf + 16, 16);
 		}
@@ -694,8 +689,6 @@ void __init dmi_scan_machine(void)
 	}
  error:
 	pr_info("DMI not present or invalid.\n");
- out:
-	dmi_initialized = 1;
 }
 
 static ssize_t raw_table_read(struct file *file, struct kobject *kobj,
@@ -827,14 +820,14 @@ static bool dmi_is_end_of_table(const st
  *	Walk the blacklist table running matching functions until someone
  *	returns non zero or we hit the end. Callback function is called for
  *	each successful match. Returns the number of matches.
+ *
+ *	dmi_scan_machine must be called before this function is called.
  */
 int dmi_check_system(const struct dmi_system_id *list)
 {
 	int count = 0;
 	const struct dmi_system_id *d;
 
-	WARN(!dmi_initialized, KERN_ERR "dmi check: not initialized yet.\n");
-
 	for (d = list; !dmi_is_end_of_table(d); d++)
 		if (dmi_matches(d)) {
 			count++;
@@ -857,6 +850,8 @@ EXPORT_SYMBOL(dmi_check_system);
  *
  *	Walk the blacklist table until the first match is found.  Return the
  *	pointer to the matching entry or NULL if there's no match.
+ *
+ *	dmi_scan_machine must be called before this function is called.
  */
 const struct dmi_system_id *dmi_first_match(const struct dmi_system_id *list)
 {


-- 
Jean Delvare
SUSE L3 Support

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] firmware: dmi_scan: Drop dmi_initialized
  2017-09-18  8:05 [PATCH] firmware: dmi_scan: Drop dmi_initialized Jean Delvare
@ 2017-09-23 10:50 ` Ingo Molnar
  2017-09-23 15:29   ` Jean Delvare
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2017-09-23 10:50 UTC (permalink / raw)
  To: Jean Delvare; +Cc: LKML


* Jean Delvare <jdelvare@suse.de> wrote:

> I don't think it makes sense to check for a possible bad
> initialization order at run time on every system when it is all
> decided at build time.
> 
> A more efficient way to make sure developers do not introduce new
> calls to dmi_check_system() too early in the initialization sequence
> is to simply document the expected call order. That way, developers
> have a chance to get it right immediately, without having to
> test-boot their kernel, wonder why it does not work, and parse the
> kernel logs for a warning message. And we get rid of the run-time
> performance penalty as a nice side effect.

Huh? Initialization ordering requirements are very opaque, and by removing the 
debug check any such bugs are actively hidden. How is documentation supposed to 
uncover such bugs once they happen?

So NAK.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] firmware: dmi_scan: Drop dmi_initialized
  2017-09-23 10:50 ` Ingo Molnar
@ 2017-09-23 15:29   ` Jean Delvare
  2017-09-24  9:16     ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Jean Delvare @ 2017-09-23 15:29 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML

Hi Ingo,

On Sat, 23 Sep 2017 12:50:31 +0200, Ingo Molnar wrote:
> * Jean Delvare <jdelvare@suse.de> wrote:
> 
> > I don't think it makes sense to check for a possible bad
> > initialization order at run time on every system when it is all
> > decided at build time.
> > 
> > A more efficient way to make sure developers do not introduce new
> > calls to dmi_check_system() too early in the initialization sequence
> > is to simply document the expected call order. That way, developers
> > have a chance to get it right immediately, without having to
> > test-boot their kernel, wonder why it does not work, and parse the
> > kernel logs for a warning message. And we get rid of the run-time
> > performance penalty as a nice side effect.  
> 
> Huh? Initialization ordering requirements are very opaque,

They were. Now they are very documented.

> and by removing the debug check any such bugs are actively hidden. How
> is documentation supposed to uncover such bugs once they happen?

You are looking at it the wrong way around. Documentation is how they
do not happen in the first place.

You hit this problem once, 9 years ago. You thought it would have been
easier to debug if there was a warning, and you added it. It was one
way to solve the problem but I claim it was not the best.

What I expect from developers calling a function they aren't familiar
with is to read its documentation first. That's the very reason why we
spend time writing the documentation. They should not just call the
function, boot and see if it works or not. Software engineering vs.
trial and error.

> So NAK.

This was FYI. I maintain this subsystem, and you did not convince me. I
also can't see a general trend of implementing what you suggest in the
rest of the kernel. Thankfully.

-- 
Jean Delvare
SUSE L3 Support

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] firmware: dmi_scan: Drop dmi_initialized
  2017-09-23 15:29   ` Jean Delvare
@ 2017-09-24  9:16     ` Ingo Molnar
  2017-09-25  9:00       ` Jean Delvare
  0 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2017-09-24  9:16 UTC (permalink / raw)
  To: Jean Delvare
  Cc: LKML, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra,
	Linus Torvalds, Andrew Morton


* Jean Delvare <jdelvare@suse.de> wrote:

> Hi Ingo,
> 
> On Sat, 23 Sep 2017 12:50:31 +0200, Ingo Molnar wrote:
> > * Jean Delvare <jdelvare@suse.de> wrote:
> > 
> > > I don't think it makes sense to check for a possible bad
> > > initialization order at run time on every system when it is all
> > > decided at build time.
> > > 
> > > A more efficient way to make sure developers do not introduce new
> > > calls to dmi_check_system() too early in the initialization sequence
> > > is to simply document the expected call order. That way, developers
> > > have a chance to get it right immediately, without having to
> > > test-boot their kernel, wonder why it does not work, and parse the
> > > kernel logs for a warning message. And we get rid of the run-time
> > > performance penalty as a nice side effect.  
> > 
> > Huh? Initialization ordering requirements are very opaque,
> 
> They were. Now they are very documented.
> 
> > and by removing the debug check any such bugs are actively hidden. How
> > is documentation supposed to uncover such bugs once they happen?
> 
> You are looking at it the wrong way around. Documentation is how they
> do not happen in the first place.

That expectation, as a general statement, is very naive and contrary to 
experience: documentation is fine for one layer of defense to prevent bugs, but 
_when_ they happen and a bug slips through, documentation does not help anymore, 
because the dependencies in the _code_ are opaque and non-obvious ...

For example during the early SMP efforts of Linux we used to document lock 
dependencies as well, but once the kernel had more than a dozen spinlocks we 
periodically ran into deadlocks and the whole design became unmaintainable 
quickly. So we have lockdep in addition to documentation.

> You hit this problem once, 9 years ago. You thought it would have been easier to 
> debug if there was a warning, and you added it.

I did not just 'think' it would have been easier to debug, I wasted time on that 
bug and a warning would have helped so I added it. That was and remains 
objectively true.

While I expect most such warnings to never see any public email lists (because 
once a developer triggers it it gets fixed without the bug ever getting triggered 
by others), yet searching for "dmi check: not initialized yet" still finds a 
couple of incidents where real or potential bugs were found by this init 
dependency check, such as:

   http://lists.infradead.org/pipermail/linux-arm-kernel/2014-September/289347.html

or this:

   https://www.spinics.net/lists/linux-acpi/msg28698.html

... so this warning actually helped a number of kernel developers to not
waste time on the opaque dependency. This is a warning that was added due
to an _actual category of bugs_, which has been triggered subsequently as
well, so it's not a frivolous warning by any meaning.

> [...] It was one way to solve the problem but I claim it was not the best.
> 
> What I expect from developers calling a function they aren't familiar
> with is to read its documentation first. That's the very reason why we
> spend time writing the documentation. They should not just call the
> function, boot and see if it works or not. Software engineering vs.
> trial and error.

This statement is breathtaking in its ignorance :-(

> > So NAK.
> 
> This was FYI. I maintain this subsystem, and you did not convince me. I also 
> can't see a general trend of implementing what you suggest in the rest of the 
> kernel. Thankfully.

I find the arrogance displayed here breathtaking as well - the last thing we need 
is for firmware interfacing kernel code to become _more_ fragile.

This was and continues to be a useful warning - but what worries me even more is 
not just the removal of the warning, but the false and technically invalid 
justifications under which it is removed...

For those reasons I maintain my NAK:

  Nacked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] firmware: dmi_scan: Drop dmi_initialized
  2017-09-24  9:16     ` Ingo Molnar
@ 2017-09-25  9:00       ` Jean Delvare
  2017-09-25  9:24         ` Peter Zijlstra
  2017-09-25  9:26         ` Peter Zijlstra
  0 siblings, 2 replies; 8+ messages in thread
From: Jean Delvare @ 2017-09-25  9:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra,
	Linus Torvalds, Andrew Morton

On Sun, 24 Sep 2017 11:16:25 +0200, Ingo Molnar wrote:
> * Jean Delvare <jdelvare@suse.de> wrote:
> > On Sat, 23 Sep 2017 12:50:31 +0200, Ingo Molnar wrote:  
> > > * Jean Delvare <jdelvare@suse.de> wrote:
> > > > I don't think it makes sense to check for a possible bad
> > > > initialization order at run time on every system when it is all
> > > > decided at build time.
> > > > 
> > > > A more efficient way to make sure developers do not introduce new
> > > > calls to dmi_check_system() too early in the initialization sequence
> > > > is to simply document the expected call order. That way, developers
> > > > have a chance to get it right immediately, without having to
> > > > test-boot their kernel, wonder why it does not work, and parse the
> > > > kernel logs for a warning message. And we get rid of the run-time
> > > > performance penalty as a nice side effect.    
> > > 
> > > Huh? Initialization ordering requirements are very opaque,  
> > 
> > They were. Now they are very documented.
> >   
> > > and by removing the debug check any such bugs are actively hidden. How
> > > is documentation supposed to uncover such bugs once they happen?  
> > 
> > You are looking at it the wrong way around. Documentation is how they
> > do not happen in the first place.  
> 
> That expectation, as a general statement, is very naive and contrary to 
> experience: documentation is fine for one layer of defense to prevent bugs, but 
> _when_ they happen and a bug slips through, documentation does not help anymore, 
> because the dependencies in the _code_ are opaque and non-obvious ...

Seriously... dmi_scan_machine must be called before dmi_check_system,
how is that "opaque"? Non-obvious, maybe, hence the need to document it.

I can find 169 occurrences of "(must|has to|should) be called
(before|after)" in the kernel source tree, plus 19 occurrences of "call
this function (before|after)" so apparently I'm not the only fool who
thinks documenting such ordering requirements is worthwhile.

> For example during the early SMP efforts of Linux we used to document lock 
> dependencies as well, but once the kernel had more than a dozen spinlocks we 
> periodically ran into deadlocks and the whole design became unmaintainable 
> quickly. So we have lockdep in addition to documentation.

And lockdep is good, because it helps solve a very complex problem, and
is able to catch potential issues before they happen. I'm very happy
that bright people designed it, and I am well aware that it caught a
huge number of bugs.

Also note that lockdep can be disabled, so you don't suffer the
overhead at run-time if you don't want to. There is no way to disable
the code you added back then. I wish I could make your check depend on
some general kernel debugging option, unfortunately CONFIG_DEBUG_KERNEL
covers so much that it is almost impossible to do without it and it
ends up being enabled even on production kernels. Maybe we'd need a
separate option for developer mode (something like
CFG80211_DEVELOPER_WARNINGS but generalized to the whole kernel.)

And yes, I am well aware that the performance penalty of your check was
nowhere close to that of lockdep. But add a bit here and a bit there...
More and more drivers are calling dmi_check_system.

> > You hit this problem once, 9 years ago. You thought it would have been easier to 
> > debug if there was a warning, and you added it.  
> > [...] It was one way to solve the problem but I claim it was not the best.
> 
> I did not just 'think' it would have been easier to debug, I wasted time on that 
> bug and a warning would have helped so I added it. That was and remains 
> objectively true.
> 
> While I expect most such warnings to never see any public email lists (because 
> once a developer triggers it it gets fixed without the bug ever getting triggered 
> by others), yet searching for "dmi check: not initialized yet" still finds a 
> couple of incidents where real or potential bugs were found by this init 
> dependency check, such as:
> 
>    http://lists.infradead.org/pipermail/linux-arm-kernel/2014-September/289347.html

Here it indeed helped spot a bug quickly. But I have little doubt the
issue would have been found fast even without it, as disabling DMI
support completely on half of the arm64 systems would hardly go
unnoticed for long.

> or this:
> 
>    https://www.spinics.net/lists/linux-acpi/msg28698.html
> 
> ... so this warning actually helped a number of kernel developers to not
> waste time on the opaque dependency.

But here it was a case where it did not matter, and developers ended up
silencing the warning instead of actually fixing the ordering. Without
your warning, they would actually have saved time.

A hack which will break as soon as my patch named "firmware: dmi:
Optimize dmi_matches" [1] hits mainline, BTW. Thanks for the pointer,
I'll get in touch with the involved developers to find a proper fix.

[1] https://marc.info/?l=linux-kernel&m=150113709717948&w=2

> This is a warning that was added due
> to an _actual category of bugs_, which has been triggered subsequently as
> well, so it's not a frivolous warning by any meaning.

I don't see any category here (unlike lockdep.) That's a single bug,
with 2 known occurrences in 10 years.

> > (...)
> > This was FYI. I maintain this subsystem, and you did not convince me. I also 
> > can't see a general trend of implementing what you suggest in the rest of the 
> > kernel. Thankfully.  
> 
> I find the arrogance displayed here breathtaking as well - the last thing we need 
> is for firmware interfacing kernel code to become _more_ fragile.

We don't need it to become slower, more bloated and less documented
either. And there's a trade-off in everything.

> This was and continues to be a useful warning - but what worries me even more is 
> not just the removal of the warning, but the false and technically invalid 
> justifications under which it is removed...

Then we have that in common. While reading the code and its history, I
was worried that the justification to add this warning in the first
place was technically weak. Not every coding error must automatically
translate to a patch to make the code robust against said error.
Sometimes you just have to admit that you did not pay attention as you
should have, fix your mistake, possibly document it for others, and
move on. Otherwise we end up with slow bloated code.

-- 
Jean Delvare
SUSE L3 Support

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] firmware: dmi_scan: Drop dmi_initialized
  2017-09-25  9:00       ` Jean Delvare
@ 2017-09-25  9:24         ` Peter Zijlstra
  2017-09-25  9:26         ` Peter Zijlstra
  1 sibling, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2017-09-25  9:24 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Ingo Molnar, LKML, Thomas Gleixner, H. Peter Anvin,
	Linus Torvalds, Andrew Morton

On Mon, Sep 25, 2017 at 11:00:11AM +0200, Jean Delvare wrote:
> I can find 169 occurrences of "(must|has to|should) be called
> (before|after)" in the kernel source tree, plus 19 occurrences of "call
> this function (before|after)" so apparently I'm not the only fool who
> thinks documenting such ordering requirements is worthwhile.

Documenting things is good, assuming people read it is another. In fact
its provably incorrect, people simply don't read.

I've been slowly converting all those occurrences of "should be called
with 'foo' lock held" into:

	lockdep_assert_held(&foo);

simply because people do _not_ read. And even if they did read, its
still easy to forget and make a mistake.

Heck, I sometimes get it wrong on code I wrote.

Documentation good, runtime checks also good.

Because when you do get it wrong p(someone getting it wrong at some
point) = 1, a WARN_ON_ONCE() triggering that has a comment that says:

 /*
  * if you trigger this; you violated rule #123
  */
 WARN_ON_ONCE(!invariant_cond_123);

saves ever so much more time than debugging weird and wonderful splats
much later in the code that rely on our rule #123.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] firmware: dmi_scan: Drop dmi_initialized
  2017-09-25  9:00       ` Jean Delvare
  2017-09-25  9:24         ` Peter Zijlstra
@ 2017-09-25  9:26         ` Peter Zijlstra
  2017-09-27  8:56           ` Jean Delvare
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2017-09-25  9:26 UTC (permalink / raw)
  To: Jean Delvare
  Cc: Ingo Molnar, LKML, Thomas Gleixner, H. Peter Anvin,
	Linus Torvalds, Andrew Morton

On Mon, Sep 25, 2017 at 11:00:11AM +0200, Jean Delvare wrote:
> Then we have that in common. While reading the code and its history, I
> was worried that the justification to add this warning in the first
> place was technically weak. Not every coding error must automatically
> translate to a patch to make the code robust against said error.
> Sometimes you just have to admit that you did not pay attention as you
> should have, fix your mistake, possibly document it for others, and
> move on. Otherwise we end up with slow bloated code.

That WARN_ON() is a form of documentation.

And if you care about performance for your code path, hide it under some
CONFIG_*_DEBUG, but in general WARN_ON() isn't terribly expensive
(depending entirely on the complexity of the condition of course).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] firmware: dmi_scan: Drop dmi_initialized
  2017-09-25  9:26         ` Peter Zijlstra
@ 2017-09-27  8:56           ` Jean Delvare
  0 siblings, 0 replies; 8+ messages in thread
From: Jean Delvare @ 2017-09-27  8:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Thomas Gleixner, H. Peter Anvin,
	Linus Torvalds, Andrew Morton

Hi Peter,

On Mon, 25 Sep 2017 11:26:44 +0200, Peter Zijlstra wrote:
> On Mon, Sep 25, 2017 at 11:00:11AM +0200, Jean Delvare wrote:
> > Then we have that in common. While reading the code and its history, I
> > was worried that the justification to add this warning in the first
> > place was technically weak. Not every coding error must automatically
> > translate to a patch to make the code robust against said error.
> > Sometimes you just have to admit that you did not pay attention as you
> > should have, fix your mistake, possibly document it for others, and
> > move on. Otherwise we end up with slow bloated code.  
> 
> That WARN_ON() is a form of documentation.

Sort of, but not the best form in my opinion. I mean, if a WARN
triggers, you'll have to spot it in the kernel log, and then figure out
why the condition which triggered it was evaluated to false. In this
specific case, this means grepping the code for "dmi_initialized" to
figure out where it should have been set. The error string ("dmi check:
not initialized yet") helps a bit, but that's still not as clear and
immediate as an explicit "dmi_scan_machine must be called before this
function is called" in the function description.

> And if you care about performance for your code path, hide it under some
> CONFIG_*_DEBUG,

I'd love to, but I couldn't find a generic one which would actually be
disabled on production kernels, and introducing a configuration option
for just this seems overkill.

> (...) but in general WARN_ON() isn't terribly expensive
> (depending entirely on the complexity of the condition of course).

The condition is pretty trivial here, so it should indeed be fairly
cheap. However you have to multiply it by the number of times it is
called. When this WARN was introduced, there were 61 calls to
dmi_check_system() in the whole kernel tree. Today we have 164 and the
count keeps increasing. And some of them in common paths which get
called repeatedly. I just instrumented the function to see how many
times it was called on my x86_64 workstation and the answer is 110.
That's not a one-time thing.

-- 
Jean Delvare
SUSE L3 Support

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-09-27  8:56 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-18  8:05 [PATCH] firmware: dmi_scan: Drop dmi_initialized Jean Delvare
2017-09-23 10:50 ` Ingo Molnar
2017-09-23 15:29   ` Jean Delvare
2017-09-24  9:16     ` Ingo Molnar
2017-09-25  9:00       ` Jean Delvare
2017-09-25  9:24         ` Peter Zijlstra
2017-09-25  9:26         ` Peter Zijlstra
2017-09-27  8:56           ` Jean Delvare

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.