linux-kernel.vger.kernel.org archive mirror
* Naming devices
@ 2003-05-18 21:33 Anton Blanchard
  2003-05-18 22:36 ` Russell King
  2003-05-19  1:22 ` Daniel Stekloff
  0 siblings, 2 replies; 5+ messages in thread
From: Anton Blanchard @ 2003-05-18 21:33 UTC (permalink / raw)
  To: linux-kernel


Hi,

I just spent 2 hours trying to make a machine boot. It had one bad disk
and one bad network card. Normally not a problem, but this thing had 40
cards in it so identifying the problem ones was not straightforward.

I was wondering why we don't have a consistent way of printing a device
location? If all drivers used the same thing, e.g.:

struct pci_dev *foo;
...
printf("%s: could not enable card\n", PCI_LOCATION(foo));

By default this would print the PCI bus/devfn, and an arch could override
it; e.g. on ppc64 it would also print a location code:

U1.6-P1-I2/E1 (90:0c.0)

This sounds like the domain of the event logging guys but I haven't seen
anything from them in a while. The nice thing about this is that when we
get PCI domains nothing needs to be changed in the driver, we just
update the PCI_LOCATION macro.
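
A rough userspace sketch of the idea, not kernel code. struct fake_pci_dev,
pci_location_default(), pci_location_arch() and ARCH_HAS_PCI_LOCATION are
all made-up names for illustration, and unlike the one-argument macro above
this version takes a caller-supplied buffer to keep the example simple:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only the fields needed here. */
struct fake_pci_dev {
    unsigned int bus;            /* PCI bus number */
    unsigned int devfn;          /* device in bits 7..3, function in bits 2..0 */
    const char *location_code;   /* firmware location code, or NULL if none */
};

/* Default formatter: bus:device.function, e.g. "90:0c.0". */
static const char *pci_location_default(const struct fake_pci_dev *dev,
                                        char *buf, size_t len)
{
    snprintf(buf, len, "%02x:%02x.%x",
             dev->bus, (dev->devfn >> 3) & 0x1f, dev->devfn & 0x7);
    return buf;
}

/* What an arch override might look like: location code, then bus/devfn. */
static const char *pci_location_arch(const struct fake_pci_dev *dev,
                                     char *buf, size_t len)
{
    char addr[16];

    pci_location_default(dev, addr, sizeof(addr));
    if (dev->location_code)
        snprintf(buf, len, "%s (%s)", dev->location_code, addr);
    else
        snprintf(buf, len, "%s", addr);
    return buf;
}

/*
 * Drivers only ever use this name; the arch picks the formatter at build
 * time. buf must be a char array (not a pointer) so sizeof() works.
 */
#ifdef ARCH_HAS_PCI_LOCATION
#define PCI_LOCATION(dev, buf) pci_location_arch((dev), (buf), sizeof(buf))
#else
#define PCI_LOCATION(dev, buf) pci_location_default((dev), (buf), sizeof(buf))
#endif
```

A driver would then print PCI_LOCATION(dev, buf) in its error path and never
need to know which of the two formatters is behind it.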

Also the tendency of network drivers to print "eth0: foo" during
initialisation is even more of a problem. If you get a bad card then you
could end up reusing the eth0 name for a subsequent device, making
pinpointing the problem card difficult. On top of that some drivers use
dev->name between calling alloc_netdev() and register_netdev() so that
you end up with error messages like "eth%d: failed".

Anton


* Re: Naming devices
  2003-05-18 21:33 Naming devices Anton Blanchard
@ 2003-05-18 22:36 ` Russell King
  2003-05-19  3:40   ` Anton Blanchard
  2003-05-19  1:22 ` Daniel Stekloff
  1 sibling, 1 reply; 5+ messages in thread
From: Russell King @ 2003-05-18 22:36 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: linux-kernel

On Mon, May 19, 2003 at 07:33:59AM +1000, Anton Blanchard wrote:
> I was wondering why we don't have a consistent way of printing a device
> location? If all drivers used the same thing, e.g.:

Isn't this what dev->bus_id in the device structure is supposed to be?
(which is supposed to be a unique bus ID on a particular bus type, in
the pci case, a PCI device.)

> Also the tendency of network drivers to print "eth0: foo" during
> initialisation is even more of a problem. If you get a bad card then you
> could end up reusing the eth0 name for a subsequent device, making
> pinpointing the problem card difficult. On top of that some drivers use
> dev->name between calling alloc_netdev() and register_netdev() so that
> you end up with error messages like "eth%d: failed".

Now that the point has been raised, it seems pretty obvious that
initialisation failures should report the BUS ID of the failing card,
not the logical name assigned by the system to that device which could
change.  Once the card is up and running, using the logical name becomes
meaningful - it's the identifier which user space uses to reference the
device.
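
As a minimal userspace illustration of that rule (struct fake_netdev and
the fake_* helpers are hypothetical and only model the naming behaviour,
they are not the real netdev API): before registration the name is still
the unexpanded "eth%d" template, so the bus ID is the only useful
identifier; after registration the logical name takes over.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16

/* Hypothetical, simplified model of a network device during probe. */
struct fake_netdev {
    char name[NAME_LEN];     /* "eth%d" template until registration */
    char bus_id[NAME_LEN];   /* fixed by hardware topology, e.g. "90:0c.0" */
    int registered;
};

/* Model of allocation: the name is still an unexpanded template. */
static void fake_alloc(struct fake_netdev *nd, const char *bus_id)
{
    memset(nd, 0, sizeof(*nd));
    snprintf(nd->name, sizeof(nd->name), "%s", "eth%d");
    snprintf(nd->bus_id, sizeof(nd->bus_id), "%s", bus_id);
}

/* Model of registration: the template is expanded to a real name. */
static void fake_register(struct fake_netdev *nd, int index)
{
    snprintf(nd->name, sizeof(nd->name), "eth%d", index);
    nd->registered = 1;
}

/* Pick the identifier that is meaningful at this point in the device's life. */
static const char *log_ident(const struct fake_netdev *nd)
{
    return nd->registered ? nd->name : nd->bus_id;
}
```

An init-failure message built from log_ident() names the slot the card sits
in, rather than a logical name that may later be handed to another card.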

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: Naming devices
  2003-05-18 21:33 Naming devices Anton Blanchard
  2003-05-18 22:36 ` Russell King
@ 2003-05-19  1:22 ` Daniel Stekloff
  2003-05-19  1:48   ` David S. Miller
  1 sibling, 1 reply; 5+ messages in thread
From: Daniel Stekloff @ 2003-05-19  1:22 UTC (permalink / raw)
  To: Anton Blanchard, linux-kernel



Hi Anton,

We have been working on device macros that add standard prefixes to printk 
messages. The purpose of the prefix is to identify the device in the message 
with a specific device or sysfs directory. Generic device macros are already
in the 2.5 kernel in include/linux/device.h - dev_err, dev_info, etc. They
prefix printk messages with dev->bus_id and driver name. 

Just last week or so, Jim Keniston asked for comments on network device 
specific macros - netdev_printk. I thought these were handy when I was 
working on a system with 4 ethernet cards. With the e1000 patch, I could 
identify the device without having to use ethtool because netdev_printk 
includes the PCI device ID in the prefix of the message. I could tell which
device eth0 referred to from the message.

One of the reasons why we decided on the wrapper macros is the ability to 
change the prefix in the future without impacting device drivers that have 
implemented those macros. We could add more information from the device
structure to the message without requiring device drivers to change anything. 
We could also use those macros as a hook to provide more functionality, like 
building templates based on calling function and format string to identify the
message uniquely, without impacting the driver.
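
To make the "change the prefix in one place" point concrete, here is a small
userspace sketch. It is not the kernel's dev_err; struct fake_device,
fake_dev_msg() and fake_dev_err() are invented names, the "driver bus_id: "
prefix layout is an assumption, and the output goes to a buffer via
snprintf rather than to printk:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, cut-down device structure. */
struct fake_device {
    const char *driver_name;
    const char *bus_id;
};

/*
 * The prefix is built in exactly one place. Adding more device information
 * to it later would not require touching any caller.
 */
static int fake_dev_msg(char *buf, size_t len, const struct fake_device *dev,
                        const char *fmt, ...)
{
    va_list ap;
    int n;

    n = snprintf(buf, len, "%s %s: ", dev->driver_name, dev->bus_id);
    if (n < 0 || (size_t)n >= len)
        return n;   /* error, or prefix already truncated */

    va_start(ap, fmt);
    n += vsnprintf(buf + n, len - (size_t)n, fmt, ap);
    va_end(ap);
    return n;
}

/* Drivers call the wrapper; they never format the prefix themselves. */
#define fake_dev_err(buf, len, dev, ...) \
    fake_dev_msg((buf), (len), (dev), __VA_ARGS__)
```

A call such as fake_dev_err(buf, sizeof(buf), &dev, "could not enable card")
keeps working unchanged if the prefix later grows a location code.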

Yet the macros we've been supplying are a bit rigid. Perhaps we should have 
something like you've suggested that could be used by driver writers to tag a 
message with a specific device location while not requiring the use of a 
whole wrapper macro. Plus, you could override the result based on arch. You 
wouldn't get the benefits of the current device macros, but you would be able 
to identify the message with a specific device.


Thanks,

Dan




* Re: Naming devices
  2003-05-19  1:22 ` Daniel Stekloff
@ 2003-05-19  1:48   ` David S. Miller
  0 siblings, 0 replies; 5+ messages in thread
From: David S. Miller @ 2003-05-19  1:48 UTC (permalink / raw)
  To: Daniel Stekloff; +Cc: Anton Blanchard, linux-kernel

On Sun, 2003-05-18 at 18:22, Daniel Stekloff wrote:
> Just last week or so, Jim Keniston asked for comments on network device 
> specific macros - netdev_printk. I thought these were handy when I was 
> working on a system with 4 ethernet cards.

I don't understand how this is useful for this application.
If I put 1,000 e1000 cards into the machine, all the messages
scroll out of the dmesg buffer.

The only reliable source for this kind of information is ethtool.
The kernel message buffer is like IP datagram delivery in that it is
unreliable, whereas ethtool provides a stable source for this
information.
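
A toy model of why messages are lost, assuming a fixed number of slots that
wrap around (struct toy_log and its helpers are hypothetical and much
simpler than the real kernel log buffer): once more messages are written
than there are slots, the oldest ones are silently overwritten.

```c
#include <stddef.h>
#include <stdio.h>

#define LOG_SLOTS 4
#define LOG_MSG_LEN 32

/* Toy fixed-size log; zero-initialise before use. */
struct toy_log {
    char msg[LOG_SLOTS][LOG_MSG_LEN];
    unsigned long written;   /* total messages ever logged */
};

/* Store a message, overwriting the oldest slot once the log is full. */
static void toy_log_write(struct toy_log *log, const char *text)
{
    snprintf(log->msg[log->written % LOG_SLOTS], LOG_MSG_LEN, "%s", text);
    log->written++;
}

/* Return the oldest message still retained, or NULL if the log is empty. */
static const char *toy_log_oldest(const struct toy_log *log)
{
    if (log->written == 0)
        return NULL;
    if (log->written <= LOG_SLOTS)
        return log->msg[0];
    return log->msg[log->written % LOG_SLOTS];
}
```

With 4 slots and 6 probe messages, the messages for the first two cards are
gone by the time anyone reads the log.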

All I hear is that "hey we're making printk provide the same
information as ethtool", and when duplicating functionality you
ought to have a real good reason for it :-)

-- 
David S. Miller <davem@redhat.com>


* Re: Naming devices
  2003-05-18 22:36 ` Russell King
@ 2003-05-19  3:40   ` Anton Blanchard
  0 siblings, 0 replies; 5+ messages in thread
From: Anton Blanchard @ 2003-05-19  3:40 UTC (permalink / raw)
  To: linux-kernel

 
> Isn't this what dev->bus_id in the device structure is supposed to be?
> (which is supposed to be a unique bus ID on a particular bus type, in
> the pci case, a PCI device.)

We could use that, although for ppc64 I'd like to increase its size and
stash the physical location in there as well.

> Now that the point has been raised, it seems pretty obvious that
> initialisation failures should report the BUS ID of the failing card,
> not the logical name assigned by the system to that device which could
> change.  Once the card is up and running, using the logical name becomes
> meaningful - it's the identifier which user space uses to reference the
> device.

Sounds good to me.

Anton

