* What to do about the 2TB limit on HDIO_GETGEO ?
       [not found] <47E875AD.1000901@rtr.ca>
@ 2008-03-25  4:02 ` Mark Lord
  2008-03-25  4:19   ` Andrew Morton
                     ` (2 more replies)
       [not found] ` <alpine.LFD.1.00.0803242254020.2775@woody.linux-foundation.org>
  1 sibling, 3 replies; 59+ messages in thread
From: Mark Lord @ 2008-03-25  4:02 UTC (permalink / raw)
  To: Jens Axboe, Jeff Garzik, Tejun Heo, Greg KH
  Cc: Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

(resending .. forgot to copy the lists originally)

We have a problem coming down the pipeline.

Practically all utilities that care about it,
use ioctl(fd, HDIO_GETGEO) to determine the starting
sector offset of a hard disk partition.

SCSI, libata, IDE, USB, Firewire.. you name it.

The return value uses "unsigned long",
which on a 32-bit system limits drive offsets to 2TB.

There will be single drives exceeding this limit within
the next 12 months or less, and we already have RAID arrays
that exceed 2TB.

So.. what's the replacement for HDIO_GETGEO on 32-bits ?

One candidate might seem to be the existing /sys/block/dev/partition/start
which I expect is already 64-bit friendly.

But this requires about 150 lines of somewhat complex C code to access,
using only the dev_t (from stat(2) on a file) as a starting point,
or less if one relies upon the udev device name matching the sysfs device name.

Is it time now for HDIO_GETGEO64 to make an appearance?
Similar to how the existing BLKGETSIZE64 is supplanting BLKGETSIZE ?

??


* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25  4:02 ` What to do about the 2TB limit on HDIO_GETGEO ? Mark Lord
@ 2008-03-25  4:19   ` Andrew Morton
  2008-03-25  5:13   ` H. Peter Anvin
  2008-03-25 15:17   ` James Bottomley
  2 siblings, 0 replies; 59+ messages in thread
From: Andrew Morton @ 2008-03-25  4:19 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jens Axboe, Jeff Garzik, Tejun Heo, Greg KH, Linus Torvalds,
	Linux Kernel, IDE/ATA development list, linux-scsi

On Tue, 25 Mar 2008 00:02:10 -0400 Mark Lord <lkml@rtr.ca> wrote:

> Is it time now for HDIO_GETGEO64 to make an appearance?
> Similar to how the existing BLKGETSIZE64 is supplanting BLKGETSIZE ?

That sounds useful.

But you're the one who has investigated this - please make a recommendation?


* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25  4:02 ` What to do about the 2TB limit on HDIO_GETGEO ? Mark Lord
  2008-03-25  4:19   ` Andrew Morton
@ 2008-03-25  5:13   ` H. Peter Anvin
  2008-03-25 13:37     ` Mark Lord
  2008-03-25 15:17   ` James Bottomley
  2 siblings, 1 reply; 59+ messages in thread
From: H. Peter Anvin @ 2008-03-25  5:13 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jens Axboe, Jeff Garzik, Tejun Heo, Greg KH, Linus Torvalds,
	Andrew Morton, Linux Kernel, IDE/ATA development list,
	linux-scsi

Mark Lord wrote:
> 
> One candidate might seem to be the existing /sys/block/dev/partition/start
> which I expect is already 64-bit friendly.
> 
> But this requires about 150 lines of somewhat complex C code to access,
> using only the dev_t (from stat(2) on a file) as a starting point,
> or less if one relies upon the udev device name matching the sysfs 
> device name.
> 
> Is it time now for HDIO_GETGEO64 to make an appearance?
> Similar to how the existing BLKGETSIZE64 is supplanting BLKGETSIZE ?
> 

Probably a better thing to have would be a way to look up block devices 
in sysfs by device number.

	-hpa


* Re: What to do about the 2TB limit on HDIO_GETGEO ?
       [not found] ` <alpine.LFD.1.00.0803242254020.2775@woody.linux-foundation.org>
@ 2008-03-25 13:34   ` Mark Lord
  2008-03-25 13:51     ` Greg Freemyer
  2008-03-25 14:31     ` Ric Wheeler
  0 siblings, 2 replies; 59+ messages in thread
From: Mark Lord @ 2008-03-25 13:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, Jeff Garzik, Tejun Heo, Greg KH l, Andrew Morton,
	Linux Kernel, IDE/ATA development list, linux-scsi

Linus Torvalds wrote:
> 
> On Mon, 24 Mar 2008, Mark Lord wrote:
>> The return value uses "unsigned long",
>> which on a 32-bit system limits drive offsets to 2TB.
> 
> One relevant question is: does anybody seriously care about the 
> combination of "32 bit" and "huge modern drives" any more?
> 
> Sure, we can add a 64-bit version that ends up being used only on 32-bit 
> systems, but quite frankly, I think the solution here is to just ignore 
> the issue and see if anybody really even cares.
> 
> Because quite frankly, the kind of people who buy modern 2TB drives 
> generally don't then couple them to CPU's that are five+ years old.
..

Yeah.  Except Dell will undoubtedly have them in desktops
within 2 years, and tons of people (myself included) still use
32-bit (K)Ubuntu on our systems, simply for the better binary 
compatibility that it is perceived to give with things like
browser plugins and stuff.

Using sysfs interfaces might be a good alternative,
if they were easier to use, but drives are not directly
accessible there using the dev_t value from stat(2).

Instead, software has to search everything inside /sys/block/
looking for a "dev" file whose contents match,
rather than just trying to access something like this:

   /sys/block/8:1/start
or
   /sys/block/majors/8/minors/1/start

Or any one of a number of similar ways to arrange it.

Cheers
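
The search described above can be sketched as follows. This is a much-reduced illustration, not the roughly 150 lines mentioned earlier in the thread: it only looks one and two levels below the root, and the names find_sysfs_dir and dev_file_matches are hypothetical. The root is a parameter (normally "/sys/block") so the function can be pointed at a test tree.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/* Return 1 if the file dir/dev starts with "maj:min", else 0. */
static int dev_file_matches(const char *dir, unsigned maj, unsigned min)
{
    char path[4096];
    unsigned fmaj, fmin;
    int ok = 0;
    FILE *f;

    snprintf(path, sizeof(path), "%s/dev", dir);
    f = fopen(path, "r");
    if (f == NULL)
        return 0;
    if (fscanf(f, "%u:%u", &fmaj, &fmin) == 2)
        ok = (fmaj == maj && fmin == min);
    fclose(f);
    return ok;
}

/*
 * Scan root/<disk> and root/<disk>/<partition> for a "dev" file that
 * matches maj:min. On a match, copy the directory into out and return 0;
 * return -1 if nothing matches or root cannot be opened.
 */
int find_sysfs_dir(const char *root, unsigned maj, unsigned min,
                   char *out, size_t outlen)
{
    DIR *top = opendir(root);
    struct dirent *d, *p;
    int found = -1;

    assert(out != NULL && outlen > 0);
    if (top == NULL)
        return -1;
    while (found != 0 && (d = readdir(top)) != NULL) {
        char disk[2048];
        DIR *sub;

        if (d->d_name[0] == '.')
            continue;
        snprintf(disk, sizeof(disk), "%s/%s", root, d->d_name);
        if (dev_file_matches(disk, maj, min)) {
            snprintf(out, outlen, "%s", disk);
            found = 0;
            break;
        }
        sub = opendir(disk);
        if (sub == NULL)
            continue;
        while ((p = readdir(sub)) != NULL) {
            char part[4000];

            if (p->d_name[0] == '.')
                continue;
            snprintf(part, sizeof(part), "%s/%s", disk, p->d_name);
            if (dev_file_matches(part, maj, min)) {
                snprintf(out, outlen, "%s", part);
                found = 0;
                break;
            }
        }
        closedir(sub);
    }
    closedir(top);
    return found;
}
```

Even in this cut-down form the lookup costs a directory walk plus one file open per candidate, which is the overhead a direct major:minor path would avoid.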


* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25  5:13   ` H. Peter Anvin
@ 2008-03-25 13:37     ` Mark Lord
  2008-03-25 13:55       ` H. Peter Anvin
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Lord @ 2008-03-25 13:37 UTC (permalink / raw)
  To: H. Peter Anvin, Greg KH
  Cc: Jens Axboe, Jeff Garzik, Tejun Heo, Linus Torvalds,
	Andrew Morton, Linux Kernel, IDE/ATA development list,
	linux-scsi

H. Peter Anvin wrote:
> Mark Lord wrote:
>>
>> One candidate might seem to be the existing 
>> /sys/block/dev/partition/start
>> which I expect is already 64-bit friendly.
>>
>> But this requires about 150 lines of somewhat complex C code to access,
>> using only the dev_t (from stat(2) on a file) as a starting point,
>> or less if one relies upon the udev device name matching the sysfs 
>> device name.
>>
>> Is it time now for HDIO_GETGEO64 to make an appearance?
>> Similar to how the existing BLKGETSIZE64 is supplanting BLKGETSIZE ?
>>
> 
> Probably a better thing to have would be a way to look up block devices 
> in sysfs by device number.
..

Yeah, that would be just as good, really.  Maybe even better.

Mark Lord wrote (later on):
> Instead, software has to search everything inside /sys/block/
> looking for a "dev" file whose contents match,
> rather than just trying to access something like this:
> 
>   /sys/block/8:1/start
> or
>   /sys/block/majors/8/minors/1/start
> 
> Or any one of a number of similar ways to arrange it. 
..

Greg ?


* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 13:34   ` Mark Lord
@ 2008-03-25 13:51     ` Greg Freemyer
  2008-03-25 14:31     ` Ric Wheeler
  1 sibling, 0 replies; 59+ messages in thread
From: Greg Freemyer @ 2008-03-25 13:51 UTC (permalink / raw)
  To: Mark Lord
  Cc: Linus Torvalds, Jens Axboe, Jeff Garzik, Tejun Heo, Greg KH l,
	Andrew Morton, Linux Kernel, IDE/ATA development list,
	linux-scsi

On Tue, Mar 25, 2008 at 9:34 AM, Mark Lord <lkml@rtr.ca> wrote:
> Linus Torvalds wrote:
>
> >
>  > On Mon, 24 Mar 2008, Mark Lord wrote:
>  >> The return value uses "unsigned long",
>  >> which on a 32-bit system limits drive offsets to 2TB.
>  >
>  > One relevant question is: does anybody seriously care about the
>  > combination of "32 bit" and "huge modern drives" any more?
>  >
>  > Sure, we can add a 64-bit version that ends up being used only on 32-bit
>  > systems, but quite frankly, I think the solution here is to just ignore
>  > the issue and see if anybody really even cares.
>  >
>  > Because quite frankly, the kind of people who buy modern 2TB drives
>  > generally don't then couple them to CPU's that are five+ years old.

We provide data services to our clients.  We are already seeing USB
enclosures routinely provided to us by our clients with 1TB.  1.5TB on
occasion.  2TB usb enclosures can't be far behind.

For USB, the biggest factor is when MS will offer
compatibility/support for 2TB+ drives.  As soon as they become readily
supported in MS, our clients will start buying them and filling them
up, and we will need to be able to access them from all of our systems
(old and new).

Greg
-- 
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com


* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 13:37     ` Mark Lord
@ 2008-03-25 13:55       ` H. Peter Anvin
  2008-03-25 17:37         ` Mark Lord
  0 siblings, 1 reply; 59+ messages in thread
From: H. Peter Anvin @ 2008-03-25 13:55 UTC (permalink / raw)
  To: Mark Lord
  Cc: Greg KH, Jens Axboe, Jeff Garzik, Tejun Heo, Linus Torvalds,
	Andrew Morton, Linux Kernel, IDE/ATA development list,
	linux-scsi

Mark Lord wrote:
> 
> Yeah, that would be just as good, really.  Maybe even better.
> 
> Mark Lord wrote (later on):
>> Instead, software has to search everything inside /sys/block/
>> looking for a "dev" file whose contents match,
>> rather than just trying to access something like this:
>>
>>   /sys/block/8:1/start
>> or
>>   /sys/block/majors/8/minors/1/start
>>
>> Or any one of a number of similar ways to arrange it. 
> ..
> 

It shouldn't be under /sys/block... there are enough things that 
scan /sys/block and assume any directory underneath it has the current 
format.

	-hpa


* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 13:34   ` Mark Lord
  2008-03-25 13:51     ` Greg Freemyer
@ 2008-03-25 14:31     ` Ric Wheeler
  2008-03-25 15:25       ` Andrew Paprocki
  2008-03-25 15:34       ` Matthew Wilcox
  1 sibling, 2 replies; 59+ messages in thread
From: Ric Wheeler @ 2008-03-25 14:31 UTC (permalink / raw)
  To: Mark Lord
  Cc: Linus Torvalds, Jens Axboe, Jeff Garzik, Tejun Heo, Greg KH l,
	Andrew Morton, Linux Kernel, IDE/ATA development list,
	linux-scsi



Mark Lord wrote:
> Linus Torvalds wrote:
>>
>> On Mon, 24 Mar 2008, Mark Lord wrote:
>>> The return value uses "unsigned long",
>>> which on a 32-bit system limits drive offsets to 2TB.
>>
>> One relevant question is: does anybody seriously care about the 
>> combination of "32 bit" and "huge modern drives" any more?
>>
>> Sure, we can add a 64-bit version that ends up being used only on 
>> 32-bit systems, but quite frankly, I think the solution here is to 
>> just ignore the issue and see if anybody really even cares.
>>
>> Because quite frankly, the kind of people who buy modern 2TB drives 
>> generally don't then couple them to CPU's that are five+ years old.
> ..
> 
> Yeah.  Except Dell will undoubtedly have them in desktops
> within 2 years, and tons of people (myself included) still use
> 32-bit (K)Ubuntu on our systems, simply for the better binary 
> compatibility that it is perceived to give with things like
> browser plugins and stuff.

I think that there are many embedded applications (lots of them linux based)
which have large amounts of storage behind low power, low cost 32 bit CPU's.

Think of the home/small office NAS boxes that you can get from bestbuy or other 
big box stores. Those devices today have 4 S-ATA drives (each of which can be 
1TB in size).

Also, if you have a very low end box, it can still access really large storage
over iSCSI or a SAN which will present as a local, large device.

Over time, even these low end CPU's will migrate towards 64 bits, but we are not
there yet...

ric





* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25  4:02 ` What to do about the 2TB limit on HDIO_GETGEO ? Mark Lord
  2008-03-25  4:19   ` Andrew Morton
  2008-03-25  5:13   ` H. Peter Anvin
@ 2008-03-25 15:17   ` James Bottomley
  2008-03-25 17:31     ` Mark Lord
  2008-03-25 17:45     ` Greg Freemyer
  2 siblings, 2 replies; 59+ messages in thread
From: James Bottomley @ 2008-03-25 15:17 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jens Axboe, Jeff Garzik, Tejun Heo, Greg KH, Linus Torvalds,
	Andrew Morton, Linux Kernel, IDE/ATA development list,
	linux-scsi

On Tue, 2008-03-25 at 00:02 -0400, Mark Lord wrote:
> (resending .. forgot to copy the lists originally)
> 
> We have a problem coming down the pipeline.
> 
> Practically all utilities that care about it,
> use ioctl(fd, HDIO_GETGEO) to determine the starting
> sector offset of a hard disk partition.
> 
> SCSI, libata, IDE, USB, Firewire.. you name it.
> 
> The return value uses "unsigned long",
> which on a 32-bit system limits drive offsets to 2TB.
> 
> There will be single drives exceeding this limit within
> the next 12 months or less, and we already have RAID arrays
> that exceed 2TB.
> 
> So.. what's the replacement for HDIO_GETGEO on 32-bits ?
> 
> One candidate might seem to be the existing /sys/block/dev/partition/start
> which I expect is already 64-bit friendly.
> 
> But this requires about 150 lines of somewhat complex C code to access,
> using only the dev_t (from stat(2) on a file) as a starting point,
> or less if one relies upon the udev device name matching the sysfs device name.
> 
> Is it time now for HDIO_GETGEO64 to make an appearance?
> Similar to how the existing BLKGETSIZE64 is supplanting BLKGETSIZE ?

Perhaps I've missed something, but surely geometry doesn't make sense on
a >2TB drive does it?  The only reason we use it on modern disks (which
usually make it up specially for us) is that the DOS partition scheme
requires it.  Once we're over 2TB, isn't it impossible to use DOS
partitions (well, OK, unless you increase the sector size, but that's
only delaying the inevitable), so we can just go with a proper disk
labelling scheme and use BLKGETSIZE64 all the time.

James




* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 14:31     ` Ric Wheeler
@ 2008-03-25 15:25       ` Andrew Paprocki
  2008-03-25 15:34       ` Matthew Wilcox
  1 sibling, 0 replies; 59+ messages in thread
From: Andrew Paprocki @ 2008-03-25 15:25 UTC (permalink / raw)
  To: ric
  Cc: Mark Lord, Linus Torvalds, Jens Axboe, Jeff Garzik, Tejun Heo,
	Greg KH l, Andrew Morton, Linux Kernel, IDE/ATA development list,
	linux-scsi

On Tue, Mar 25, 2008 at 10:31 AM, Ric Wheeler <ric@emc.com> wrote:
>  > Yeah.  Except Dell will undoubtedly have them in desktops
>  > within 2 years, and tons of people (myself included) still use
>  > 32-bit (K)Ubuntu on our systems, simply for the better binary
>  > compatibility that it is perceived to give with things like
>  > browser plugins and stuff.
>
>  I think that there are many embedded applications (lots of them linux based)
>  which have large amounts of storage behind low power, low cost 32 bit CPU's.
>
>  Think of the home/small office NAS boxes that you can get from bestbuy or other
>  big box stores. Those devices today have 4 S-ATA drives (each of which can be
>  1TB in size).

I can attest to this. I hear from a reliable source (manufacturer)
that 2TB 3.5" disks will be out no later than first half of 2009
(possibly even sooner, or at least 1.5TB). I currently use 1TB disks
with a Geode LX based motherboard for SATA RAID and I plan on
upgrading/consolidating to larger sizes once they become available in
the market. 64-bit is not an option for me on this hardware.

-Andrew


* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 14:31     ` Ric Wheeler
  2008-03-25 15:25       ` Andrew Paprocki
@ 2008-03-25 15:34       ` Matthew Wilcox
  2008-03-25 15:48         ` Ric Wheeler
  1 sibling, 1 reply; 59+ messages in thread
From: Matthew Wilcox @ 2008-03-25 15:34 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: Mark Lord, Linus Torvalds, Jens Axboe, Jeff Garzik, Tejun Heo,
	Greg KH l, Andrew Morton, Linux Kernel, IDE/ATA development list,
	linux-scsi

On Tue, Mar 25, 2008 at 10:31:54AM -0400, Ric Wheeler wrote:
> I think that there are many embedded applications (lots of them linux based)
> which have large amounts of storage behind low power, low cost 32 bit CPU's.
> 
> Think of the home/small office NAS boxes that you can get from bestbuy or 
> other big box stores. Those devices today have 4 S-ATA drives (each of 
> which can be 1TB in size).
> 
> Also, if you have a very low end box, it can still access really large 
> storage
> over iSCSI or a SAN which will present as a local, large device.

Don't those devices run into trouble with fsck?  The amount of memory
you need to fsck a device is obviously going to depend on the filesystem,
but it has to grow with device size, and I'm not sure that 4GB is enough
virtual address space to fsck 2TB.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 15:34       ` Matthew Wilcox
@ 2008-03-25 15:48         ` Ric Wheeler
  2008-03-25 16:47           ` Theodore Tso
  0 siblings, 1 reply; 59+ messages in thread
From: Ric Wheeler @ 2008-03-25 15:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Mark Lord, Linus Torvalds, Jens Axboe, Jeff Garzik, Tejun Heo,
	Greg KH l, Andrew Morton, Linux Kernel, IDE/ATA development list,
	linux-scsi


Matthew Wilcox wrote:
> On Tue, Mar 25, 2008 at 10:31:54AM -0400, Ric Wheeler wrote:
>> I think that there are many embedded applications (lots of them linux based)
>> which have large amounts of storage behind low power, low cost 32 bit CPU's.
>>
>> Think of the home/small office NAS boxes that you can get from bestbuy or 
>> other big box stores. Those devices today have 4 S-ATA drives (each of 
>> which can be 1TB in size).
>>
>> Also, if you have a very low end box, it can still access really large 
>> storage
>> over iSCSI or a SAN which will present as a local, large device.
> 
> Don't those devices run into trouble with fsck?  The amount of memory
> you need to fsck a device is obviously going to depend on the filesystem,
> but it has to grow with device size, and I'm not sure that 4GB is enough
> virtual address space to fsck 2TB.

Absolutely - they more or less hit a stonewall once the disk has any trouble and 
you need to fsck.  On the other hand, this might be merciful since on 64 bit 
boxes, we will let you run the fsck and watch it run for a week or so before you 
despair ;-)

On a serious note, fsck time tends to track more the number of active inodes, so 
you can fsck a large file system if you use it to store large files (especially 
if you use a file system with dynamic inode creation or something like the 
uninitialized ext4 inodes).

ric



* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 15:48         ` Ric Wheeler
@ 2008-03-25 16:47           ` Theodore Tso
  2008-03-25 20:51               ` Theodore Tso
  2008-03-25 20:51             ` Theodore Tso
  0 siblings, 2 replies; 59+ messages in thread
From: Theodore Tso @ 2008-03-25 16:47 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: Matthew Wilcox, Mark Lord, Linus Torvalds, Jens Axboe,
	Jeff Garzik, Tejun Heo, Greg KH l, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Tue, Mar 25, 2008 at 11:48:50AM -0400, Ric Wheeler wrote:
>> Don't those devices run into trouble with fsck?  The amount of memory
>> you need to fsck a device is obviously going to depend on the filesystem,
>> but it has to grow with device size, and I'm not sure that 4GB is enough
>> virtual address space to fsck 2TB.

Well 2TB, assuming a 4k blocksize, means a block bitmap is 512 megs.
So at least for ext3, 4GB should be just enough, unless you hit
certain really nasty complicated corruptions (i.e. large number of
blocks claimed by more than one inode, which can happen if an inode
table is written to the wrong location on disk --- on top of some
other portion of the inode table), or if the filesystem has a large
number of files with hard links (such as the case with certain backup
programs).
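
The arithmetic behind the bitmap figure can be worked through as below. 2 TiB at 4 KiB per block is 2^29 blocks (about 512 Mi), and at one bit per block a single copy of the bitmap takes 64 MiB. Whether "512 megs" counts bits or the combined footprint of several in-memory maps is not stated here, so the sketch only computes the per-bitmap numbers. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Number of blocks, and so bitmap bits, for a device of this size. */
uint64_t bitmap_bits(uint64_t device_bytes, uint32_t block_size)
{
    assert(block_size > 0);
    return device_bytes / block_size;
}

/* Bytes needed to hold one bit per block, rounded up. */
uint64_t bitmap_bytes(uint64_t device_bytes, uint32_t block_size)
{
    return (bitmap_bits(device_bytes, block_size) + 7) / 8;
}
```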

The plan is to implement some kind of run-length encoding to compress
the in-memory requirements for storing the bitmaps, but that hasn't
been coded yet.  If someone who is a staff programmer for one of these
bookshelf NAS manufacturers is interested in implementing such a
beast, they should talk to me; I've thought quite a bit about the
design, and I just need a minion to implement it.  :-)
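
To illustrate the general idea of run-length encoding a bitmap, here is a minimal sketch. It is not the design referred to above, which is not described in this thread; the struct and function names are invented for the example. Long stretches of all-used or all-free blocks collapse into a single run each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One run: len consecutive bits that all have the value bit. */
struct bit_run {
    uint64_t len;
    int bit;
};

/*
 * Run-length encode the first nbits bits of map, where bit i lives in
 * map[i / 8] under mask (1 << (i % 8)). At most max_runs runs are
 * stored. The return value is the number of runs the map needs, which
 * can exceed max_runs when the bitmap is heavily fragmented.
 */
size_t rle_encode(const uint8_t *map, uint64_t nbits,
                  struct bit_run *runs, size_t max_runs)
{
    size_t n = 0;
    int prev = -1;
    uint64_t i;

    assert(map != NULL || nbits == 0);
    for (i = 0; i < nbits; i++) {
        int bit = (map[i / 8] >> (i % 8)) & 1;

        if (bit != prev) {
            /* Value changed: start a new run. */
            n++;
            if (n <= max_runs) {
                runs[n - 1].bit = bit;
                runs[n - 1].len = 0;
            }
            prev = bit;
        }
        if (n <= max_runs)
            runs[n - 1].len++;
    }
    return n;
}
```

The saving depends entirely on fragmentation: a mostly contiguous bitmap shrinks to a handful of runs, while an alternating pattern would take more memory than the raw bits.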

> Absolutely - they more or less hit a stonewall once the disk has any 
> trouble and you need to fsck.  On the other hand, this might be merciful 
> since on 64 bit boxes, we will let you run the fsck and watch it run for a 
> week or so before you despair ;-)
>
> On a serious note, fsck time tends to track more the number of active 
> inodes, so you can fsck a large file system if you use it to store large 
> files (especially if you use a file system with dynamic inode creation or 
> something like the uninitialized ext4 inodes).

And ext4 extents will help because it reduces the number of indirect
blocks you have to read, which will significantly reduce the fsck
time.  So there will be improvements on the horizon.

       	  	     		     	- Ted



* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 15:17   ` James Bottomley
@ 2008-03-25 17:31     ` Mark Lord
  2008-03-25 19:32       ` James Bottomley
  2008-03-25 17:45     ` Greg Freemyer
  1 sibling, 1 reply; 59+ messages in thread
From: Mark Lord @ 2008-03-25 17:31 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jens Axboe, Jeff Garzik, Tejun Heo, Greg KH, Linus Torvalds,
	Andrew Morton, Linux Kernel, IDE/ATA development list,
	linux-scsi

James Bottomley wrote:
> On Tue, 2008-03-25 at 00:02 -0400, Mark Lord wrote:
>..
>> Practically all utilities that care about it,
>> use ioctl(fd, HDIO_GETGEO) to determine the starting
>> sector offset of a hard disk partition.
..
> Perhaps I've missed something, but surely geometry doesn't make sense on
> a >2TB drive does it?  The only reason we use it on modern disks (which
> usually make it up specially for us) is that the DOS partition scheme
> requires it.  Once we're over 2TB, isn't it impossible to use DOS
> partitions (well, OK, unless you increase the sector size, but that's
> only delaying the inevitable), so we can just go with a proper disk
> labelling scheme and use BLKGETSIZE64 all the time.
..

I haven't thought much about problems with the virtual geometry,
because, as you say, we really don't care about it for the most part.
We use LBA values from the partition tables rather than CHS.
I suppose those are also likely to be 32-bit limited.

The "partition offset", or "starting sector" is the important
bit of information for most things.  And that's currently available
from HDIO_GETGEO, and from /sys/block/XXX/XXXn/start, if sysfs is mounted.

We just need an easy way to get it, given a dev_t from stat(2).
Currently there isn't an easy way, and HDIO_GETGEO returns
only 32-bits on a 32-bit system.

Cheers



* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 13:55       ` H. Peter Anvin
@ 2008-03-25 17:37         ` Mark Lord
  2008-03-25 19:25           ` Greg KH
  0 siblings, 1 reply; 59+ messages in thread
From: Mark Lord @ 2008-03-25 17:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Greg KH, Jens Axboe, Jeff Garzik, Tejun Heo, Linus Torvalds,
	Andrew Morton, Linux Kernel, IDE/ATA development list,
	linux-scsi

H. Peter Anvin wrote:
> Mark Lord wrote:
>>
>> Yeah, that would be just as good, really.  Maybe even better.
>>
>> Mark Lord wrote (later on):
>>> Instead, software has to search everything inside /sys/block/
>>> looking for a "dev" file whose contents match,
>>> rather than just trying to access something like this:
>>>
>>>   /sys/block/8:1/start
>>> or
>>>   /sys/block/majors/8/minors/1/start
>>>
>>> Or any one of a number of similar ways to arrange it. 
>> ..
> 
> It shouldn't be under /sys/block... there are enough things that 
> scan /sys/block and assume any directory underneath it has the current 
> format.
..

So long as we only add things, and not remove them, then any software
that scans /sys/block/ shouldn't care, really.

But yes, it could go elsewhere, too.
Perhaps a /sys/dev/ directory, populated with symbolic links
(or hard links?) back to the /sys/block/ entries, something like this:

   /sys/dev/block/8:0 -> ../../../block/sda
   /sys/dev/block/8:1 -> ../../../block/sda/sda1
   /sys/dev/block/8:2 -> ../../../block/sda/sda2
   ...

That's just a suggestion, really.
And what about character devices?

Perhaps Greg will chime in.
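
With a layout like the one suggested above, the lookup from a dev_t shrinks to building one path and reading one number. The sketch below assumes the /sys/dev/block/MAJ:MIN links exist as proposed and that the "start" file holds a decimal sector count; the function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/sysmacros.h>   /* major(), minor(), makedev() */

/* Build "/sys/dev/block/MAJ:MIN/start"; 0 on success, -1 if out is too small. */
int start_path_for(dev_t dev, char *out, size_t outlen)
{
    int n;

    assert(out != NULL);
    n = snprintf(out, outlen, "/sys/dev/block/%u:%u/start",
                 major(dev), minor(dev));
    return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}

/* Parse a decimal sector count into 64 bits, regardless of word size. */
int parse_start(const char *text, unsigned long long *start)
{
    char *end;
    unsigned long long v;

    errno = 0;
    v = strtoull(text, &end, 10);
    if (end == text || errno != 0)
        return -1;
    *start = v;
    return 0;
}

/* Read the starting sector for dev; 0 on success, -1 on any failure. */
int read_start(dev_t dev, unsigned long long *start)
{
    char path[64], buf[32];
    FILE *f;

    if (start_path_for(dev, path, sizeof(path)) != 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_start(buf, start);
}
```

A caller would pass st_rdev from stat(2) on the device node. Because the value is parsed with strtoull, a 32-bit build can hold offsets past 2^32 sectors.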


* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 15:17   ` James Bottomley
  2008-03-25 17:31     ` Mark Lord
@ 2008-03-25 17:45     ` Greg Freemyer
  2008-03-25 17:52       ` Randy Dunlap
  2008-03-30  4:28       ` Matt Domsch
  1 sibling, 2 replies; 59+ messages in thread
From: Greg Freemyer @ 2008-03-25 17:45 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mark Lord, Jens Axboe, Jeff Garzik, Tejun Heo, Greg KH,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Tue, Mar 25, 2008 at 11:17 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Tue, 2008-03-25 at 00:02 -0400, Mark Lord wrote:
>  > (resending .. forgot to copy the lists originally)
>  >
>  > We have a problem coming down the pipeline.
>  >
>  > Practically all utilities that care about it,
>  > use ioctl(fd, HDIO_GETGEO) to determine the starting
>  > sector offset of a hard disk partition.
>  >
>  > SCSI, libata, IDE, USB, Firewire.. you name it.
>  >
>  > The return value uses "unsigned long",
>  > which on a 32-bit system limits drive offsets to 2TB.
>  >
>  > There will be single drives exceeding this limit within
>  > the next 12 months or less, and we already have RAID arrays
>  > that exceed 2TB.
>  >
>  > So.. what's the replacement for HDIO_GETGEO on 32-bits ?
>  >
>  > One candidate might seem to be the existing /sys/block/dev/partition/start
>  > which I expect is already 64-bit friendly.
>  >
>  > But this requires about 150 lines of somewhat complex C code to access,
>  > using only the dev_t (from stat(2) on a file) as a starting point,
>  > or less if one relies upon the udev device name matching the sysfs device name.
>  >
>  > Is it time now for HDIO_GETGEO64 to make an appearance?
>  > Similar to how the existing BLKGETSIZE64 is supplanting BLKGETSIZE ?
>
>  Perhaps I've missed something, but surely geometry doesn't make sense on
>  a >2TB drive does it?  The only reason we use it on modern disks (which
>  usually make it up specially for us) is that the DOS partition scheme
>  requires it.  Once we're over 2TB, isn't it impossible to use DOS
>  partitions (well, OK, unless you increase the sector size, but that's
>  only delaying the inevitable), so we can just go with a proper disk
>  labelling scheme and use BLKGETSIZE64 all the time.
>

I believe GUID Partition Tables (GPTs) are the answer.

I believe one of the features of GPT is the elimination of the 32-bit
sector restrictions.

http://en.wikipedia.org/wiki/GUID_Partition_Table

Windows VISTA 64-bit supports GPTs on data disks and new Mac OS based
systems have been using it on internal drives for a couple years at
least.

GPTs are part of the Extensible Firmware Interface (EFI), so they
should be usable for PC bootable disks at some point.  (Maybe now in
some cases?)

I'm not sure what the Linux Kernel support is for GPTs.
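
For comparison with the 32-bit fields of a DOS partition entry, a GPT partition entry stores its first and last sectors as 64-bit LBAs. The struct below is a sketch of the 128-byte on-disk entry; it assumes a little-endian host so the fields can be read without byte swapping, and the type and function names are chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On-disk GPT partition entry, 128 bytes, fields stored little-endian. */
struct gpt_entry {
    uint8_t  type_guid[16];    /* partition type GUID */
    uint8_t  unique_guid[16];  /* GUID unique to this partition */
    uint64_t first_lba;        /* first sector of the partition */
    uint64_t last_lba;         /* last sector, inclusive */
    uint64_t attributes;       /* attribute flags */
    uint16_t name[36];         /* partition name, UTF-16LE */
};

_Static_assert(sizeof(struct gpt_entry) == 128, "GPT entry is 128 bytes");

/* Partition length in sectors; last_lba is inclusive. */
uint64_t gpt_entry_sectors(const struct gpt_entry *e)
{
    assert(e != NULL && e->last_lba >= e->first_lba);
    return e->last_lba - e->first_lba + 1;
}
```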

Greg
-- 
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com


* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 17:45     ` Greg Freemyer
@ 2008-03-25 17:52       ` Randy Dunlap
  2008-03-25 18:09         ` Matthew Wilcox
  2008-03-30  4:28       ` Matt Domsch
  1 sibling, 1 reply; 59+ messages in thread
From: Randy Dunlap @ 2008-03-25 17:52 UTC (permalink / raw)
  To: Greg Freemyer
  Cc: James Bottomley, Mark Lord, Jens Axboe, Jeff Garzik, Tejun Heo,
	Greg KH, Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Tue, 25 Mar 2008 13:45:35 -0400 Greg Freemyer wrote:

> On Tue, Mar 25, 2008 at 11:17 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Tue, 2008-03-25 at 00:02 -0400, Mark Lord wrote:
> >  > (resending .. forgot to copy the lists originally)
> >  >
> >  > We have a problem coming down the pipeline.
> >  >
> >  > Practically all utilities that care about it,
> >  > use ioctl(fd, HDIO_GETGEO) to determine the starting
> >  > sector offset of a hard disk partition.
> >  >
> >  > SCSI, libata, IDE, USB, Firewire.. you name it.
> >  >
> >  > The return value uses "unsigned long",
> >  > which on a 32-bit system limits drive offsets to 2TB.
> >  >
> >  > There will be single drives exceeding this limit within
> >  > the next 12 months or less, and we already have RAID arrays
> >  > that exceed 2TB.
> >  >
> >  > So.. what's the replacement for HDIO_GETGEO on 32-bits ?
> >  >
> >  > One candidate might seem to be the existing /sys/block/dev/partition/start
> >  > which I expect is already 64-bit friendly.
> >  >
> >  > But this requires about 150 lines of somewhat complex C code to access,
> >  > using only the dev_t (from stat(2) on a file) as a starting point,
> >  > or less if one relies upon the udev device name matching the sysfs device name.
> >  >
> >  > Is it time now for HDIO_GETGEO64 to make an appearance?
> >  > Similar to how the existing BLKGETSIZE64 is supplanting BLKGETSIZE ?
> >
> >  Perhaps I've missed something, but surely geometry doesn't make sense on
> >  a >2TB drive does it?  The only reason we use it on modern disks (which
> >  usually make it up specially for us) is that the DOS partition scheme
> >  requires it.  Once we're over 2TB, isn't it impossible to use DOS
> >  partitions (well, OK, unless you increase the sector size, but that's
> >  only delaying the inevitable), so we can just go with a proper disk
> >  labelling scheme and use BLKGETSIZE64 all the time.
> >
> 
> I believe GUID Partition Tables (GPTs) are the answer.
> 
> I believe one of the features of GPT is the elimination of the 32-bit
> sector restrictions.
> 
> http://en.wikipedia.org/wiki/GUID_Partition_Table
> 
> Windows VISTA 64-bit supports GPTs on data disks and new Mac OS based
> systems have been using it on internal drives for a couple years at
> least.
> 
> GPTs are part of the Extensible Firmware Interface (EFI), so they
> should be usable for PC bootable disks at some point.  (Maybe now in
> some cases?)
> 
> I'm not sure what the Linux Kernel support is for GPTs.

It's implemented.  Not sure about how well used/tested it is.

config EFI_PARTITION
	bool "EFI GUID Partition support"
	depends on PARTITION_ADVANCED
	select CRC32
	help
	  Say Y here if you would like to use hard disks under Linux which
	  were partitioned using EFI GPT.
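
For anyone who wants to check from userspace whether a disk or image
actually carries a GPT, a rough sketch follows.  It assumes 512-byte
logical sectors and only looks for the "EFI PART" signature that the GPT
header carries at the start of LBA 1; it does not validate the header CRC
or the backup header.  The name has_gpt_header is made up for this
example.

```python
GPT_SIGNATURE = b"EFI PART"  # first 8 bytes of the GPT header


def has_gpt_header(disk, sector_size=512):
    """Return True if the binary file object `disk` has a GPT signature at LBA 1.

    Assumption: `sector_size` matches the device's logical sector size.
    """
    disk.seek(sector_size)  # LBA 0 holds the protective MBR, LBA 1 the header
    return disk.read(len(GPT_SIGNATURE)) == GPT_SIGNATURE
```

Typical use would be something like has_gpt_header(open(path, "rb")) on a
disk image or, with sufficient permissions, a whole-disk device node.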


---
~Randy

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 17:52       ` Randy Dunlap
@ 2008-03-25 18:09         ` Matthew Wilcox
  2008-03-26  9:58           ` Boaz Harrosh
  0 siblings, 1 reply; 59+ messages in thread
From: Matthew Wilcox @ 2008-03-25 18:09 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Greg Freemyer, James Bottomley, Mark Lord, Jens Axboe,
	Jeff Garzik, Tejun Heo, Greg KH, Linus Torvalds, Andrew Morton,
	Linux Kernel, IDE/ATA development list, linux-scsi

On Tue, Mar 25, 2008 at 10:52:28AM -0700, Randy Dunlap wrote:
> > I'm not sure what the Linux Kernel support is for GPTs.
> 
> It's implemented.  Not sure about how well used/tested it is.

ia64 uses it exclusively ... at least on discs that you want to use from
EFI.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 17:37         ` Mark Lord
@ 2008-03-25 19:25           ` Greg KH
  2008-03-25 19:34             ` Randy Dunlap
  2008-03-26  0:34             ` Mark Lord
  0 siblings, 2 replies; 59+ messages in thread
From: Greg KH @ 2008-03-25 19:25 UTC (permalink / raw)
  To: Mark Lord
  Cc: H. Peter Anvin, Jens Axboe, Jeff Garzik, Tejun Heo,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Tue, Mar 25, 2008 at 01:37:03PM -0400, Mark Lord wrote:
> H. Peter Anvin wrote:
>> Mark Lord wrote:
>>>
>>> Yeah, that would be just as good, really.  Maybe even better.
>>>
>>> Mark Lord wrote (later on):
>>>> Instead, software has to search everything inside /sys/block/
>>>> looking for a "dev" file whose contents match,
>>>> rather than just trying to access something like this:
>>>>
>>>>   /sys/block/8:1/start
>>>> or
>>>>   /sys/block/majors/8/minors/1/start
>>>>
>>>> Or any one of a number of similar ways to arrange it. 
>>> ..
>> It shouldn't be under /sys/block... there are enough things that scan
>> /sys/block and assume any directory underneath it has the current format.
> ..
>
> So long as we only add things, and not remove them, then any software
> that scans /sys/block/ shouldn't care, really.
>
> But yes, it could go elsewhere, too.
> Perhaps a /sys/dev/ directory, populated with symbolic links
> (or hard links?) back to the /sys/block/ entries, something like this:
>
>   /sys/dev/block/8:0 -> ../../../block/sda
>   /sys/dev/block/8:1 -> ../../../block/sda/sda1
>   /sys/dev/block/8:2 -> ../../../block/sda/sda2
>   ...
>
> That's just a suggestion, really.
> And what about character devices?
>
> Perhaps Greg will chime in.

I've been waiting to see if sanity will take hold of anyone here.

Come on people, adding symlinks for device major:minor numbers in sysfs
to save a few 10s of lines of userspace code?  Can things get sillier?

You can add a single udev rule to probably build these in a tree in /dev
if you really need such a thing...

And what's wrong with your new ioctl recommendation?

greg k-h

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 17:31     ` Mark Lord
@ 2008-03-25 19:32       ` James Bottomley
  0 siblings, 0 replies; 59+ messages in thread
From: James Bottomley @ 2008-03-25 19:32 UTC (permalink / raw)
  To: Mark Lord
  Cc: Jens Axboe, Jeff Garzik, Tejun Heo, Greg KH, Linus Torvalds,
	Andrew Morton, Linux Kernel, IDE/ATA development list,
	linux-scsi

On Tue, 2008-03-25 at 13:31 -0400, Mark Lord wrote:
> James Bottomley wrote:
> > On Tue, 2008-03-25 at 00:02 -0400, Mark Lord wrote:
> >..
> >> Practically all utilities that care about it,
> >> use ioctl(fd, HDIO_GETGEO) to determine the starting
> >> sector offset of a hard disk partition.
> ..
> > Perhaps I've missed something, but surely geometry doesn't make sense on
> > a >2TB drive does it?  The only reason we use it on modern disks (which
> > usually make it up specially for us) is that the DOS partition scheme
> > requires it.  Once we're over 2TB, isn't it impossible to use DOS
> > partitions (well, OK, unless you increase the sector size, but that's
> > only delaying the inevitable), so we can just go with a proper disk
> > labelling scheme and use BLKGETSIZE64 all the time.
> ..
> 
> I haven't thought much about problems with the virtual geometry,
> because, as you say, we really don't care about it for the most part.
> We use LBA values from the partition tables rather than CHS.
> I suppose those are also likely to be 32-bit limited.
> 
> The "partition offset", or "starting sector" is the important
> bit of information for most things.  And that's currently available
> from HDIO_GETGEO, and from /sys/block/XXX/XXXn/start, if sysfs is mounted.
> 
> We just need an easy way to get it, given a dev_t from stat(2).
> Currently there isn't an easy way, and HDIO_GETGEO returns
> only 32-bits on a 32-bit system.

But I think where this is leading is that you've been using the geometry
call, but all you really want to know is the actual partition start in
sector units, so a new BLKGETPARTSTART (or something) ioctl that was
designed to return a u64 would work for you?  That sounds reasonable to
me; so not a HDIO_GETGEO64 which gets us into trouble with geometries,
but a simple ioctl that gives you exactly what you're looking for.
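
To make the limit concrete, here is a small arithmetic sketch (Python,
purely illustrative).  It assumes 512-byte sectors and models the
start-sector field as a plain 32-bit or 64-bit unsigned integer; the
helper names are invented for this example and are not kernel interfaces.

```python
SECTOR_SIZE = 512  # bytes; assumes classic 512-byte logical sectors


def max_offset_bytes(bits, sector_size=SECTOR_SIZE):
    """Size in bytes of the range addressable by a `bits`-wide sector number."""
    return (1 << bits) * sector_size


def truncate_start(start_sector, bits=32):
    """Model what a `bits`-wide unsigned field keeps of a start sector."""
    return start_sector & ((1 << bits) - 1)


if __name__ == "__main__":
    # A 32-bit sector number covers 2**32 * 512 bytes = 2 TiB.
    print(max_offset_bytes(32) // 2**40, "TiB")
    # A partition starting 63 sectors past the 2 TiB mark wraps around
    # in a 32-bit field but is preserved in a 64-bit one.
    start = 2**32 + 63
    print(truncate_start(start, 32), truncate_start(start, 64))
```

The wrap-around in the 32-bit case is why a dedicated call returning a
u64 start sector would be enough for this use, independent of any
geometry fields.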

James



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 19:25           ` Greg KH
@ 2008-03-25 19:34             ` Randy Dunlap
  2008-03-25 20:36               ` H. Peter Anvin
  2008-03-26  0:34             ` Mark Lord
  1 sibling, 1 reply; 59+ messages in thread
From: Randy Dunlap @ 2008-03-25 19:34 UTC (permalink / raw)
  To: Greg KH
  Cc: Mark Lord, H. Peter Anvin, Jens Axboe, Jeff Garzik, Tejun Heo,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Tue, 25 Mar 2008 12:25:15 -0700 Greg KH wrote:

> On Tue, Mar 25, 2008 at 01:37:03PM -0400, Mark Lord wrote:
> > H. Peter Anvin wrote:
> >> Mark Lord wrote:
> >>>
> >>> Yeah, that would be just as good, really.  Maybe even better.
> >>>
> >>> Mark Lord wrote (later on):
> >>>> Instead, software has to search everything inside /sys/block/
> >>>> looking for a "dev" file whose contents match,
> >>>> rather than just trying to access something like this:
> >>>>
> >>>>   /sys/block/8:1/start
> >>>> or
> >>>>   /sys/block/majors/8/minors/1/start
> >>>>
> >>>> Or any one of a number of similar ways to arrange it. 
> >>> ..
> >> It shouldn't be under /sys/block... there are enough things that scan
> >> /sys/block and assume any directory underneath it has the current format.
> > ..
> >
> > So long as we only add things, and not remove them, then any software
> > that scans /sys/block/ shouldn't care, really.
> >
> > But yes, it could go elsewhere, too.
> > Perhaps a /sys/dev/ directory, populated with symbolic links
> > (or hard links?) back to the /sys/block/ entries, something like this:
> >
> >   /sys/dev/block/8:0 -> ../../../block/sda
> >   /sys/dev/block/8:1 -> ../../../block/sda/sda1
> >   /sys/dev/block/8:2 -> ../../../block/sda/sda2
> >   ...
> >
> > That's just a suggestion, really.
> > And what about character devices?
> >
> > Perhaps Greg will chime in.
> 
> I've been waiting to see if sanity will take hold of anyone here.
> 
> Come on people, adding symlinks for device major:minor numbers in sysfs
> to save a few 10s of lines of userspace code?  Can things get sillier?
> 
> You can add a single udev rule to probably build these in a tree in /dev
> if you really need such a thing...
> 
> And what's wrong with your new ioctl recommendation?

Ah, there's some sanity.  :)

---
~Randy

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 19:34             ` Randy Dunlap
@ 2008-03-25 20:36               ` H. Peter Anvin
  2008-03-25 21:20                 ` Greg KH
  0 siblings, 1 reply; 59+ messages in thread
From: H. Peter Anvin @ 2008-03-25 20:36 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Greg KH, Mark Lord, Jens Axboe, Jeff Garzik, Tejun Heo,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

Randy Dunlap wrote:
>>
>> Come on people, adding symlinks for device major:minor numbers in sysfs
>> to save a few 10s of lines of userspace code?  Can things get sillier?
>>
>> You can add a single udev rule to probably build these in a tree in /dev
>> if you really need such a thing...
>>
>> And what's wrong with your new ioctl recommendation?
> 
> Ah, there's some sanity.  :)
> 

It's not so much an issue of a few tens of lines of user space code, but 
rather the fact that something that should be O(1) is currently O(n).

	-hpa

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 16:47           ` Theodore Tso
@ 2008-03-25 20:51               ` Theodore Tso
  2008-03-25 20:51             ` Theodore Tso
  1 sibling, 0 replies; 59+ messages in thread
From: Theodore Tso @ 2008-03-25 20:51 UTC (permalink / raw)
  To: Ric Wheeler, Matthew Wilcox, Mark Lord, Linus Torvalds,
	Jens Axboe, Jeff Garzik

On Tue, Mar 25, 2008 at 12:47:50PM -0400, Theodore Tso wrote:
> 
> Well 2TB, assuming a 4k blocksize, means a block bitmap is 512 megs.
> So at least for ext3, 4GB should be just enough, unless you hit
> certain really nasty complicated corruptions (i.e. a large number of
> blocks claimed by more than one inode, which can happen if an inode
> table is written to the wrong location on disk --- on top of some
> other portion of the inode table), or if the filesystem has a large
> number of files with hard links (such as the case with certain backup
> programs).

Whoops, screwed up my math.  The block bitmap for a 2TB filesystem is
64 megs, not 512 megs.  2**41 / 2**12 / 2**3 == 2**26, or 64MB.  E2fsck
in the worst case will allocate 5 inode bitmaps and 3 block bitmaps,
plus various arrays for directory blocks and keeping track of
refcounts (which are optimized for counts of 0 and 1, so lots of hard
links will blow up your memory usage, although we do have a tdb option
which helps in that particular case).  So I'd say that most of the
time 3GB of address space should really be enough for a 2TB raid
array, unless you get really pathological corruption cases.
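
The arithmetic can be checked with a few lines of Python.  This is only a
sketch: it takes 2TB to mean 2**41 bytes, assumes one bitmap bit per
filesystem block, and the function name is invented for the example.  It
does not try to model the inode bitmaps or the refcount arrays, whose
sizes depend on the filesystem's inode count and link counts.

```python
def block_bitmap_bytes(fs_bytes, block_size=4096):
    """Bytes needed for a bitmap with one bit per filesystem block."""
    blocks = fs_bytes // block_size
    return (blocks + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    one_bitmap = block_bitmap_bytes(2**41)       # 2TB filesystem, 4k blocks
    print(one_bitmap // 2**20, "MiB per block bitmap")
    print(3 * one_bitmap // 2**20, "MiB for three block bitmaps")
```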

							- Ted

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 20:36               ` H. Peter Anvin
@ 2008-03-25 21:20                 ` Greg KH
  2008-03-25 21:26                   ` H. Peter Anvin
  0 siblings, 1 reply; 59+ messages in thread
From: Greg KH @ 2008-03-25 21:20 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Randy Dunlap, Mark Lord, Jens Axboe, Jeff Garzik, Tejun Heo,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Tue, Mar 25, 2008 at 01:36:51PM -0700, H. Peter Anvin wrote:
> Randy Dunlap wrote:
>>>
>>> Come on people, adding symlinks for device major:minor numbers in sysfs
>>> to save a few 10s of lines of userspace code?  Can things get sillier?
>>>
>>> You can add a single udev rule to probably build these in a tree in /dev
>>> if you really need such a thing...
>>>
>>> And what's wrong with your new ioctl recommendation?
>> Ah, there's some sanity.  :)
>
> It's not so much an issue of a few tens of lines of user space code, but 
> rather the fact that something that should be O(1) is currently O(n).

"should"?  why?  Is this some new requirement that everyone needs?  I've
_never_ seen anyone ask for the ability to find sysfs devices by
major:minor number in O(1) time.  Is this somehow a place where such
optimization is warranted?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 21:20                 ` Greg KH
@ 2008-03-25 21:26                   ` H. Peter Anvin
  2008-03-25 23:00                     ` Greg KH
  2008-03-27 19:05                     ` Matthew Wilcox
  0 siblings, 2 replies; 59+ messages in thread
From: H. Peter Anvin @ 2008-03-25 21:26 UTC (permalink / raw)
  To: Greg KH
  Cc: Randy Dunlap, Mark Lord, Jens Axboe, Jeff Garzik, Tejun Heo,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

Greg KH wrote:
> On Tue, Mar 25, 2008 at 01:36:51PM -0700, H. Peter Anvin wrote:
>> Randy Dunlap wrote:
>>>> Come on people, adding symlinks for device major:minor numbers in sysfs
>>>> to save a few 10s of lines of userspace code?  Can things get sillier?
>>>>
>>>> You can add a single udev rule to probably build these in a tree in /dev
>>>> if you really need such a thing...
>>>>
>>>> And what's wrong with your new ioctl recommendation?
>>> Ah, there's some sanity.  :)
>> It's not so much an issue of a few tens of lines of user space code, but 
>> rather the fact that something that should be O(1) is currently O(n).
> 
> "should"?  why?  Is this some new requirement that everyone needs?  I've
> _never_ seen anyone ask for the ability to find sysfs devices by
> major:minor number in O(1) time.  Is this somehow a place where such
> optimization is warranted?

Well, when dealing with shell scripts an O(n) very easily becomes O(n^2).
For the stuff that I, personally, do, it's not a big deal, but people
with large numbers of disks have serious gripes with our boot times.

	-hpa

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 21:26                   ` H. Peter Anvin
@ 2008-03-25 23:00                     ` Greg KH
  2008-03-25 23:05                       ` H. Peter Anvin
  2008-03-27 19:05                     ` Matthew Wilcox
  1 sibling, 1 reply; 59+ messages in thread
From: Greg KH @ 2008-03-25 23:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Randy Dunlap, Mark Lord, Jens Axboe, Jeff Garzik, Tejun Heo,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Tue, Mar 25, 2008 at 02:26:45PM -0700, H. Peter Anvin wrote:
> Greg KH wrote:
>> On Tue, Mar 25, 2008 at 01:36:51PM -0700, H. Peter Anvin wrote:
>>> Randy Dunlap wrote:
>>>>> Come on people, adding symlinks for device major:minor numbers in sysfs
>>>>> to save a few 10s of lines of userspace code?  Can things get sillier?
>>>>>
>>>>> You can add a single udev rule to probably build these in a tree in 
>>>>> /dev
>>>>> if you really need such a thing...
>>>>>
>>>>> And what's wrong with your new ioctl recommendation?
>>>> Ah, there's some sanity.  :)
>>> It's not so much an issue of a few tens of lines of user space code, but 
>>> rather the fact that something that should be O(1) is currently O(n).
>> "should"?  why?  Is this some new requirement that everyone needs?  I've
>> _never_ seen anyone ask for the ability to find sysfs devices by
>> major:minor number in O(1) time.  Is this somehow a place where such
>> optimization is warranted?
>
> Well, when dealing with shell scripts an O(n) very easily becomes O(n^2).
> For the stuff that I, personally, do, it's not a big deal, but people with
> large numbers of disks have serious gripes with our boot times.

How does this have anything to do with boot times?  Do you really have a
foolish shell script that iterates over every single disk in the
sysfs tree for every disk?  What does it do that for?

I thought we were talking about 2TB disks here, with a proposed new
ioctl, not foolishness of boot scripts...

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 23:00                     ` Greg KH
@ 2008-03-25 23:05                       ` H. Peter Anvin
  2008-03-25 23:22                         ` Greg KH
  0 siblings, 1 reply; 59+ messages in thread
From: H. Peter Anvin @ 2008-03-25 23:05 UTC (permalink / raw)
  To: Greg KH
  Cc: Randy Dunlap, Mark Lord, Jens Axboe, Jeff Garzik, Tejun Heo,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

> How does this have anything to do with boot times?  Do you really have a
> foolish shell script that iterates over every single disk in the
> sysfs tree for every disk?  What does it do that for?

Any time you want to get the sysfs information for a filesystem which is 
already mounted, that's what you're forced to do.

> I thought we were talking about 2TB disks here, with a proposed new
> ioctl, not foolishness of boot scripts...

I pointed out that having a way to map device numbers to sysfs 
directories would have the same effect, *and* would be usable for other 
purposes.  I'd rather see that than a new ioctl, and another, and another...

ioctl()s are also nasty since they're generally root-only (or rather, 
device-owner only).  Since the information is already in sysfs, there is 
no benefit to this hiding.  Otherwise one could consider an ioctl() "give
me the sysfs name of this device."

	-hpa

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 23:05                       ` H. Peter Anvin
@ 2008-03-25 23:22                         ` Greg KH
  0 siblings, 0 replies; 59+ messages in thread
From: Greg KH @ 2008-03-25 23:22 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Randy Dunlap, Mark Lord, Jens Axboe, Jeff Garzik, Tejun Heo,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Tue, Mar 25, 2008 at 04:05:32PM -0700, H. Peter Anvin wrote:
>> How does this have anything to do with boot times?  Do you really have a
>> foolish shell script that iterates over every single disk in the
>> sysfs tree for every disk?  What does it do that for?
>
> Any time you want to get the sysfs information for a filesystem which is 
> already mounted, that's what you're forced to do.
>
>> I thought we were talking about 2TB disks here, with a proposed new
>> ioctl, not foolishness of boot scripts...
>
> I pointed out that having a way to map device numbers to sysfs directories 
> would have the same effect, *and* would be usable for other purposes.  I'd 
> rather see that than a new ioctl, and another, and another...

Again, a simple udev rule will give you that today if you really want
it...

And I think 'udevinfo' can be used to retrieve this information as well.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 19:25           ` Greg KH
  2008-03-25 19:34             ` Randy Dunlap
@ 2008-03-26  0:34             ` Mark Lord
  2008-03-26  0:54               ` Tejun Heo
  2008-03-27 18:51               ` What to do about the 2TB limit on HDIO_GETGEO ? Kay Sievers
  1 sibling, 2 replies; 59+ messages in thread
From: Mark Lord @ 2008-03-26  0:34 UTC (permalink / raw)
  To: Greg KH
  Cc: H. Peter Anvin, Jens Axboe, Jeff Garzik, Tejun Heo,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

Greg KH wrote:
> On Tue, Mar 25, 2008 at 01:37:03PM -0400, Mark Lord wrote:
>> Perhaps Greg will chime in.
> 
> I've been waiting to see if sanity will take hold of anyone here.
..

So have we.  sysfs is a total nightmare to extract information from
under program / script control.  The idea presented in this thread,
is to have it cross-index the contents with a method that actually
makes it easy to access in many common scenarios, without requiring
huge gobs of code in user space.  Or in kernel space.

And it's not just a few 10s of lines of code currently,
but rather about 80-100 lines just to find the correct device subdir,
and *then* a few more 10s of lines of code to retrieve the value.

In a bulletproof fashion, that is.  Sure it can be slightly smaller
if niceties such as error checking/handling are omitted.

There's no guarantee that udev is present, and even if it were present,
there's no guarantee that the names in /dev/ will match sysfs pathnames,
since udev is very configurable to do otherwise.

So lookups are by dev_t, which sysfs has no simple or even easy way
of accomplishing.  O(n) at a minimum.
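
The scan being described looks roughly like the sketch below.  It is an
illustration only: it assumes a tree shaped like /sys/block, where each
disk directory and each partition subdirectory has a "dev" file holding
"MAJOR:MINOR" and partitions also have a "start" file.  The function
names and the root parameter are invented for the example.

```python
import os


def _read_dev(path):
    """Return the stripped contents of path/dev, or None if unreadable."""
    try:
        with open(os.path.join(path, "dev")) as f:
            return f.read().strip()
    except OSError:
        return None


def find_by_devnum(major, minor, root="/sys/block"):
    """Linear search for the directory whose 'dev' file is 'major:minor'."""
    want = f"{major}:{minor}"
    for disk in sorted(os.listdir(root)):
        disk_dir = os.path.join(root, disk)
        if _read_dev(disk_dir) == want:
            return disk_dir
        try:
            entries = sorted(os.listdir(disk_dir))
        except OSError:
            continue
        for entry in entries:  # partitions are subdirectories of the disk
            part_dir = os.path.join(disk_dir, entry)
            if os.path.isdir(part_dir) and _read_dev(part_dir) == want:
                return part_dir
    return None


def partition_start(major, minor, root="/sys/block"):
    """Start sector from the matching 'start' file, or None if absent."""
    found = find_by_devnum(major, minor, root)
    if found is None:
        return None
    try:
        with open(os.path.join(found, "start")) as f:
            return int(f.read())
    except (OSError, ValueError):
        return None
```

Every lookup touches each disk and partition until a match is found,
which is the O(n) cost referred to above.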

If we make it easier to access, then more programs will use it
rather than us having to expand our tricky binary ioctl interfaces.

Isn't that part of the idea of sysfs -- to limit the need for new ioctls ?

Cheers

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-26  0:34             ` Mark Lord
@ 2008-03-26  0:54               ` Tejun Heo
  2008-03-26  3:38                 ` Greg KH
  2008-03-27 19:29                 ` Kay Sievers
  2008-03-27 18:51               ` What to do about the 2TB limit on HDIO_GETGEO ? Kay Sievers
  1 sibling, 2 replies; 59+ messages in thread
From: Tejun Heo @ 2008-03-26  0:54 UTC (permalink / raw)
  To: Mark Lord
  Cc: Greg KH, H. Peter Anvin, Jens Axboe, Jeff Garzik, Linus Torvalds,
	Andrew Morton, Linux Kernel, IDE/ATA development list,
	linux-scsi

Hello,

Mark Lord wrote:
> So have we.  sysfs is a total nightmare to extract information from
> under program / script control.  The idea presented in this thread,
> is to have it cross-index the contents with a method that actually
> makes it easy to access in many common scenarios, without requiring
> huge gobs of code in user space.  Or in kernel space.
> 
> And it's not just a few 10s of lines of code currently,
> but rather about 80-100 lines just to find the correct device subdir,
> and *then* a few more 10s of lines of code to retrieve the value.
> 
> In a bulletproof fashion, that is.  Sure it can be slightly smaller
> if niceties such as error checking/handling are omitted.
> 
> There's no guarantee that udev is present, and even if it were present,
> there's no guarantee that the names in /dev/ will match sysfs pathnames,
> since udev is very configurable to do otherwise.
> 
> So lookups are by dev_t, which sysfs has no simple or even easy way
> of accomplishing.  O(n) at a minimum.
> 
> If we make it easier to access, then more programs will use it
> rather than us having to expand our tricky binary ioctl interfaces.
> 
> Isn't that part of the idea of sysfs -- to limit the need for new ioctls ?

The questions are...

1. Are we gonna push sysfs as the primary interface and not provide an
alternative interface (ioctl here) which can provide equivalent
information?  There are people running their systems w/o sysfs but I
think we're getting closer to this every day.

2. Is udev an essential part of all systems?  I'm not sure about this
one.  Lots of small machines run w/o udev and I think udev is a bit too
high level to depend on for every system.

If both #1 and #2 are true, I agree with Mark that we need an easy way
to map from device number to matching sysfs nodes.  Tools which are used
early during boot and emergency sessions need this mapping and many of
them are minimal C programs w/o many dependencies, for a good reason.
Requiring each of them to implement their own way to map device node to
sysfs node is too awkward.

Probably something like /sys/class/block/MAJ:MIN or
/sys/class/devnums/bMAJ:MIN?

-- 
tejun

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-26  0:54               ` Tejun Heo
@ 2008-03-26  3:38                 ` Greg KH
  2008-03-26  4:24                   ` Tejun Heo
  2008-03-27 19:29                 ` Kay Sievers
  1 sibling, 1 reply; 59+ messages in thread
From: Greg KH @ 2008-03-26  3:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Mark Lord, H. Peter Anvin, Jens Axboe, Jeff Garzik,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Wed, Mar 26, 2008 at 09:54:22AM +0900, Tejun Heo wrote:
> Hello,
> 
> Mark Lord wrote:
> > So have we.  sysfs is a total nightmare to extract information from
> > under program / script control.  The idea presented in this thread,
> > is to have it cross-index the contents with a method that actually
> > makes it easy to access in many common scenarios, without requiring
> > huge gobs of code in user space.  Or in kernel space.
> > 
> > And it's not just a few 10s of lines of code currently,
> > but rather about 80-100 lines just to find the correct device subdir,
> > and *then* a few more 10s of lines of code to retrieve the value.

I think you are using either the wrong programming language, or your
sysfs walking logic is quite convoluted.  Look at the udev and HAL code
if you want to steal some compact, working sysfs code :)

> > In a bulletproof fashion, that is.  Sure it can be slightly smaller
> > if niceties such as error checking/handling are omitted.
> > 
> > There's no guarantee that udev is present, and even if it were present,
> > there's no guarantee that the names in /dev/ will match sysfs pathnames,
> > since udev is very configurable to do otherwise.
> > 
> > So lookups are by dev_t, which sysfs has no simple or even easy way
> > of accomplishing.  O(n) at a minimum.

And again, is this a performance requiring operation?

> > If we make it easier to access, then more programs will use it
> > rather than us having to expand our tricky binary ioctl interfaces.
> > 
> > Isn't that part of the idea of sysfs -- to limit the need for new ioctls ?
> 
> The questions are...
> 
> 1. Are we gonna push sysfs as the primary interface and not provide an
> alternative interface (ioctl here) which can provide equivalent
> information?  There are people running their systems w/o sysfs but I
> think we're getting closer to this every day.

Exactly, originally you suggested a new ioctl, which would be trivial to
add, and trivial to switch any program that was currently using an ioctl
to get the disk size, to use it instead.

Since when is the major:minor view of devices the "standard" one that
userspace uses?  Last I looked, userspace uses symlinks and lots of
other ways of directly accessing block devices in /dev/, and does not
rely on major:minor.

And finally, I haven't seen a patch that implements this "shadow" tree,
it would be interesting to see if it could even be done.

> 2. Is udev an essential part of all systems?  I'm not sure about this
> one.  Lots of small machines run w/o udev and I think udev is a bit too
> high level to depend on for every system.

My tiny little phone runs udev, I don't see why anyone wouldn't run it
these days, except in very limited embedded applications with no dynamic
devices.  But if you are in that situation, you aren't querying the size
of any random block device either :)

And heck, this phone is a very limited embedded application, with razor
thin margins, if it can use udev, I'd be interested in hearing the
justifications for anyone who says it is too large for their systems to
use it.

> If both #1 and #2 are true, I agree with Mark that we need an easy way
> to map from device number to matching sysfs nodes.  Tools which are used
> early during boot and emergency sessions need this mapping and many of
> them are minimal C programs w/o many dependencies, for a good reason.
> Requiring each of them to implement their own way to map device node to
> sysfs node is too awkward.
> 
> Probably something like /sys/class/block/MAJ:MIN or
> /sys/class/devnums/bMAJ:MIN?

Why the preoccupation with major:minor?  Just because you are able to
grab it from an open file handle?  Heck, why not just an ioctl to get
the path within sysfs for the device currently open?  :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-26  3:38                 ` Greg KH
@ 2008-03-26  4:24                   ` Tejun Heo
  2008-03-26  6:04                     ` H. Peter Anvin
  0 siblings, 1 reply; 59+ messages in thread
From: Tejun Heo @ 2008-03-26  4:24 UTC (permalink / raw)
  To: Greg KH
  Cc: Mark Lord, H. Peter Anvin, Jens Axboe, Jeff Garzik,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

Hello, Greg.

Greg KH wrote:
>> 1. Are we gonna push sysfs as the primary interface and not provide an
>> alternative interface (ioctl here) which can provide equivalent
>> information?  There are people running their systems w/o sysfs but I
>> think we're getting closer to this every day.
> 
> Exactly, originally you suggested a new ioctl,

Well, I like Mark but am not really him.  :-)

> which would be trivial to
> add, and trivial to switch any program that was currently using an ioctl
> to get the disk size, to use it instead.

That should be the simplest solution for the problem at hand.

> Since when is the major:minor view of devices the "standard" one that
> userspace uses?  Last I looked, userspace uses symlinks and lots of
> other ways of directly accessing block devices in /dev/, and does not
> rely on major:minor.

The fact that major:minor is the unique identifier of a device makes it
a bit special compared to other names on filesystem.

> And finally, I haven't seen a patch that implements this "shadow" tree,
> it would be interesting to see if it could even be done.

It's possible, all that's needed are symlinks.  We do similar things all
the time.

>> 2. Is udev an essential part of all systems?  I'm not sure about this
>> one.  Lots of small machines run w/o udev and I think udev is a bit too
>> high level to depend on for every system.
> 
> My tiny little phone runs udev, I don't see why anyone wouldn't run it
> these days, except in very limited embedded applications with no dynamic
> devices.  But if you are in that situation, you aren't querying the size
> of any random block device either :)
> 
> And heck, this phone is a very limited embedded application, with razor
> thin margins, if it can use udev, I'd be interested in hearing the
> justifications for anyone who says it is too large for their systems to
> use it.

I agree udev is affordable for most cases but it's still a major step to
require it for every system.  I would hate to hear that hdparm or fdisk
doesn't work unless udev is online.  These are tools which are used to
recover systems.

>> If both #1 and #2 are true, I agree with Mark that we need an easy way to
>> map from device number to matching sysfs nodes.  Tools which are used
>> early during boot and emergency sessions need this mapping and many of
>> them are minimal C programs w/o much dependency for a good reason.
>> Requiring each of them to implement their own way to map device node to
>> sysfs node is too awkward.
>>
>> Probably something like /sys/class/block/MAJ:MIN or
>> /sys/class/devnums/bMAJ:MIN?
> 
> Why the preoccupation with major:minor?  Just because you are able to
> grab it from an open file handle?  Heck, why not just an ioctl to get
> the path within sysfs for the device currently open?  :)

Because major:minor is the key attribute to devices?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-26  4:24                   ` Tejun Heo
@ 2008-03-26  6:04                     ` H. Peter Anvin
  0 siblings, 0 replies; 59+ messages in thread
From: H. Peter Anvin @ 2008-03-26  6:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greg KH, Mark Lord, Jens Axboe, Jeff Garzik, Linus Torvalds,
	Andrew Morton, Linux Kernel, IDE/ATA development list,
	linux-scsi

Tejun Heo wrote:
> 
>> Since when is the major:minor view of devices the "standard" one that
>> userspace uses?  Last I looked, userspace uses symlinks and lots of
>> other ways of directly accessing block devices in /dev/, and does not
>> rely on major:minor.
> 
> The fact that major:minor is the unique identifier of a device makes it
> a bit special compared to other names on filesystem.
> 

In particular, stat() and friends return the device number, not a 
device name.

	-hpa
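
A minimal sketch of the point above, assuming a Linux/glibc system where
major() and minor() come from <sys/sysmacros.h>.  The helper name
devnum_of_node is hypothetical and is not taken from any tool mentioned
in this thread:

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() on glibc */
#include <sys/types.h>

/*
 * Hypothetical helper: fetch the device number behind a device node.
 * Returns 0 on success and fills *maj and *min, or -1 if the path
 * cannot be stat()ed or is not a block/char special file.
 */
int devnum_of_node(const char *path, unsigned int *maj, unsigned int *min)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	if (!S_ISBLK(st.st_mode) && !S_ISCHR(st.st_mode))
		return -1;
	/* st_rdev (not st_dev) identifies the device the node refers to. */
	*maj = major(st.st_rdev);
	*min = minor(st.st_rdev);
	return 0;
}
```

Because stat() follows symlinks, this gives the same answer for a udev
alias under /dev/disk/by-id/ as for the node it points to, and nothing in
the result carries the kernel's name for the device.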

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 18:09         ` Matthew Wilcox
@ 2008-03-26  9:58           ` Boaz Harrosh
  0 siblings, 0 replies; 59+ messages in thread
From: Boaz Harrosh @ 2008-03-26  9:58 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Randy Dunlap, Greg Freemyer, James Bottomley, Mark Lord,
	Jens Axboe, Jeff Garzik, Tejun Heo, Greg KH, Linus Torvalds,
	Andrew Morton, Linux Kernel, IDE/ATA development list,
	linux-scsi

On Tue, Mar 25 2008 at 20:09 +0200, Matthew Wilcox <matthew@wil.cx> wrote:
> On Tue, Mar 25, 2008 at 10:52:28AM -0700, Randy Dunlap wrote:
>>> I'm not sure what the Linux Kernel support is for GPTs.
>> It's implemented.  Not sure about how well used/tested it is.
> 
> ia64 uses it exclusively ... at least on discs that you want to use from
> EFI.
> 
I think Intel Macs do too.

Boaz


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-26  0:34             ` Mark Lord
  2008-03-26  0:54               ` Tejun Heo
@ 2008-03-27 18:51               ` Kay Sievers
  2008-03-27 18:55                 ` H. Peter Anvin
  1 sibling, 1 reply; 59+ messages in thread
From: Kay Sievers @ 2008-03-27 18:51 UTC (permalink / raw)
  To: Mark Lord
  Cc: Greg KH, H. Peter Anvin, Jens Axboe, Jeff Garzik, Tejun Heo,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Wed, Mar 26, 2008 at 1:34 AM, Mark Lord <lkml@rtr.ca> wrote:
> Greg KH wrote:
>  > On Tue, Mar 25, 2008 at 01:37:03PM -0400, Mark Lord wrote:
>
> >> Perhaps Greg will chime in.
>  >
>  > I've been waiting to see if sanity will take hold of anyone here.
>  ..
>
>  So have we.  sysfs is a total nightmare to extract information from
>  under program / script control.  The idea presented in this thread,
>  is to have it cross-index the contents with a method that actually
>  makes it easy to access in many common scenarios, without requiring
>  huge gobs of code in user space.  Or in kernel space.
>
>  And it's not just a few 10s of lines of code currently,
>  but rather about 80-100 lines just to find the correct device subdir,
>  and *then* a few more 10s of lines of code to retrieve the value.

Hmm, 100 lines? What else do you need?

  $ grep -l 8:3 /sys/class/block/*/dev
  /sys/class/block/sdc/dev

Kay

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-27 18:51               ` What to do about the 2TB limit on HDIO_GETGEO ? Kay Sievers
@ 2008-03-27 18:55                 ` H. Peter Anvin
  2008-03-27 19:03                   ` Kay Sievers
  0 siblings, 1 reply; 59+ messages in thread
From: H. Peter Anvin @ 2008-03-27 18:55 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Mark Lord, Greg KH, Jens Axboe, Jeff Garzik, Tejun Heo,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

Kay Sievers wrote:
>>
>>  And it's not just a few 10s of lines of code currently,
>>  but rather about 80-100 lines just to find the correct device subdir,
>>  and *then* a few more 10s of lines of code to retrieve the value.
> 
> Hmm, 100 lines? What else do you need?
> 
>   $ grep -l 8:3 /sys/class/block/*/dev
>   /sys/class/block/sdc/dev
> 

That's particularly funny, because your very own example gives the wrong 
result -- sdc is 8:32 not 8:3 (which is sdc3, which is also excluded by 
your search.)

Not to mention the fact that it is still O(n).

	-hpa
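
The substring problem can be avoided by parsing the numbers instead of
pattern-matching the text.  A rough sketch, assuming the "dev" attribute
holds "MAJ:MIN" followed by a newline; dev_attr_matches is a hypothetical
name:

```c
#include <stdio.h>

/*
 * Hypothetical helper: does the text of a sysfs "dev" attribute
 * (for example "8:32\n") name exactly the device maj:min?
 * Comparing parsed numbers avoids the trap where the pattern
 * "8:3" also matches the text "8:32".
 */
int dev_attr_matches(const char *text, unsigned int maj, unsigned int min)
{
	unsigned int m, n;
	char trailing;
	int got = sscanf(text, "%u:%u%c", &m, &n, &trailing);

	if (got < 2)
		return 0;               /* not of the form MAJ:MIN */
	if (got == 3 && trailing != '\n')
		return 0;               /* junk directly after the minor */
	return m == maj && n == min;
}
```

This only fixes the correctness of each comparison.  A caller still has
to read one "dev" file per device, so the lookup stays O(n) in the number
of devices.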

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-27 18:55                 ` H. Peter Anvin
@ 2008-03-27 19:03                   ` Kay Sievers
  0 siblings, 0 replies; 59+ messages in thread
From: Kay Sievers @ 2008-03-27 19:03 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Mark Lord, Greg KH, Jens Axboe, Jeff Garzik, Tejun Heo,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Thu, 2008-03-27 at 11:55 -0700, H. Peter Anvin wrote:
> Kay Sievers wrote:
> >>
> >>  And it's not just a few 10s of lines of code currently,
> >>  but rather about 80-100 lines just to find the correct device subdir,
> >>  and *then* a few more 10s of lines of code to retrieve the value.
> > 
> > Hmm, 100 lines? What else do you need?
> > 
> >   $ grep -l 8:3 /sys/class/block/*/dev
> >   /sys/class/block/sdc/dev
> > 
> 
> That's particularly funny, because your very own example gives the wrong 
> result -- sdc is 8:32 not 8:3 (which is sdc3, which is also excluded by 
> your search.)

Very true, but I guess you get the idea, and know how to add the proper
string match to grep. :)

> Not to mention the fact that it is still O(n).

Any real numbers from a large setup, which show that we want to have a
reverse devnum map in sysfs?

Thanks,
Kay


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 21:26                   ` H. Peter Anvin
  2008-03-25 23:00                     ` Greg KH
@ 2008-03-27 19:05                     ` Matthew Wilcox
  1 sibling, 0 replies; 59+ messages in thread
From: Matthew Wilcox @ 2008-03-27 19:05 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Greg KH, Randy Dunlap, Mark Lord, Jens Axboe, Jeff Garzik,
	Tejun Heo, Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Tue, Mar 25, 2008 at 02:26:45PM -0700, H. Peter Anvin wrote:
> Well, when dealing with shell scripts a O(n) very easily becomes O(n^2). 
>  For the stuff that I, personally, do, it's not a big deal, but people 
> with large number of disks have serious gripes with our boot times.

This should be a solved problem with scsi_mod.scan=async (or equivalent
compile option).  Are people still complaining about it, and if so, have
they tried this option?

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-26  0:54               ` Tejun Heo
  2008-03-26  3:38                 ` Greg KH
@ 2008-03-27 19:29                 ` Kay Sievers
  2008-03-27 19:38                   ` H. Peter Anvin
  1 sibling, 1 reply; 59+ messages in thread
From: Kay Sievers @ 2008-03-27 19:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Mark Lord, Greg KH, H. Peter Anvin, Jens Axboe, Jeff Garzik,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Wed, Mar 26, 2008 at 1:54 AM, Tejun Heo <htejun@gmail.com> wrote:
>  Mark Lord wrote:
>  > So have we.  sysfs is a total nightmare to extract information from
>  > under program / script control.  The idea presented in this thread,
>  > is to have it cross-index the contents with a method that actually
>  > makes it easy to access in many common scenarios, without requiring
>  > huge gobs of code in user space.  Or in kernel space.
>  >
>  > And it's not just a few 10s of lines of code currently,
>  > but rather about 80-100 lines just to find the correct device subdir,
>  > and *then* a few more 10s of lines of code to retrieve the value.
>  >
>  > In a bulletproof fashion, that is.  Sure it can be slightly smaller
>  > if niceties such as error checking/handling are omitted.
>  >
>  > There's no guarantee that udev is present, and even if it were present,
>  > there's no guarantee that the names in /dev/ will match /sysfs/ pathnames,
>  > since udev is very configurable to do otherwise.
>  >
>  > So lookups are by dev_t, which sysfs has no simple or even easy way
>  > of accomplishing.  O(n) at a minimum.
>  >
>  > If we make it easier to access, then more programs will use it
>  > rather than us having to expand our tricky binary ioctl interfaces.
>  >
>  > Isn't that part of the idea of sysfs -- to limit the need for new ioctls ?
>
>  The questions are...
>
>  1. Are we gonna push sysfs as the primary interface and not provide an
>  alternative interface (ioctl here) which can provide equivalent
>  information?  There are people running their systems w/o sysfs but I
>  think we're getting closer to this every day.
>
>  2. Is udev an essential part of all systems?  I'm not sure about this
>  one.  Lots of small machines run w/o udev and I think udev is a bit too
>  high level to depend on for every system.
>
>  If both #1 and #2 are true, I agree with Mark that we need an easy way to
>  map from device number to matching sysfs nodes.  Tools which are used
>  early during boot and emergency sessions need this mapping and many of
>  them are minimal C programs w/o much dependency for a good reason.
>  Requiring each of them to implement their own way to map device node to
>  sysfs node is too awkward.
>
>  Probably something like /sys/class/block/MAJ:MIN

"Devices directories" are not supposed to contain duplicate entries.
It would slow-down, or may even break things.

> or /sys/class/devnums/bMAJ:MIN?

These are not devices belonging to the class "devnums", so it may
confuse things which crawl these directories to get "all devices".
Current coldplug-like setups will likely add duplicate devices with
the wrong subsystem. There are also bus-devices which have a dev_t, and
that will make them show up in /sys/class, which might confuse some
tools too.

I guess we will need to find some other solution than a /sys/class/ for
that. And we must prefix the links with 'c' and 'b' because dev_t is
not unique across char and block devices.

Thanks,
Kay

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-27 19:29                 ` Kay Sievers
@ 2008-03-27 19:38                   ` H. Peter Anvin
  2008-04-11 23:25                     ` Dan Williams
  0 siblings, 1 reply; 59+ messages in thread
From: H. Peter Anvin @ 2008-03-27 19:38 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Tejun Heo, Mark Lord, Greg KH, Jens Axboe, Jeff Garzik,
	Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

Kay Sievers wrote:
>>
>>  Probably something like /sys/class/block/MAJ:MIN
> 
> "Devices directories" are not supposed to contain duplicate entries.
> It would slow-down, or may even break things.
> 
>> or /sys/class/devnums/bMAJ:MIN?
> 
> These are not devices belonging to the class "devnums", so it may
> confuse things which crawl these directories to get "all devices".
> Current coldplug-like setups will likely add duplicate devices with
> the wrong subsystem. There are also bus-devices which have a dev_t, and
> that will make them show up in /sys/class, which might confuse some
> tools too.
> 
> I guess we will need to find some other solution than a /sys/class/ for
> that. And we must prefix the links with 'c' and 'b' because dev_t is
> not unique across char and block devices.
> 

It doesn't really seem to me to belong under class at all.  I would 
suggest /sys/dev/char/ and /sys/dev/block/, for char and block respectively.

	-hpa


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-25 17:45     ` Greg Freemyer
  2008-03-25 17:52       ` Randy Dunlap
@ 2008-03-30  4:28       ` Matt Domsch
  1 sibling, 0 replies; 59+ messages in thread
From: Matt Domsch @ 2008-03-30  4:28 UTC (permalink / raw)
  To: Greg Freemyer
  Cc: James Bottomley, Mark Lord, Jens Axboe, Jeff Garzik, Tejun Heo,
	Greg KH, Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Tue, Mar 25, 2008 at 01:45:35PM -0400, Greg Freemyer wrote:
> I believe GUID Partition Tables (GPTs) are the answer.
> 
> I believe one of the features of GPT is the elimination of the 32-bit
> sector restrictions.
> 
> http://en.wikipedia.org/wiki/GUID_Partition_Table
> 
> Windows VISTA 64-bit supports GPTs on data disks and new Mac OS based
> systems have been using it on internal drives for a couple years at
> least.
> 
> GPTs are part of the Extensible Firmware Interface (EFI), so they
> should be usable for PC bootable disks at some point.  (Maybe now in
> some cases?)
> 
> I'm not sure what the Linux Kernel support is for GPTs.

It has been supported since the first Itanium systems shipped.  It's
the first code I wrote 7+ years before it was really needed. :-)  Most
distributions have it enabled, as do userspace tools like GNU Parted.

-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-27 19:38                   ` H. Peter Anvin
@ 2008-04-11 23:25                     ` Dan Williams
  2008-04-15  7:18                       ` Andrew Morton
  0 siblings, 1 reply; 59+ messages in thread
From: Dan Williams @ 2008-04-11 23:25 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Kay Sievers, Tejun Heo, Mark Lord, Greg KH, Jens Axboe,
	Jeff Garzik, Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 5371 bytes --]

On Thu, Mar 27, 2008 at 12:38 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> Kay Sievers wrote:
> > >  Probably something like /sys/class/block/MAJ:MIN
> >
> > "Devices directories" are not supposed to contain duplicate entries.
> > It would slow-down, or may even break things.
> >
> > > or /sys/class/devnums/bMAJ:MIN?
> >
> > These are not devices belonging to the class "devnums", so it may
> > confuse things which crawl these directories to get "all devices".
> > Current coldplug-like setups will likely add duplicate devices with
> > the wrong subsystem. There are also bus-devices which have a dev_t, and
> > that will make them show up in /sys/class, which might confuse some
> > tools too.
> >
> > I guess we will need to find some other solution than a /sys/class/ for
> > that. And we must prefix the links with 'c' and 'b' because dev_t is
> > not unique across char and block devices.
>
>  It doesn't really seem to me to belong under class at all.  I would suggest
> /sys/dev/char/ and /sys/dev/block/, for char and block respectively.

This thread fizzled out without a patch... here goes:

[ note: I'm replying via gmail, so if it has whitespace mangled the
patch please see the attachment ]

-----snip---->
sysfs: add /sys/dev/{char,block} to lookup sysfs path by major:minor

From: Dan Williams <dan.j.williams@intel.com>

Why?:
There are occasions where userspace would like to access sysfs
attributes for a device but it may not know how sysfs has named the
device or the path.  For example what is the sysfs path for
/dev/disk/by-id/ata-ST3160827AS_5MT004CK?  With this change a call to
stat(2) returns the major:minor then userspace can see that
/sys/dev/block/8:32 links to /sys/block/sdc.

What are the alternatives?:
1/ Add an ioctl to return the path: Doable, but sysfs is meant to reduce
   the need to proliferate ioctl interfaces into the kernel, so this
   seems counter productive.

2/ Use udev to create these symlinks: Also doable, but it adds a
   udev dependency to utilities that might be running in a limited
   environment like an initramfs.

Cc: NeilBrown <neilb@suse.de>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Greg KH <gregkh@suse.de>
Cc: Mark Lord <lkml@rtr.ca>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/base/core.c |   37 ++++++++++++++++++++++++++++++++++++-
 1 files changed, 36 insertions(+), 1 deletions(-)


diff --git a/drivers/base/core.c b/drivers/base/core.c
index 24198ad..de925f8 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -27,6 +27,9 @@

 int (*platform_notify)(struct device *dev) = NULL;
 int (*platform_notify_remove)(struct device *dev) = NULL;
+static struct kobject *dev_kobj;
+static struct kobject *char_kobj;
+static struct kobject *block_kobj;

 #ifdef CONFIG_BLOCK
 static inline int device_is_not_partition(struct device *dev)
@@ -759,6 +762,11 @@ static void device_remove_class_symlinks(struct device *dev)
 	sysfs_remove_link(&dev->kobj, "subsystem");
 }

+static struct kobject *device_to_dev_kobj(struct device *dev)
+{
+	return dev->class == &block_class ? block_kobj : char_kobj;
+}
+
 /**
  * device_add - add device to device hierarchy.
  * @dev: device.
@@ -775,6 +783,7 @@ int device_add(struct device *dev)
 	struct device *parent = NULL;
 	struct class_interface *class_intf;
 	int error;
+	char devt_str[25];

 	dev = get_device(dev);
 	if (!dev || !strlen(dev->bus_id)) {
@@ -806,9 +815,16 @@ int device_add(struct device *dev)
 		goto attrError;

 	if (MAJOR(dev->devt)) {
+		struct kobject *kobj = device_to_dev_kobj(dev);
+
 		error = device_create_file(dev, &devt_attr);
 		if (error)
 			goto ueventattrError;
+
+		format_dev_t(devt_str, dev->devt);
+		error = sysfs_create_link(kobj, &dev->kobj, devt_str);
+		if (error)
+			goto devtattrError;
 	}

 	error = device_add_class_symlinks(dev);
@@ -854,6 +870,9 @@ int device_add(struct device *dev)
 	device_remove_class_symlinks(dev);
  SymlinkError:
 	if (MAJOR(dev->devt))
+		sysfs_remove_link(device_to_dev_kobj(dev), devt_str);
+ devtattrError:
+	if (MAJOR(dev->devt))
 		device_remove_file(dev, &devt_attr);
  ueventattrError:
 	device_remove_file(dev, &uevent_attr);
@@ -925,12 +944,16 @@ void device_del(struct device *dev)
 {
 	struct device *parent = dev->parent;
 	struct class_interface *class_intf;
+	char devt_str[25];

 	device_pm_remove(dev);
 	if (parent)
 		klist_del(&dev->knode_parent);
-	if (MAJOR(dev->devt))
+	if (MAJOR(dev->devt)) {
+		format_dev_t(devt_str, dev->devt);
+		sysfs_remove_link(device_to_dev_kobj(dev), devt_str);
 		device_remove_file(dev, &devt_attr);
+	}
 	if (dev->class) {
 		device_remove_class_symlinks(dev);

@@ -1055,6 +1078,15 @@ int __init devices_init(void)
 	devices_kset = kset_create_and_add("devices", &device_uevent_ops, NULL);
 	if (!devices_kset)
 		return -ENOMEM;
+	dev_kobj = kobject_create_and_add("dev", NULL);
+	if (!dev_kobj)
+		return -ENOMEM;
+	block_kobj = kobject_create_and_add("block", dev_kobj);
+	if (!block_kobj)
+		return -ENOMEM;
+	char_kobj = kobject_create_and_add("char", dev_kobj);
+	if (!char_kobj)
+		return -ENOMEM;
 	return 0;
 }

@@ -1380,4 +1412,7 @@ void device_shutdown(void)
 			dev->driver->shutdown(dev);
 		}
 	}
+	kobject_put(char_kobj);
+	kobject_put(block_kobj);
+	kobject_put(dev_kobj);
 }

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-04-11 23:25                     ` Dan Williams
@ 2008-04-15  7:18                       ` Andrew Morton
  2008-04-15 13:47                         ` Mark Lord
  2008-04-15 14:20                         ` James Bottomley
  0 siblings, 2 replies; 59+ messages in thread
From: Andrew Morton @ 2008-04-15  7:18 UTC (permalink / raw)
  To: Dan Williams
  Cc: H. Peter Anvin, Kay Sievers, Tejun Heo, Mark Lord, Greg KH,
	Jens Axboe, Jeff Garzik, Linus Torvalds, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Fri, 11 Apr 2008 16:25:32 -0700 "Dan Williams" <dan.j.williams@intel.com> wrote:

> >  It doesn't really seem to me to belong under class at all.  I would suggest
> > /sys/dev/char/ and /sys/dev/block/, for char and block respectively.
> >
> 
> This thread fizzled out without a patch... here goes:
> 
> ...
>
> sysfs: add /sys/dev/{char,block} to lookup sysfs path by major:minor

Crickets are chirping and I can't remember what the conclusion to all this
was.  In fact the thread was more than ten-deep so I probably fell asleep.

I queued it up so that others cannot do the same ;)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-04-15  7:18                       ` Andrew Morton
@ 2008-04-15 13:47                         ` Mark Lord
  2008-04-15 14:20                         ` James Bottomley
  1 sibling, 0 replies; 59+ messages in thread
From: Mark Lord @ 2008-04-15 13:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dan Williams, H. Peter Anvin, Kay Sievers, Tejun Heo, Mark Lord,
	Greg KH, Jens Axboe, Jeff Garzik, Linus Torvalds, Linux Kernel,
	IDE/ATA development list, linux-scsi

Andrew Morton wrote:
> On Fri, 11 Apr 2008 16:25:32 -0700 "Dan Williams" <dan.j.williams@intel.com> wrote:
> 
>>>  It doesn't really seem to me to belong under class at all.  I would suggest
>>> /sys/dev/char/ and /sys/dev/block/, for char and block respectively.
>>>
>> This thread fizzled out without a patch... here goes:
>>
>> ...
>>
>> sysfs: add /sys/dev/{char,block} to lookup sysfs path by major:minor
> 
> Crickets are chirping and I can't remember what the conclusion to all this
> was.  In fact the thread was more than ten-deep so I probably fell asleep.
..

Last I recall, Greg was vehemently opposed to having direct path access
by device number in sysfs, but many other people saw benefit.

Myself (the originator), I simply decided that my sysfs access code has
to work with older kernels too, so for now I'm just doing a brute force
tree search to find things in sysfs.  I did get the code size down smaller
for it, but it's still a pain.

When the direct access feature goes in, I'll just change my code to try it first,
and then still fall back to the tree search method on failure.

> I queued it up so that others cannot do the same ;)
..

Good!

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-04-15  7:18                       ` Andrew Morton
  2008-04-15 13:47                         ` Mark Lord
@ 2008-04-15 14:20                         ` James Bottomley
  2008-04-15 18:16                           ` H. Peter Anvin
  1 sibling, 1 reply; 59+ messages in thread
From: James Bottomley @ 2008-04-15 14:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dan Williams, H. Peter Anvin, Kay Sievers, Tejun Heo, Mark Lord,
	Greg KH, Jens Axboe, Jeff Garzik, Linus Torvalds, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Tue, 2008-04-15 at 00:18 -0700, Andrew Morton wrote:
> On Fri, 11 Apr 2008 16:25:32 -0700 "Dan Williams" <dan.j.williams@intel.com> wrote:
> 
> > >  It doesn't really seem to be to belong under class at all.  I would suggest
> > > /sys/dev/char/ and /sys/dev/block/, for char and block respectively.
> > >
> > 
> > This thread fizzled out without a patch... here goes:
> > 
> > ...
> >
> > sysfs: add /sys/dev/{char,block} to lookup sysfs path by major:minor
> 
> Crickets are chirping and I can't remember what the conclusion to all this
> was.  In fact the thread was more than ten-deep so I probably fell asleep.
> 
> I queued it up so that others cannot do the same ;)

The expressed preference was simply to expand the ioctl (or add a new
one that got the required information without having to go through the
old HDIO_GETGEO path to extract the value from a fictitious geometry).

Greg was a bit sceptical of the value of the above proposal ...

James
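
For reference, the arithmetic behind the limit in the subject line can be
written out as below.  This is a sketch that assumes 512-byte sectors and
a start offset held in a 32-bit field, as "unsigned long" is on a 32-bit
system.  The function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: sectors are 512 bytes. */
#define SECTOR_BYTES 512ULL

/*
 * First byte offset that a 32-bit start sector can no longer express:
 * 2^32 sectors * 512 bytes = 2^41 bytes = 2 TiB.
 */
uint64_t first_unreachable_byte_32bit(void)
{
	return ((uint64_t)UINT32_MAX + 1) * SECTOR_BYTES;
}

/* Returns 1 if a 64-bit start sector would be truncated in 32 bits. */
int start_sector_truncates(uint64_t start_sector)
{
	return start_sector > UINT32_MAX;
}
```

A partition that begins at or beyond that offset is the case where a
32-bit start field silently wraps, whichever interface ends up replacing
it.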



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-04-15 14:20                         ` James Bottomley
@ 2008-04-15 18:16                           ` H. Peter Anvin
  2008-04-15 23:43                             ` Dan Williams
  0 siblings, 1 reply; 59+ messages in thread
From: H. Peter Anvin @ 2008-04-15 18:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, Dan Williams, Kay Sievers, Tejun Heo, Mark Lord,
	Greg KH, Jens Axboe, Jeff Garzik, Linus Torvalds, Linux Kernel,
	IDE/ATA development list, linux-scsi

James Bottomley wrote:
> 
> The expressed preference was simply to expand the ioctl (or add a new
> one that got the required information without having to go through the
> old HDIO_GETGEO path to extract the value from a fictitious geometry).
> 
> Greg was a bit sceptical of the value of the above proposal ...
> 

However, you have to admit that kind of defeats the whole point of 
having this information in sysfs.  IMNSHO, even scanning sysfs is better
than continuing to add binary ioctls.

	-hpa


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-04-15 18:16                           ` H. Peter Anvin
@ 2008-04-15 23:43                             ` Dan Williams
  2008-04-16 20:55                               ` patch sysfs-add-sys-dev-char-block-to-lookup-sysfs-path-by-major-minor.patch added to gregkh-2.6 tree gregkh
  2008-04-16 20:55                                 ` gregkh
  0 siblings, 2 replies; 59+ messages in thread
From: Dan Williams @ 2008-04-15 23:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James Bottomley, Kay Sievers, Tejun Heo, Mark Lord, Greg KH,
	Jens Axboe, Jeff Garzik, Linus Torvalds, Linux Kernel,
	IDE/ATA development list, linux-scsi, H. Peter Anvin

Subject: sysfs: add /sys/dev/{char,block} to lookup sysfs path by major:minor

From: Dan Williams <dan.j.williams@intel.com>

Why?:
There are occasions where userspace would like to access sysfs
attributes for a device but it may not know how sysfs has named the
device or the path.  For example what is the sysfs path for
/dev/disk/by-id/ata-ST3160827AS_5MT004CK?  With this change, a call to
stat(2) returns the major:minor, and userspace can then see that
/sys/dev/block/8:32 links to /sys/block/sdc.

What are the alternatives?:
1/ Add an ioctl to return the path: Doable, but sysfs is meant to reduce
   the need to proliferate ioctl interfaces into the kernel, so this
   seems counterproductive.

2/ Use udev to create these symlinks: Also doable, but it adds a
   udev dependency to utilities that might be running in a limited
   environment like an initramfs.

3/ Do a full-tree search of sysfs.

Cc: NeilBrown <neilb@suse.de>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Greg KH <gregkh@suse.de>
Acked-by: Mark Lord <lkml@rtr.ca>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: SL Baur <steve@xemacs.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

Andrew, here is an updated patch with some presumptive acked-by's from Mark and hpa.

* fixed up ENOMEM handling in devices_init()
* added a short blurb in Documentation/filesystems/sysfs.txt
* dropped the size of the buffer passed to format_dev_t a bit

 Documentation/filesystems/sysfs.txt |    6 +++++
 drivers/base/core.c                 |   46 ++++++++++++++++++++++++++++++++++-
 2 files changed, 51 insertions(+), 1 deletions(-)


diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt
index 7f27b8f..9e9c348 100644
--- a/Documentation/filesystems/sysfs.txt
+++ b/Documentation/filesystems/sysfs.txt
@@ -248,6 +248,7 @@ The top level sysfs directory looks like:
 block/
 bus/
 class/
+dev/
 devices/
 firmware/
 net/
@@ -274,6 +275,11 @@ fs/ contains a directory for some filesystems.  Currently each
 filesystem wanting to export attributes must create its own hierarchy
 below fs/ (see ./fuse.txt for an example).
 
+dev/ contains two directories char/ and block/. Inside these two
+directories there are symlinks named <major>:<minor>.  These symlinks
+point to the sysfs directory for the given device.  /sys/dev provides a
+quick way to lookup the sysfs interface for a device from the result of
+a stat(2) operation.
 
 More information can driver-model specific features can be found in
 Documentation/driver-model/. 
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 24198ad..ba21118 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -27,6 +27,9 @@
 
 int (*platform_notify)(struct device *dev) = NULL;
 int (*platform_notify_remove)(struct device *dev) = NULL;
+static struct kobject *dev_kobj;
+static struct kobject *char_kobj;
+static struct kobject *block_kobj;
 
 #ifdef CONFIG_BLOCK
 static inline int device_is_not_partition(struct device *dev)
@@ -759,6 +762,11 @@ static void device_remove_class_symlinks(struct device *dev)
 	sysfs_remove_link(&dev->kobj, "subsystem");
 }
 
+static struct kobject *device_to_dev_kobj(struct device *dev)
+{
+	return dev->class == &block_class ? block_kobj : char_kobj;
+}
+
 /**
  * device_add - add device to device hierarchy.
  * @dev: device.
@@ -775,6 +783,7 @@ int device_add(struct device *dev)
 	struct device *parent = NULL;
 	struct class_interface *class_intf;
 	int error;
+	char devt_str[15];
 
 	dev = get_device(dev);
 	if (!dev || !strlen(dev->bus_id)) {
@@ -806,9 +815,16 @@ int device_add(struct device *dev)
 		goto attrError;
 
 	if (MAJOR(dev->devt)) {
+		struct kobject *kobj = device_to_dev_kobj(dev);
+
 		error = device_create_file(dev, &devt_attr);
 		if (error)
 			goto ueventattrError;
+
+		format_dev_t(devt_str, dev->devt);
+		error = sysfs_create_link(kobj, &dev->kobj, devt_str);
+		if (error)
+			goto devtattrError;
 	}
 
 	error = device_add_class_symlinks(dev);
@@ -854,6 +870,9 @@ int device_add(struct device *dev)
 	device_remove_class_symlinks(dev);
  SymlinkError:
 	if (MAJOR(dev->devt))
+		sysfs_remove_link(device_to_dev_kobj(dev), devt_str);
+ devtattrError:
+	if (MAJOR(dev->devt))
 		device_remove_file(dev, &devt_attr);
  ueventattrError:
 	device_remove_file(dev, &uevent_attr);
@@ -925,12 +944,16 @@ void device_del(struct device *dev)
 {
 	struct device *parent = dev->parent;
 	struct class_interface *class_intf;
+	char devt_str[15];
 
 	device_pm_remove(dev);
 	if (parent)
 		klist_del(&dev->knode_parent);
-	if (MAJOR(dev->devt))
+	if (MAJOR(dev->devt)) {
+		format_dev_t(devt_str, dev->devt);
+		sysfs_remove_link(device_to_dev_kobj(dev), devt_str);
 		device_remove_file(dev, &devt_attr);
+	}
 	if (dev->class) {
 		device_remove_class_symlinks(dev);
 
@@ -1055,7 +1078,25 @@ int __init devices_init(void)
 	devices_kset = kset_create_and_add("devices", &device_uevent_ops, NULL);
 	if (!devices_kset)
 		return -ENOMEM;
+	dev_kobj = kobject_create_and_add("dev", NULL);
+	if (!dev_kobj)
+		goto dev_kobj_err;
+	block_kobj = kobject_create_and_add("block", dev_kobj);
+	if (!block_kobj)
+		goto block_kobj_err;
+	char_kobj = kobject_create_and_add("char", dev_kobj);
+	if (!char_kobj)
+		goto char_kobj_err;
+
 	return 0;
+
+ char_kobj_err:
+	kobject_put(block_kobj);
+ block_kobj_err:
+	kobject_put(dev_kobj);
+ dev_kobj_err:
+	kset_unregister(devices_kset);
+	return -ENOMEM;
 }
 
 EXPORT_SYMBOL_GPL(device_for_each_child);
@@ -1380,4 +1421,7 @@ void device_shutdown(void)
 			dev->driver->shutdown(dev);
 		}
 	}
+	kobject_put(char_kobj);
+	kobject_put(block_kobj);
+	kobject_put(dev_kobj);
 }

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* patch sysfs-add-sys-dev-char-block-to-lookup-sysfs-path-by-major-minor.patch added to gregkh-2.6 tree
  2008-04-15 23:43                             ` Dan Williams
@ 2008-04-16 20:55                               ` gregkh
  2008-04-16 20:55                                 ` gregkh
  1 sibling, 0 replies; 59+ messages in thread
From: gregkh @ 2008-04-16 20:55 UTC (permalink / raw)
  To: dan.j.williams, James.Bottomley, akpm, axboe, gregkh, hpa,
	htejun, jgarzik, kay.si


This is a note to let you know that I've just added the patch titled

     Subject: sysfs: add /sys/dev/{char,block} to lookup sysfs path by major:minor

to my gregkh-2.6 tree.  Its filename is

     sysfs-add-sys-dev-char-block-to-lookup-sysfs-path-by-major-minor.patch

This tree can be found at 
    http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


From dan.j.williams@intel.com  Wed Apr 16 13:49:38 2008
From: Dan Williams <dan.j.williams@intel.com>
Date: Tue, 15 Apr 2008 16:43:15 -0700
Subject: sysfs: add /sys/dev/{char,block} to lookup sysfs path by major:minor
To: Andrew Morton <akpm@linux-foundation.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>, Kay Sievers <kay.sievers@vrfy.org>, Tejun Heo <htejun@gmail.com>, Mark Lord <lkml@rtr.ca>, Greg KH <gregkh@suse.de>, Jens Axboe <axboe@kernel.dk>, Jeff Garzik <jgarzik@pobox.com>, Linus Torvalds <torvalds@linux-foundation.org>, Linux Kernel <linux-kernel@vger.kernel.org>, IDE/ATA development list <linux-ide@vger.kernel.org>, linux-scsi <linux-scsi@vger.kernel.org>, "H. Peter Anvin" <hpa@zytor.com>
Message-ID: <1208302995.21877.12.camel@dwillia2-linux.ch.intel.com>


From: Dan Williams <dan.j.williams@intel.com>

Why?:
There are occasions where userspace would like to access sysfs
attributes for a device but it may not know how sysfs has named the
device or the path.  For example what is the sysfs path for
/dev/disk/by-id/ata-ST3160827AS_5MT004CK?  With this change, a call to
stat(2) returns the major:minor, and userspace can then see that
/sys/dev/block/8:32 links to /sys/block/sdc.

What are the alternatives?:
1/ Add an ioctl to return the path: Doable, but sysfs is meant to reduce
   the need to proliferate ioctl interfaces into the kernel, so this
   seems counterproductive.

2/ Use udev to create these symlinks: Also doable, but it adds a
   udev dependency to utilities that might be running in a limited
   environment like an initramfs.

3/ Do a full-tree search of sysfs.

Cc: Neil Brown <neilb@suse.de>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Greg KH <gregkh@suse.de>
Acked-by: Mark Lord <lkml@rtr.ca>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: SL Baur <steve@xemacs.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 Documentation/filesystems/sysfs.txt |    6 ++++
 drivers/base/core.c                 |   46 +++++++++++++++++++++++++++++++++++-
 2 files changed, 51 insertions(+), 1 deletion(-)

--- a/Documentation/filesystems/sysfs.txt
+++ b/Documentation/filesystems/sysfs.txt
@@ -248,6 +248,7 @@ The top level sysfs directory looks like
 block/
 bus/
 class/
+dev/
 devices/
 firmware/
 net/
@@ -274,6 +275,11 @@ fs/ contains a directory for some filesy
 filesystem wanting to export attributes must create its own hierarchy
 below fs/ (see ./fuse.txt for an example).
 
+dev/ contains two directories char/ and block/. Inside these two
+directories there are symlinks named <major>:<minor>.  These symlinks
+point to the sysfs directory for the given device.  /sys/dev provides a
+quick way to lookup the sysfs interface for a device from the result of
+a stat(2) operation.
 
 More information can driver-model specific features can be found in
 Documentation/driver-model/. 
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -27,6 +27,9 @@
 
 int (*platform_notify)(struct device *dev) = NULL;
 int (*platform_notify_remove)(struct device *dev) = NULL;
+static struct kobject *dev_kobj;
+static struct kobject *char_kobj;
+static struct kobject *block_kobj;
 
 #ifdef CONFIG_BLOCK
 static inline int device_is_not_partition(struct device *dev)
@@ -760,6 +763,11 @@ static void device_remove_class_symlinks
 	sysfs_remove_link(&dev->kobj, "subsystem");
 }
 
+static struct kobject *device_to_dev_kobj(struct device *dev)
+{
+	return dev->class == &block_class ? block_kobj : char_kobj;
+}
+
 /**
  * device_add - add device to device hierarchy.
  * @dev: device.
@@ -776,6 +784,7 @@ int device_add(struct device *dev)
 	struct device *parent = NULL;
 	struct class_interface *class_intf;
 	int error;
+	char devt_str[15];
 
 	dev = get_device(dev);
 	if (!dev || !strlen(dev->bus_id)) {
@@ -807,9 +816,16 @@ int device_add(struct device *dev)
 		goto attrError;
 
 	if (MAJOR(dev->devt)) {
+		struct kobject *kobj = device_to_dev_kobj(dev);
+
 		error = device_create_file(dev, &devt_attr);
 		if (error)
 			goto ueventattrError;
+
+		format_dev_t(devt_str, dev->devt);
+		error = sysfs_create_link(kobj, &dev->kobj, devt_str);
+		if (error)
+			goto devtattrError;
 	}
 
 	error = device_add_class_symlinks(dev);
@@ -854,6 +870,9 @@ int device_add(struct device *dev)
 	device_remove_class_symlinks(dev);
  SymlinkError:
 	if (MAJOR(dev->devt))
+		sysfs_remove_link(device_to_dev_kobj(dev), devt_str);
+ devtattrError:
+	if (MAJOR(dev->devt))
 		device_remove_file(dev, &devt_attr);
  ueventattrError:
 	device_remove_file(dev, &uevent_attr);
@@ -925,12 +944,16 @@ void device_del(struct device *dev)
 {
 	struct device *parent = dev->parent;
 	struct class_interface *class_intf;
+	char devt_str[15];
 
 	device_pm_remove(dev);
 	if (parent)
 		klist_del(&dev->knode_parent);
-	if (MAJOR(dev->devt))
+	if (MAJOR(dev->devt)) {
+		format_dev_t(devt_str, dev->devt);
+		sysfs_remove_link(device_to_dev_kobj(dev), devt_str);
 		device_remove_file(dev, &devt_attr);
+	}
 	if (dev->class) {
 		device_remove_class_symlinks(dev);
 
@@ -1055,7 +1078,25 @@ int __init devices_init(void)
 	devices_kset = kset_create_and_add("devices", &device_uevent_ops, NULL);
 	if (!devices_kset)
 		return -ENOMEM;
+	dev_kobj = kobject_create_and_add("dev", NULL);
+	if (!dev_kobj)
+		goto dev_kobj_err;
+	block_kobj = kobject_create_and_add("block", dev_kobj);
+	if (!block_kobj)
+		goto block_kobj_err;
+	char_kobj = kobject_create_and_add("char", dev_kobj);
+	if (!char_kobj)
+		goto char_kobj_err;
+
 	return 0;
+
+ char_kobj_err:
+	kobject_put(block_kobj);
+ block_kobj_err:
+	kobject_put(dev_kobj);
+ dev_kobj_err:
+	kset_unregister(devices_kset);
+	return -ENOMEM;
 }
 
 EXPORT_SYMBOL_GPL(device_for_each_child);
@@ -1351,4 +1392,7 @@ void device_shutdown(void)
 			dev->driver->shutdown(dev);
 		}
 	}
+	kobject_put(char_kobj);
+	kobject_put(block_kobj);
+	kobject_put(dev_kobj);
 }


Patches currently in gregkh-2.6 which might be from dan.j.williams@intel.com are

driver-core/sysfs-refill-attribute-buffer-when-reading-from-offset-0.patch
driver-core/sysfs-add-sys-dev-char-block-to-lookup-sysfs-path-by-major-minor.patch

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-27 14:45                         ` Mark Lord
@ 2008-03-27 15:15                           ` Greg KH
  0 siblings, 0 replies; 59+ messages in thread
From: Greg KH @ 2008-03-27 15:15 UTC (permalink / raw)
  To: Mark Lord
  Cc: Bodo Eggert, H. Peter Anvin, Randy Dunlap, Jens Axboe,
	Jeff Garzik, Tejun Heo, Linus Torvalds, Andrew Morton,
	Linux Kernel, IDE/ATA development list, linux-scsi

On Thu, Mar 27, 2008 at 10:45:54AM -0400, Mark Lord wrote:
> Greg KH wrote:
>>
>> If sysfs is stupid, then use an ioctl, have I objected to that?
> ..
>
> Well, at this point it certainly seems a lot simpler
> than trying to get the sysfs "maintainers" to improve it.

I'm sorry, have I missed a patch that was submitted that adds this new
functionality?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-27  3:52                       ` Greg KH
  2008-03-27  4:57                         ` H. Peter Anvin
@ 2008-03-27 14:45                         ` Mark Lord
  2008-03-27 15:15                           ` Greg KH
  1 sibling, 1 reply; 59+ messages in thread
From: Mark Lord @ 2008-03-27 14:45 UTC (permalink / raw)
  To: Greg KH
  Cc: Bodo Eggert, H. Peter Anvin, Randy Dunlap, Jens Axboe,
	Jeff Garzik, Tejun Heo, Linus Torvalds, Andrew Morton,
	Linux Kernel, IDE/ATA development list, linux-scsi

Greg KH wrote:
>
> If sysfs is stupid, then use an ioctl, have I objected to that?
..

Well, at this point it certainly seems a lot simpler
than trying to get the sysfs "maintainers" to improve it.

Cheers

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-27  3:52                       ` Greg KH
@ 2008-03-27  4:57                         ` H. Peter Anvin
  2008-03-27 14:45                         ` Mark Lord
  1 sibling, 0 replies; 59+ messages in thread
From: H. Peter Anvin @ 2008-03-27  4:57 UTC (permalink / raw)
  To: Greg KH
  Cc: Bodo Eggert, Randy Dunlap, Mark Lord, Jens Axboe, Jeff Garzik,
	Tejun Heo, Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

Greg KH wrote:
>> So e.g. lilo should depend on sysfs and *a*special*configuration* of udev,
>> while the admin MUST NOT use mknod'ed device files nor manually create
>> symlinks pointing to them, and not use relative path names?
>> That's plain stupid.
> 
> If sysfs is stupid, then use an ioctl, have I objected to that?

I think he's objecting to the dependency on udev configuration, not to 
sysfs.

	-hpa

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: What to do about the 2TB limit on HDIO_GETGEO ?
  2008-03-26 11:30                       ` Bodo Eggert
  (?)
@ 2008-03-27  3:52                       ` Greg KH
  2008-03-27  4:57                         ` H. Peter Anvin
  2008-03-27 14:45                         ` Mark Lord
  -1 siblings, 2 replies; 59+ messages in thread
From: Greg KH @ 2008-03-27  3:52 UTC (permalink / raw)
  To: Bodo Eggert
  Cc: H. Peter Anvin, Randy Dunlap, Mark Lord, Jens Axboe, Jeff Garzik,
	Tejun Heo, Linus Torvalds, Andrew Morton, Linux Kernel,
	IDE/ATA development list, linux-scsi

On Wed, Mar 26, 2008 at 12:30:40PM +0100, Bodo Eggert wrote:
> Greg KH <gregkh@suse.de> wrote:
> > On Tue, Mar 25, 2008 at 04:05:32PM -0700, H. Peter Anvin wrote:
> 
> >>> How does this have anything to do with boot times?  Do you really have a
> >>> foolish shell script that iterates over every single disk in the
> >>> sysfs tree for every disk?  What does it do that for?
> >>
> >> Any time you want to get the sysfs information for a filesystem which is
> >> already mounted, that's what you're forced to do.
> >>
> >>> I thought we were talking about 2TB disks here, with a proposed new
> >>> ioctl, not foolishness of boot scripts...
> >>
> >> I pointed out that having a way to map device numbers to sysfs directories
> >> would have the same effect, *and* would be usable for other purposes.  I'd
> >> rather see that than a new ioctl, and another, and another...
> > 
> > Again, a simple udev rule will give you that today if you really want
> > it...
> 
> So e.g. lilo should depend on sysfs and *a*special*configuration* of udev,
> while the admin MUST NOT use mknod'ed device files nor manually create
> symlinks pointing to them, and not use relative path names?
> That's plain stupid.

If sysfs is stupid, then use an ioctl, have I objected to that?

> > And I think 'udevinfo' can be used to retrieve this information as well.
> 
> $ udevinfo /dev/hda
> missing option
> $ udevinfo /dev/hda --help
> Usage: udevinfo OPTIONS
>   --query=<type>    query database for the specified value:
>     name            name of device node
>     symlink         pointing to node
>     path            sysfs device path
>     env             the device related imported environment
>     all             all values
> 
>   --path=<devpath>  sysfs device path used for query or chain
>   --name=<name>     node or symlink name used for query
> 
>   --root            prepend to query result or print udev_root
>   --attribute-walk  print all SYSFS_attributes along the device chain
>   --export-db       export the content of the udev database
>   --help            print this text
> $ udevinfo --name=/dev/hda
> missing option
> $ udevinfo --name=/dev/hda --query=all
> P: /block/hda
> N: hda
> S: disk/by-id/ata-Maxtor_2F040L0_F1748ZQE
> S: disk/by-path/pci-0000:00:0f.0-ide-0:0
> E: DEVTYPE=disk
> E: ID_TYPE=disk
> E: ID_MODEL=Maxtor_2F040L0
> E: ID_SERIAL=F1748ZQE
> E: ID_REVISION=VAM51JJ0
> E: ID_BUS=ata
> E: ID_PATH=pci-0000:00:0f.0-ide-0:0
> 
> 
> As you can see, it gives no major:minor information. But it is in the DB:

That should be easy to add, no one has ever asked for this information
from udevinfo before.  If it's needed, it can be provided.

> $ cd /dev/.udev/db
> $ grep -l hda * 2>/dev/null
> \x2fblock\x2fhda
> \x2fblock\x2fhda\x2fhda1
> $ cat "\x2fblock\x2fhda"
> N:hda
> S:disk/by-id/ata-Maxtor_2F040L0_F1748ZQE
> S:disk/by-path/pci-0000:00:0f.0-ide-0:0
> M:3:0
> E:DEVTYPE=disk
> E:ID_TYPE=disk
> E:ID_MODEL=Maxtor_2F040L0
> E:ID_SERIAL=F1748ZQE
> E:ID_REVISION=VAM51JJ0
> E:ID_BUS=ata
> E:ID_PATH=pci-0000:00:0f.0-ide-0:0
> 
> What a great tool - for making linux look bad.

Your constructive criticism is greatly appreciated, please continue.


* Re: What to do about the 2TB limit on HDIO_GETGEO ?
       [not found]                   ` <abqKM-6Ka-13@gated-at.bofh.it>
@ 2008-03-26 11:30                     ` Bodo Eggert
  2008-03-26 11:30                       ` Bodo Eggert
  1 sibling, 0 replies; 59+ messages in thread
From: Bodo Eggert @ 2008-03-26 11:30 UTC (permalink / raw)
  To: Greg KH, H. Peter Anvin, Randy Dunlap, Mark Lord, Jens Axboe,
	Jeff Garzik

Greg KH <gregkh@suse.de> wrote:
> On Tue, Mar 25, 2008 at 04:05:32PM -0700, H. Peter Anvin wrote:

>>> How does this have anything to do with boot times?  Do you really have a
>>> foolish shell script that iterates over every single disk in the
>>> sysfs tree for every disk?  What does it do that for?
>>
>> Any time you want to get the sysfs information for a filesystem which is
>> already mounted, that's what you're forced to do.
>>
>>> I thought we were talking about 2TB disks here, with a proposed new
>>> ioctl, not foolishness of boot scripts...
>>
>> I pointed out that having a way to map device numbers to sysfs directories
>> would have the same effect, *and* would be usable for other purposes.  I'd
>> rather see that than a new ioctl, and another, and another...
> 
> Again, a simple udev rule will give you that today if you really want
> it...

So e.g. lilo should depend on sysfs and *a*special*configuration* of udev,
while the admin MUST NOT use mknod'ed device files nor manually create
symlinks pointing to them, and not use relative path names?
That's plain stupid.

> And I think 'udevinfo' can be used to retrieve this information as well.

$ udevinfo /dev/hda
missing option
$ udevinfo /dev/hda --help
Usage: udevinfo OPTIONS
  --query=<type>    query database for the specified value:
    name            name of device node
    symlink         pointing to node
    path            sysfs device path
    env             the device related imported environment
    all             all values

  --path=<devpath>  sysfs device path used for query or chain
  --name=<name>     node or symlink name used for query

  --root            prepend to query result or print udev_root
  --attribute-walk  print all SYSFS_attributes along the device chain
  --export-db       export the content of the udev database
  --help            print this text
$ udevinfo --name=/dev/hda
missing option
$ udevinfo --name=/dev/hda --query=all
P: /block/hda
N: hda
S: disk/by-id/ata-Maxtor_2F040L0_F1748ZQE
S: disk/by-path/pci-0000:00:0f.0-ide-0:0
E: DEVTYPE=disk
E: ID_TYPE=disk
E: ID_MODEL=Maxtor_2F040L0
E: ID_SERIAL=F1748ZQE
E: ID_REVISION=VAM51JJ0
E: ID_BUS=ata
E: ID_PATH=pci-0000:00:0f.0-ide-0:0


As you can see, it gives no major:minor information. But it is in the DB:

$ cd /dev/.udev/db
$ grep -l hda * 2>/dev/null
\x2fblock\x2fhda
\x2fblock\x2fhda\x2fhda1
$ cat "\x2fblock\x2fhda"
N:hda
S:disk/by-id/ata-Maxtor_2F040L0_F1748ZQE
S:disk/by-path/pci-0000:00:0f.0-ide-0:0
M:3:0
E:DEVTYPE=disk
E:ID_TYPE=disk
E:ID_MODEL=Maxtor_2F040L0
E:ID_SERIAL=F1748ZQE
E:ID_REVISION=VAM51JJ0
E:ID_BUS=ata
E:ID_PATH=pci-0000:00:0f.0-ide-0:0

What a great tool - for making linux look bad.
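
The M: line in the record above is where the major:minor pair lives. A
minimal Python sketch of pulling it out follows. It assumes the record layout
is exactly what the cat output above shows (one KEY:value entry per line,
with the device number as M:major:minor), which is an internal udev database
format rather than a documented interface. The function name is hypothetical.

```python
# Record copied from the cat output above.
SAMPLE_RECORD = """\
N:hda
S:disk/by-id/ata-Maxtor_2F040L0_F1748ZQE
S:disk/by-path/pci-0000:00:0f.0-ide-0:0
M:3:0
E:DEVTYPE=disk
E:ID_TYPE=disk
E:ID_MODEL=Maxtor_2F040L0
E:ID_SERIAL=F1748ZQE
E:ID_REVISION=VAM51JJ0
E:ID_BUS=ata
E:ID_PATH=pci-0000:00:0f.0-ide-0:0
"""


def parse_devnum(record):
    """Return (major, minor) from the first well-formed 'M:major:minor'
    line in a udev db record, or None if there is no such line."""
    for line in record.splitlines():
        if line.startswith("M:"):
            fields = line[2:].split(":")
            if len(fields) == 2 and fields[0].isdigit() and fields[1].isdigit():
                return int(fields[0]), int(fields[1])
    return None


if __name__ == "__main__":
    print(parse_devnum(SAMPLE_RECORD))  # prints (3, 0)
```

Anything relying on this would depend on the on-disk database of one
particular udev version, which is part of the objection being made here.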




end of thread, other threads:[~2008-04-16 20:57 UTC | newest]

Thread overview: 59+ messages
     [not found] <47E875AD.1000901@rtr.ca>
2008-03-25  4:02 ` What to do about the 2TB limit on HDIO_GETGEO ? Mark Lord
2008-03-25  4:19   ` Andrew Morton
2008-03-25  5:13   ` H. Peter Anvin
2008-03-25 13:37     ` Mark Lord
2008-03-25 13:55       ` H. Peter Anvin
2008-03-25 17:37         ` Mark Lord
2008-03-25 19:25           ` Greg KH
2008-03-25 19:34             ` Randy Dunlap
2008-03-25 20:36               ` H. Peter Anvin
2008-03-25 21:20                 ` Greg KH
2008-03-25 21:26                   ` H. Peter Anvin
2008-03-25 23:00                     ` Greg KH
2008-03-25 23:05                       ` H. Peter Anvin
2008-03-25 23:22                         ` Greg KH
2008-03-27 19:05                     ` Matthew Wilcox
2008-03-26  0:34             ` Mark Lord
2008-03-26  0:54               ` Tejun Heo
2008-03-26  3:38                 ` Greg KH
2008-03-26  4:24                   ` Tejun Heo
2008-03-26  6:04                     ` H. Peter Anvin
2008-03-27 19:29                 ` Kay Sievers
2008-03-27 19:38                   ` H. Peter Anvin
2008-04-11 23:25                     ` Dan Williams
2008-04-15  7:18                       ` Andrew Morton
2008-04-15 13:47                         ` Mark Lord
2008-04-15 14:20                         ` James Bottomley
2008-04-15 18:16                           ` H. Peter Anvin
2008-04-15 23:43                             ` Dan Williams
2008-04-16 20:55                               ` patch sysfs-add-sys-dev-char-block-to-lookup-sysfs-path-by-major-minor.patch added to gregkh-2.6 tree gregkh
2008-04-16 20:55                               ` gregkh
2008-04-16 20:55                                 ` gregkh
2008-03-27 18:51               ` What to do about the 2TB limit on HDIO_GETGEO ? Kay Sievers
2008-03-27 18:55                 ` H. Peter Anvin
2008-03-27 19:03                   ` Kay Sievers
2008-03-25 15:17   ` James Bottomley
2008-03-25 17:31     ` Mark Lord
2008-03-25 19:32       ` James Bottomley
2008-03-25 17:45     ` Greg Freemyer
2008-03-25 17:52       ` Randy Dunlap
2008-03-25 18:09         ` Matthew Wilcox
2008-03-26  9:58           ` Boaz Harrosh
2008-03-30  4:28       ` Matt Domsch
     [not found] ` <alpine.LFD.1.00.0803242254020.2775@woody.linux-foundation.org>
2008-03-25 13:34   ` Mark Lord
2008-03-25 13:51     ` Greg Freemyer
2008-03-25 14:31     ` Ric Wheeler
2008-03-25 15:25       ` Andrew Paprocki
2008-03-25 15:34       ` Matthew Wilcox
2008-03-25 15:48         ` Ric Wheeler
2008-03-25 16:47           ` Theodore Tso
2008-03-25 20:51             ` Theodore Tso
2008-03-25 20:51               ` Theodore Tso
2008-03-25 20:51             ` Theodore Tso
     [not found] <abhxL-xC-7@gated-at.bofh.it>
     [not found] ` <abhRd-1bf-15@gated-at.bofh.it>
     [not found]   ` <ablib-2zv-65@gated-at.bofh.it>
     [not found]     ` <abn0B-735-35@gated-at.bofh.it>
     [not found]       ` <abna7-7jK-3@gated-at.bofh.it>
     [not found]         ` <abo6b-11J-9@gated-at.bofh.it>
     [not found]           ` <aboSP-2Wf-29@gated-at.bofh.it>
     [not found]             ` <aboSP-2Wf-27@gated-at.bofh.it>
     [not found]               ` <abqrq-6eX-29@gated-at.bofh.it>
     [not found]                 ` <abqrq-6eX-27@gated-at.bofh.it>
     [not found]                   ` <abqKM-6Ka-13@gated-at.bofh.it>
2008-03-26 11:30                     ` Bodo Eggert
2008-03-26 11:30                     ` Bodo Eggert
2008-03-26 11:30                       ` Bodo Eggert
2008-03-27  3:52                       ` Greg KH
2008-03-27  4:57                         ` H. Peter Anvin
2008-03-27 14:45                         ` Mark Lord
2008-03-27 15:15                           ` Greg KH
