On Apr 19, 2017, at 4:58 PM, Darrick J. Wong wrote:
>
> On Wed, Apr 19, 2017 at 06:17:15PM +0300, Amir Goldstein wrote:
>> On Wed, Apr 19, 2017 at 6:01 PM, Miklos Szeredi wrote:
>>> On Wed, Apr 19, 2017 at 4:46 PM, Amir Goldstein wrote:
>>>> Well, if you are lucky you can run into a filesystem that exports
>>>> a file handle of type FILEID_INO32_GEN, then you *know* you're
>>>> good to go. ext* will do that and xfs that was forever mounted with
>>>> -o inode32.
>>>> Even with xfs -o inode64, it will not use the MSB ino bits unless
>>>> you are in the exabytes fs sizes.
>
> I think it only takes really big AGs for it to start using the >32 bit parts.
>
>>>
>>> Could filesystems export a max-ino property in their sb? That would
>>> help with doing this properly.
>>>
>>
>> Sounds reasonable, but as max-ino usually derived from filesystem size
>> and filesystems can grow size online, you will need to query both the
>> 'soft' ino limit (without growing fs) and the 'hard' ino limit.
>>
>> Darrick,
>>
>> Are there bits in GETFSMAP to provide this info?
>
> Nope. I suppose there could be a way to find out the theoretical
> maximum inode number for a filesystem (statvfsx, etc.) but on the other
> hand I can also see the other fs developers not wanting to expose that
> information for fear that someone will start using the upper bits (inode
> numbers should just be a 64-bit cookie we hand to users, right?) and
> then they'll have to resort to all sorts of trickery to avoid breaking
> things if they ever /do/ want to use those high bits that have been
> claimed by someone else.

I recall a similar issue with GlusterFS, which assumed that readdir cookies on ext4 were only 32 bits and stashed some of its own information in the high bits. That broke when ext4 moved to 64-bit readdir cookies to avoid hash collisions on "normal sized" directories (above ~32k entries).
I'd agree that it is the filesystem's prerogative to use any/all of the 64-bit inode number when it wants, and stacking filesystems shouldn't try to usurp those bits for something else, only to suffer later on. There is already some interest in adding 64-bit inode numbers to ext4, and it may allocate inode numbers sparsely, so just because the filesystem has 2^33 inodes in it doesn't imply that the highest possible inum is 2^33; it could instead be 2^48 or something else entirely.

> /me wonders what you're trying to accomplish?

That is mentioned upthread:

>>>> On Wed, Apr 19, 2017 at 4:52 PM, Miklos Szeredi wrote:
>>>>> I think we *can* do unified ino space even in
>>>>> most non-samefs cases. And here's why: look at the inode numbers of
>>>>> any filesystem; they will always be "small" so we can just partition
>>>>> the 64 bit ino space between layers and map inode numbers into its own
>>>>> partition. This does not work in the general case, and it is a hack.
>>>>> But it's a very simple hack and it probably works fine.

Cheers, Andreas