Re: [PATCH] ceph: fix inode number handling on arches with 32-bit ino_t

From: Jeff Layton <jlayton@kernel.org>
To: ceph-devel@vger.kernel.org
Cc: Ulrich.Weigand@de.ibm.com, Tuan.Hoang1@ibm.com, idryomov@gmail.com
Subject: Re: [PATCH] ceph: fix inode number handling on arches with 32-bit ino_t
Date: Thu, 20 Aug 2020 17:30:57 -0400	[thread overview]
Message-ID: <4429ff0fc03ced7ab7ecabeadda913b5a08d1684.camel@kernel.org> (raw)
In-Reply-To: <20200819151645.38951-1-jlayton@kernel.org>

On Wed, 2020-08-19 at 11:16 -0400, Jeff Layton wrote:
> Tuan and Ulrich mentioned that they were hitting a problem on s390x,
> which has a 32-bit ino_t value, even though it's a 64-bit arch (for
> historical reasons).
> 
> I think the current handling of inode numbers in the ceph driver is
> wrong. It tries to use 32-bit inode numbers on 32-bit arches, but that's
> actually not a problem. 32-bit arches can deal with 64-bit inode numbers
> just fine when userland code is compiled with LFS support (the common
> case these days).
> 
> What we really want to do is just use 64-bit numbers everywhere, unless
> someone has mounted with the ino32 mount option. In that case, we want
> to ensure that we hash the inode number down to something that will fit
> in 32 bits before presenting the value to userland.
> 
> Add new helper functions that do this, and only do the conversion before
> presenting these values to userland in getattr and readdir.
> 
> The inode table hashvalue is changed to just cast the inode number to
> unsigned long, as low-order bits are the most likely to vary anyway.
> 
> While it's not strictly required, we do want to put something in
> inode->i_ino. Instead of basing it on BITS_PER_LONG, however, base it on
> the size of the ino_t type.
> 
> Reported-by: Tuan Hoang1 <Tuan.Hoang1@ibm.com>
> Reported-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
> URL: https://tracker.ceph.com/issues/46828
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/caps.c       | 14 ++++-----
>  fs/ceph/debugfs.c    |  4 +--
>  fs/ceph/dir.c        | 31 ++++++++-----------
>  fs/ceph/file.c       |  4 +--
>  fs/ceph/inode.c      | 19 ++++++------
>  fs/ceph/mds_client.h |  2 +-
>  fs/ceph/quota.c      |  4 +--
>  fs/ceph/super.h      | 73 +++++++++++++++++++++++---------------------
>  8 files changed, 74 insertions(+), 77 deletions(-)
> 
> v4:
> - flesh out comments in super.h
> - merge dout messages in ceph_get_inode
> - rename ceph_vino_to_ino to ceph_vino_to_ino_t
> 
> v3:
> - use ceph_ino instead of ceph_present_ino in most dout() messages
> 
> v2:
> - fix dir_emit inode number for ".."
> - fix ino_t size test
> 

FWIW, I built an i386 VM and ran a kernel with this patch through
xfstests and it seems to be ok.

To be clear though, this _will_ be a user-visible change on 32-bit
arches:

1/ inode numbers will be seen to have changed between kernel versions.
32-bit arches will see large inode numbers now instead of the hashed
ones they saw before.

2/ any really old software not built with LFS support may start failing
stat() calls with -EOVERFLOW on inode numbers >2^32. Nothing much we can
do about these, but hopefully the intersection of people running such
code on ceph will be very small.

The workaround for both problems will be to mount with "-o ino32".
-- 
Jeff Layton <jlayton@kernel.org>