linux-kernel.vger.kernel.org archive mirror
From: Peter Maydell <peter.maydell@linaro.org>
To: Florian Weimer <fw@deneb.enyo.de>
Cc: linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-ext4@vger.kernel.org, lucho@ionkov.net,
	libc-alpha@sourceware.org, Arnd Bergmann <arnd@arndb.de>,
	ericvh@gmail.com, hpa@zytor.com,
	lkml - Kernel Mailing List <linux-kernel@vger.kernel.org>,
	QEMU Developers <qemu-devel@nongnu.org>,
	rminnich@sandia.gov, v9fs-developer@lists.sourceforge.net
Subject: Re: [Qemu-devel] d_off field in struct dirent and 32-on-64 emulation
Date: Thu, 27 Dec 2018 17:41:58 +0000
Message-ID: <CAFEAcA92m4vhzjJ+B=mP_o6Wfhx1XSKo3uWxah3osh=u5UXFuw@mail.gmail.com>
In-Reply-To: <87bm56vqg4.fsf@mid.deneb.enyo.de>

On Thu, 27 Dec 2018 at 17:19, Florian Weimer <fw@deneb.enyo.de> wrote:
> We have a bit of an interesting problem with respect to the d_off
> field in struct dirent.
>
> When running a 64-bit kernel on certain file systems, notably ext4,
> this field uses the full 63 bits even for small directories (strace -v
> output, wrapped here for readability):
>
> getdents(3, [
>   {d_ino=1494304, d_off=3901177228673045825, d_reclen=40, d_name="authorized_keys", d_type=DT_REG},
>   {d_ino=1494277, d_off=7491915799041650922, d_reclen=24, d_name=".", d_type=DT_DIR},
>   {d_ino=1314655, d_off=9223372036854775807, d_reclen=24, d_name="..", d_type=DT_DIR}
> ], 32768) = 88
>
> When running in 32-bit compat mode, this value is somehow truncated to
> 31 bits, for both the getdents and the getdents64 (!) system call (at
> least on i386).
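To observe this on a given host, a small C sketch can read a directory with the raw getdents64 system call (bypassing the C library's readdir()) and count the d_off values that would not fit a signed 32-bit offset. The function name count_wide_offsets is hypothetical; the struct mirrors the record layout documented in getdents(2), declared locally because older C libraries do not export it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64(2). */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;    /* opaque cookie for the next entry */
    unsigned short d_reclen; /* total size of this record */
    unsigned char  d_type;
    char           d_name[];
};

/* Hypothetical helper: reads every entry of `path` with the raw
   getdents64 syscall and counts the d_off values that would not fit in
   a signed 32-bit offset. Stores the number of entries seen in *total.
   Returns the count of "wide" offsets, or -1 on error. */
static long count_wide_offsets(const char *path, long *total)
{
    static uint64_t buf[4096];   /* 32 KiB, 8-byte aligned */
    long wide = 0;

    assert(total != NULL);
    *total = 0;

    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break;               /* end of directory */
        for (long pos = 0; pos < n;) {
            const struct linux_dirent64 *d =
                (const struct linux_dirent64 *)((const char *)buf + pos);
            if (d->d_off < 0 || d->d_off > INT32_MAX)
                wide++;
            (*total)++;
            pos += d->d_reclen;
        }
    }
    close(fd);
    return wide;
}
```

Built as a 64-bit binary and run on an ext4 directory, one would expect nearly every entry to count as wide; built with -m32, the kernel takes the compat path and, per the report above, the same directory should yield only 31-bit values.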

Yes -- look for hash2pos() and friends in fs/ext4/dir.c.
The ext4 code in the kernel uses a 32-bit hash if (a) the kernel
is 32-bit, (b) this is a compat syscall, or (c) some other part of
the kernel has asked it to via the FMODE_32BITHASH flag (currently
only NFS does that, I think).
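The mapping can be modelled in userspace. The sketch below paraphrases hash2pos(), pos2maj_hash() and pos2min_hash() as they appear in fs/ext4/dir.c around the time of this thread; it is not the kernel code itself. The `_model` names are hypothetical, and the single `use_32bit` flag stands in for the three conditions listed above. It shows why a 64-bit caller sees up to 63 significant bits (31 bits of major hash above 32 bits of minor hash), while the 32-bit path keeps only major >> 1, the 31-bit value Florian observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of ext4's hash-to-position mapping. `use_32bit` stands in for
   "32-bit kernel, compat syscall, or FMODE_32BITHASH set". */
static int64_t hash2pos_model(uint32_t major, uint32_t minor, bool use_32bit)
{
    if (use_32bit)
        /* Drop the low bit so the result fits a signed 32-bit off_t. */
        return (int64_t)(major >> 1);
    /* 31 bits of major hash in the high word, 32 bits of minor hash in
       the low word: at most 63 significant bits. */
    return (int64_t)(((uint64_t)(major >> 1) << 32) | minor);
}

/* Inverse mappings used when the position comes back via lseek(). */
static uint32_t pos2maj_hash_model(int64_t pos, bool use_32bit)
{
    uint64_t p = (uint64_t)pos;
    return use_32bit ? (uint32_t)(p << 1) : (uint32_t)((p >> 32) << 1);
}

static uint32_t pos2min_hash_model(int64_t pos, bool use_32bit)
{
    /* The 32-bit encoding has no room for the minor hash. */
    return use_32bit ? 0 : (uint32_t)pos;
}
```

The largest possible 64-bit position is INT64_MAX, 9223372036854775807, which is the d_off strace reported for ".."; it appears to be ext4's 64-bit end-of-directory marker (EXT4_HTREE_EOF_64BIT), whose 32-bit counterpart is 0x7fffffff.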

As you note, this causes breakage for userspace programs which
need to implement an API/ABI with 32-bit offsets but which only
have access to the kernel's 64-bit offset API/ABI.

I think the best fix for this would be for the kernel to either
(a) consistently use a 32-bit hash or (b) to provide an API
so that userspace can use the FMODE_32BITHASH flag the way
that kernel-internal users already can.

I couldn't think of or find any existing way for userspace
to get the right results here, which is why
32-bit-guest-on-64-bit-host QEMU doesn't work on these filesystems
(depending on what exactly the guest's libc etc do).

> the 32-bit getdents system call emulation in a 64-bit qemu-user
> process would just silently truncate the d_off field as part of
> the translation, not reporting an error.
> [...]
> This truncation has always been a bug; it breaks telldir/seekdir
> at least in some cases.
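Why truncation breaks telldir/seekdir for hash-based offsets can be shown with a toy model of a naive 64-to-32-bit translation. The helper names are hypothetical and this is not QEMU's code: the guest is handed the low 32 bits of d_off, and when it later passes that value back to seekdir(), the host sees a different number whenever the original did not fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Naive translation: the guest only gets the low 32 bits of d_off. */
static uint32_t truncate_d_off(int64_t host_off)
{
    return (uint32_t)(uint64_t)host_off;
}

/* On seekdir(), the 32-bit value is widened again for the host lseek(). */
static int64_t widen_guest_off(uint32_t guest_off)
{
    return (int64_t)guest_off;
}

/* True if the host would be asked to seek to the position it reported. */
static bool survives_round_trip(int64_t host_off)
{
    return widen_guest_off(truncate_d_off(host_off)) == host_off;
}
```

Small genuine positions such as 4096 survive the round trip; any of the ext4 hash positions in the strace output above does not, so the seek lands somewhere else in the hash order.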

Yes; you can't fit a quart into a pint pot, so if the guest
only handles 32-bit offsets then truncation is about all we
can do. This works fine if the offsets really are positions, assuming the
directory isn't so enormous it would have broken the guest
anyway. I'm not aware of any issues with this other than the
oddball ext4 offsets-are-hashes situation -- could you expand
on the telldir/seekdir issue? (I suppose we should probably
make QEMU's syscall emulation layer return "no more entries"
rather than entries with truncated hashes.)
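The closing suggestion, reporting "no more entries" instead of handing out truncated hashes, can be sketched as a filter in the translation loop. This is a hypothetical model that works on bare offset arrays rather than real dirent records: it copies entries until it meets one whose d_off does not fit a signed 32-bit offset, then stops, so the guest sees a short read instead of a silently wrong cookie.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit guest getdents translation. Copies host
   offsets into guest slots until one does not fit a signed 32-bit
   off_t, then stops. Returns the number of entries copied. */
static size_t translate_offsets(const int64_t *host, size_t n,
                                int32_t *guest, size_t guest_cap)
{
    size_t i;

    assert(guest != NULL || guest_cap == 0);
    for (i = 0; i < n && i < guest_cap; i++) {
        if (host[i] < 0 || host[i] > INT32_MAX)
            break;              /* report end of directory from here on */
        guest[i] = (int32_t)host[i];
    }
    return i;
}
```

A real emulation layer would also have to remember that it stopped (or rewind the host descriptor) so that the guest's next getdents call returns 0 rather than the entries after the cut. On a 64-bit ext4 directory essentially every d_off is wide, so this policy alone would make such directories look empty to the guest.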

thanks
-- PMM
