linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mark Fasheh <mfasheh@suse.de>
To: Al Viro <viro@ZenIV.linux.org.uk>, linux-fsdevel@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Amir Goldstein <amir73il@gmail.com>,
	linux-unionfs@vger.kernel.org, Jeff Mahoney <jeffm@suse.com>,
	linux-nfs@vger.kernel.org
Subject: [RFC PATCH 0/4] vfs: map unique ino/dev pairs for user space
Date: Tue, 31 Jul 2018 14:10:41 -0700	[thread overview]
Message-ID: <20180731211045.5671-1-mfasheh@suse.de> (raw)

Hi,

These patches are follow-up to this conversation on fsdevel which provides a
good break-down of the core problem:

https://www.spinics.net/lists/linux-fsdevel/msg128003.html

That e-mail was in turn a follow-up to the following patch series:

https://lwn.net/Articles/753917/

which tried to solve the dev portion of this problem with a bit of structure
re-routing but was never accepted.


We have an inconsistency in how the kernel is exporting inode number /
device pairs for user space.  I'm not talking about stat(2) or statx(2) here
but everywhere else where an ino/dev is exported into a buffer for user
space.

We typically do this by simply dumping the potentially 32 bit inode->i_ino
and single device from super->s_dev.  Sometimes these values differ from
what is returned via stat(2) or statx(2).  Some filesystems might even show
duplicate (but internally different!) pairs when the raw i_ino/s_dev is
used.

Aside from breaking software which expects these pairs to be unique, this
can put the user in a situation where they might not be able to find an
inode referenced from some kernel log (be it /proc/, printk buffer, etc). 
What's even worse - depending on how ino is exported, they might even find
an existing but /wrong/ file.

The following patches fix these inconsistencies by introducing a VFS helper
function which calls the underlying filesystem ->getattr to get our real
inode number / device pair.  The returned values can then be used at those
places in the kernel where we are incorrectly reporting our ino/dev pair. 
We then update fs/proc/ and fs/locks.c to use this helper when writing to
/proc/PID/maps and /proc/locks respectively.


For anyone who is watching the evolution of these patches, this would be one
implementation of the 'callback' method which I discussed in my last e-mail. 
This approach is very straight forward and easy to understand but has some
drawbacks:

- Not all of the call sites can tolerate errors.  Instead, we fallback to
  inode->i_ino and super->s_dev in case of getattr error.  That behavior is
  no worse than what we have today but what we have today is pretty bad so I
  don't particularly feel good about that. It's better than nothing though.

- There are places in the kernel where we could never call into ->getattr. 
  There are tons of tracepoints for example that are printing the the wrong
  ino/dev pair.

- The helper function has to fake a path struct because getting a vfsmount
  from inside these call sites is impossible.  This isn't actually a big
  deal as nobody except NFS cares and the NFS patch is very trivial.  It's
  ugly though, so if we had consensus on this approach I would happily
  rework our ->getattr function signature, perhaps pulling the vfsmount and
  dentry arguments back out.

- The dentry argument to ->getattr is potentially problematic as we have
  some places which _only_ have an inode. Even if we had a dentry I'm not
  sure what would keep it from going negative while in the middle of our
  ->getattr call.


Comments and review would be greatly appreciated. Thanks,
	--Mark

             reply	other threads:[~2018-07-31 22:53 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-31 21:10 Mark Fasheh [this message]
2018-07-31 21:10 ` [PATCH 1/4] vfs: introduce function to map unique ino/dev pairs Mark Fasheh
2018-07-31 23:21   ` Mark Fasheh
2018-08-01  5:41     ` Amir Goldstein
2018-08-01 15:59       ` Mark Fasheh
2018-07-31 21:10 ` [PATCH 2/4] nfs: check for NULL vfsmount in nfs_getattr Mark Fasheh
2018-07-31 22:16   ` Al Viro
2018-07-31 22:51     ` Mark Fasheh
2018-08-02  0:43       ` Al Viro
2018-08-03  2:04         ` Mark Fasheh
2018-07-31 21:10 ` [PATCH 3/4] proc: use vfs helper to get ino/dev pairs for maps file Mark Fasheh
2018-07-31 21:10 ` [PATCH 4/4] locks: map correct ino/dev pairs when exporting to userspace Mark Fasheh
2018-08-01  5:37   ` Amir Goldstein
2018-08-01 17:45     ` Mark Fasheh
2018-08-01 11:03   ` Jeff Layton
2018-08-01 20:38     ` Mark Fasheh
2018-08-02  0:24 ` [RFC PATCH 0/4] vfs: map unique ino/dev pairs for user space J. R. Okajima

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180731211045.5671-1-mfasheh@suse.de \
    --to=mfasheh@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=amir73il@gmail.com \
    --cc=jeffm@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-unionfs@vger.kernel.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).