All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Biggers <ebiggers@kernel.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Daniel Rosenberg <drosen@google.com>,
	Theodore Ts'o <tytso@mit.edu>,
	linux-ext4@vger.kernel.org, Jaegeuk Kim <jaegeuk@kernel.org>,
	Chao Yu <chao@kernel.org>,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-fscrypt@vger.kernel.org,
	Richard Weinberger <richard@nod.at>,
	linux-mtd@lists.infradead.org,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Jonathan Corbet <corbet@lwn.net>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	Gabriel Krisman Bertazi <krisman@collabora.com>,
	kernel-team@android.com
Subject: Re: [PATCH v7 2/8] fs: Add standard casefolding support
Date: Tue, 11 Feb 2020 22:57:34 -0800	[thread overview]
Message-ID: <20200212065734.GA157327@sol.localdomain> (raw)
In-Reply-To: <20200212063440.GL870@sol.localdomain>

On Tue, Feb 11, 2020 at 10:34:40PM -0800, Eric Biggers wrote:
> On Mon, Feb 10, 2020 at 11:42:07PM +0000, Al Viro wrote:
> > On Mon, Feb 10, 2020 at 03:11:13PM -0800, Daniel Rosenberg wrote:
> > > On Fri, Feb 7, 2020 at 6:12 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > On Fri, Feb 07, 2020 at 05:35:46PM -0800, Daniel Rosenberg wrote:
> > > >
> > > >
> > > > Again, is that safe in case when the contents of the string str points to
> > > > keeps changing under you?
> > > 
> > > I'm not sure what you mean. I thought it was safe to use the str and
> > > len passed into d_compare. Even if it gets changed under RCU
> > > conditions I thought there was some code to ensure that the name/len
> > > pair passed in is consistent, and any other inconsistencies would get
> > > caught by d_seq later. Are there unsafe code paths that can follow?
> > 
> > If you ever fetch the same byte twice, you might see different values.
> > You need a fairly careful use of READ_ONCE() or equivalents to make
> > sure that you don't get screwed over by that.
> > 
> > Sure, ->d_seq mismatch will throw the result out, but you need to make
> > sure you won't oops/step on uninitialized memory/etc. in process.
> > 
> > It's not impossible to get right, but it's not trivial and you need all
> > code working with that much more careful than normal for string handling.
> 
> It looks like this is a real problem, not just a "theoretical" data race.
> For example, see:
> 
> utf8ncursor():
>         /* The first byte of s may not be an utf8 continuation. */
>         if (len > 0 && (*s & 0xC0) == 0x80)
>                 return -1;
> 
> and then utf8byte():
>                 } else if ((*u8c->s & 0xC0) == 0x80) {
>                         /* This is a continuation of the current character. */
>                         if (!u8c->p)
>                                 u8c->len--;
>                         return (unsigned char)*u8c->s++;
> 
> The first byte of the string is checked in two different functions, so it's very
> likely to be loaded twice.  In between, it could change from a non-continuation
> byte to a continuation byte.  That would cause the string length to be
> decremented from 0 to UINT_MAX.  Then utf8_strncasecmp() would run beyond the
> bounds of the string until something happened to mismatch.
> 
> That's just an example that I found right away; there are probably more.
> 
> IMO, this needs to be fixed before anyone can actually use the ext4 and f2fs
> casefolding stuff.
> 
> I don't know the best solution.  One option is to fix fs/unicode/ to handle
> concurrently modified strings.  Another could be to see what it would take to
> serialize lookups and renames for casefolded directories...
> 

Or (just throwing another idea out there) the dentry's name could be copied to a
temporary buffer in ->d_compare().  The simplest version would be:

	u8 _name[NAME_MAX];

	memcpy(_name, name, len);
	name = _name;

Though, 255 bytes is a bit large for a stack buffer (so for long names it may
need kmalloc with GFP_ATOMIC), and technically it would need a special version
of memcpy() to be guaranteed safe from compiler optimizations (though I expect
this would work in practice).

Alternatively, take_dentry_name_snapshot() kind of does this already, except
that it takes a dentry and not a (name, len) pair.

- Eric

WARNING: multiple messages have this Message-ID (diff)
From: Eric Biggers <ebiggers@kernel.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: kernel-team@android.com, Theodore Ts'o <tytso@mit.edu>,
	Daniel Rosenberg <drosen@google.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Richard Weinberger <richard@nod.at>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-fscrypt@vger.kernel.org, linux-mtd@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, Jaegeuk Kim <jaegeuk@kernel.org>,
	linux-ext4@vger.kernel.org,
	Gabriel Krisman Bertazi <krisman@collabora.com>
Subject: Re: [f2fs-dev] [PATCH v7 2/8] fs: Add standard casefolding support
Date: Tue, 11 Feb 2020 22:57:34 -0800	[thread overview]
Message-ID: <20200212065734.GA157327@sol.localdomain> (raw)
In-Reply-To: <20200212063440.GL870@sol.localdomain>

On Tue, Feb 11, 2020 at 10:34:40PM -0800, Eric Biggers wrote:
> On Mon, Feb 10, 2020 at 11:42:07PM +0000, Al Viro wrote:
> > On Mon, Feb 10, 2020 at 03:11:13PM -0800, Daniel Rosenberg wrote:
> > > On Fri, Feb 7, 2020 at 6:12 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > On Fri, Feb 07, 2020 at 05:35:46PM -0800, Daniel Rosenberg wrote:
> > > >
> > > >
> > > > Again, is that safe in case when the contents of the string str points to
> > > > keeps changing under you?
> > > 
> > > I'm not sure what you mean. I thought it was safe to use the str and
> > > len passed into d_compare. Even if it gets changed under RCU
> > > conditions I thought there was some code to ensure that the name/len
> > > pair passed in is consistent, and any other inconsistencies would get
> > > caught by d_seq later. Are there unsafe code paths that can follow?
> > 
> > If you ever fetch the same byte twice, you might see different values.
> > You need a fairly careful use of READ_ONCE() or equivalents to make
> > sure that you don't get screwed over by that.
> > 
> > Sure, ->d_seq mismatch will throw the result out, but you need to make
> > sure you won't oops/step on uninitialized memory/etc. in process.
> > 
> > It's not impossible to get right, but it's not trivial and you need all
> > code working with that much more careful than normal for string handling.
> 
> It looks like this is a real problem, not just a "theoretical" data race.
> For example, see:
> 
> utf8ncursor():
>         /* The first byte of s may not be an utf8 continuation. */
>         if (len > 0 && (*s & 0xC0) == 0x80)
>                 return -1;
> 
> and then utf8byte():
>                 } else if ((*u8c->s & 0xC0) == 0x80) {
>                         /* This is a continuation of the current character. */
>                         if (!u8c->p)
>                                 u8c->len--;
>                         return (unsigned char)*u8c->s++;
> 
> The first byte of the string is checked in two different functions, so it's very
> likely to be loaded twice.  In between, it could change from a non-continuation
> byte to a continuation byte.  That would cause the string length to be
> decremented from 0 to UINT_MAX.  Then utf8_strncasecmp() would run beyond the
> bounds of the string until something happened to mismatch.
> 
> That's just an example that I found right away; there are probably more.
> 
> IMO, this needs to be fixed before anyone can actually use the ext4 and f2fs
> casefolding stuff.
> 
> I don't know the best solution.  One option is to fix fs/unicode/ to handle
> concurrently modified strings.  Another could be to see what it would take to
> serialize lookups and renames for casefolded directories...
> 

Or (just throwing another idea out there) the dentry's name could be copied to a
temporary buffer in ->d_compare().  The simplest version would be:

	u8 _name[NAME_MAX];

	memcpy(_name, name, len);
	name = _name;

Though, 255 bytes is a bit large for a stack buffer (so for long names it may
need kmalloc with GFP_ATOMIC), and technically it would need a special version
of memcpy() to be guaranteed safe from compiler optimizations (though I expect
this would work in practice).

Alternatively, take_dentry_name_snapshot() kind of does this already, except
that it takes a dentry and not a (name, len) pair.

- Eric


_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

WARNING: multiple messages have this Message-ID (diff)
From: Eric Biggers <ebiggers@kernel.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: kernel-team@android.com, Theodore Ts'o <tytso@mit.edu>,
	Daniel Rosenberg <drosen@google.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Richard Weinberger <richard@nod.at>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Chao Yu <chao@kernel.org>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-fscrypt@vger.kernel.org, linux-mtd@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, Jaegeuk Kim <jaegeuk@kernel.org>,
	linux-ext4@vger.kernel.org,
	Gabriel Krisman Bertazi <krisman@collabora.com>
Subject: Re: [PATCH v7 2/8] fs: Add standard casefolding support
Date: Tue, 11 Feb 2020 22:57:34 -0800	[thread overview]
Message-ID: <20200212065734.GA157327@sol.localdomain> (raw)
In-Reply-To: <20200212063440.GL870@sol.localdomain>

On Tue, Feb 11, 2020 at 10:34:40PM -0800, Eric Biggers wrote:
> On Mon, Feb 10, 2020 at 11:42:07PM +0000, Al Viro wrote:
> > On Mon, Feb 10, 2020 at 03:11:13PM -0800, Daniel Rosenberg wrote:
> > > On Fri, Feb 7, 2020 at 6:12 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > On Fri, Feb 07, 2020 at 05:35:46PM -0800, Daniel Rosenberg wrote:
> > > >
> > > >
> > > > Again, is that safe in case when the contents of the string str points to
> > > > keeps changing under you?
> > > 
> > > I'm not sure what you mean. I thought it was safe to use the str and
> > > len passed into d_compare. Even if it gets changed under RCU
> > > conditions I thought there was some code to ensure that the name/len
> > > pair passed in is consistent, and any other inconsistencies would get
> > > caught by d_seq later. Are there unsafe code paths that can follow?
> > 
> > If you ever fetch the same byte twice, you might see different values.
> > You need a fairly careful use of READ_ONCE() or equivalents to make
> > sure that you don't get screwed over by that.
> > 
> > Sure, ->d_seq mismatch will throw the result out, but you need to make
> > sure you won't oops/step on uninitialized memory/etc. in process.
> > 
> > It's not impossible to get right, but it's not trivial and you need all
> > code working with that much more careful than normal for string handling.
> 
> It looks like this is a real problem, not just a "theoretical" data race.
> For example, see:
> 
> utf8ncursor():
>         /* The first byte of s may not be an utf8 continuation. */
>         if (len > 0 && (*s & 0xC0) == 0x80)
>                 return -1;
> 
> and then utf8byte():
>                 } else if ((*u8c->s & 0xC0) == 0x80) {
>                         /* This is a continuation of the current character. */
>                         if (!u8c->p)
>                                 u8c->len--;
>                         return (unsigned char)*u8c->s++;
> 
> The first byte of the string is checked in two different functions, so it's very
> likely to be loaded twice.  In between, it could change from a non-continuation
> byte to a continuation byte.  That would cause the string length to be
> decremented from 0 to UINT_MAX.  Then utf8_strncasecmp() would run beyond the
> bounds of the string until something happened to mismatch.
> 
> That's just an example that I found right away; there are probably more.
> 
> IMO, this needs to be fixed before anyone can actually use the ext4 and f2fs
> casefolding stuff.
> 
> I don't know the best solution.  One option is to fix fs/unicode/ to handle
> concurrently modified strings.  Another could be to see what it would take to
> serialize lookups and renames for casefolded directories...
> 

Or (just throwing another idea out there) the dentry's name could be copied to a
temporary buffer in ->d_compare().  The simplest version would be:

	u8 _name[NAME_MAX];

	memcpy(_name, name, len);
	name = _name;

Though, 255 bytes is a bit large for a stack buffer (so for long names it may
need kmalloc with GFP_ATOMIC), and technically it would need a special version
of memcpy() to be guaranteed safe from compiler optimizations (though I expect
this would work in practice).

Alternatively, take_dentry_name_snapshot() kind of does this already, except
that it takes a dentry and not a (name, len) pair.

- Eric

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

  reply	other threads:[~2020-02-12  6:57 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-08  1:35 [PATCH v7 0/8] Support fof Casefolding and Encryption Daniel Rosenberg
2020-02-08  1:35 ` Daniel Rosenberg
2020-02-08  1:35 ` [f2fs-dev] " Daniel Rosenberg via Linux-f2fs-devel
2020-02-08  1:35 ` [PATCH v7 1/8] unicode: Add utf8_casefold_iter Daniel Rosenberg
2020-02-08  1:35   ` Daniel Rosenberg
2020-02-08  1:35   ` [f2fs-dev] " Daniel Rosenberg via Linux-f2fs-devel
2020-02-12  3:38   ` Eric Biggers
2020-02-12  3:38     ` Eric Biggers
2020-02-12  3:38     ` [f2fs-dev] " Eric Biggers
2020-02-14 21:47     ` Daniel Rosenberg
2020-02-14 21:47       ` Daniel Rosenberg
2020-02-14 21:47       ` [f2fs-dev] " Daniel Rosenberg via Linux-f2fs-devel
2020-02-17 19:02       ` Gabriel Krisman Bertazi
2020-02-17 19:02         ` Gabriel Krisman Bertazi
2020-02-17 19:02         ` [f2fs-dev] " Gabriel Krisman Bertazi
2020-02-08  1:35 ` [PATCH v7 2/8] fs: Add standard casefolding support Daniel Rosenberg
2020-02-08  1:35   ` Daniel Rosenberg
2020-02-08  1:35   ` [f2fs-dev] " Daniel Rosenberg via Linux-f2fs-devel
2020-02-08  2:12   ` Al Viro
2020-02-08  2:12     ` Al Viro
2020-02-08  2:12     ` [f2fs-dev] " Al Viro
2020-02-10 23:11     ` Daniel Rosenberg
2020-02-10 23:11       ` Daniel Rosenberg
2020-02-10 23:11       ` [f2fs-dev] " Daniel Rosenberg via Linux-f2fs-devel
2020-02-10 23:42       ` Al Viro
2020-02-10 23:42         ` Al Viro
2020-02-10 23:42         ` [f2fs-dev] " Al Viro
2020-02-12  6:34         ` Eric Biggers
2020-02-12  6:34           ` Eric Biggers
2020-02-12  6:34           ` [f2fs-dev] " Eric Biggers
2020-02-12  6:57           ` Eric Biggers [this message]
2020-02-12  6:57             ` Eric Biggers
2020-02-12  6:57             ` [f2fs-dev] " Eric Biggers
2020-02-20  2:27             ` Daniel Rosenberg
2020-02-20  2:27               ` Daniel Rosenberg
2020-02-20  2:27               ` [f2fs-dev] " Daniel Rosenberg via Linux-f2fs-devel
2020-02-12  3:55   ` Eric Biggers
2020-02-12  3:55     ` Eric Biggers
2020-02-12  3:55     ` [f2fs-dev] " Eric Biggers
2020-02-08  1:35 ` [PATCH v7 3/8] f2fs: Use generic " Daniel Rosenberg
2020-02-08  1:35   ` Daniel Rosenberg
2020-02-08  1:35   ` [f2fs-dev] " Daniel Rosenberg via Linux-f2fs-devel
2020-02-12  4:05   ` Eric Biggers
2020-02-12  4:05     ` Eric Biggers
2020-02-12  4:05     ` [f2fs-dev] " Eric Biggers
2020-02-08  1:35 ` [PATCH v7 4/8] ext4: " Daniel Rosenberg
2020-02-08  1:35   ` Daniel Rosenberg
2020-02-08  1:35   ` [f2fs-dev] " Daniel Rosenberg via Linux-f2fs-devel
2020-02-08  1:35 ` [PATCH v7 5/8] fscrypt: Have filesystems handle their d_ops Daniel Rosenberg
2020-02-08  1:35   ` Daniel Rosenberg
2020-02-08  1:35   ` [f2fs-dev] " Daniel Rosenberg via Linux-f2fs-devel
2020-02-12  4:33   ` Eric Biggers
2020-02-12  4:33     ` Eric Biggers
2020-02-12  4:33     ` [f2fs-dev] " Eric Biggers
2020-02-08  1:35 ` [PATCH v7 6/8] f2fs: Handle casefolding with Encryption Daniel Rosenberg
2020-02-08  1:35   ` Daniel Rosenberg
2020-02-08  1:35   ` [f2fs-dev] " Daniel Rosenberg via Linux-f2fs-devel
2020-02-12  5:10   ` Eric Biggers
2020-02-12  5:10     ` Eric Biggers
2020-02-12  5:10     ` [f2fs-dev] " Eric Biggers
2020-02-12  5:55     ` Al Viro
2020-02-12  5:55       ` Al Viro
2020-02-12  5:55       ` [f2fs-dev] " Al Viro
2020-02-12  6:06       ` Eric Biggers
2020-02-12  6:06         ` Eric Biggers
2020-02-12  6:06         ` [f2fs-dev] " Eric Biggers
2020-02-12  5:47   ` Eric Biggers
2020-02-12  5:47     ` Eric Biggers
2020-02-12  5:47     ` [f2fs-dev] " Eric Biggers
2020-02-08  1:35 ` [PATCH v7 7/8] ext4: Hande casefolding with encryption Daniel Rosenberg
2020-02-08  1:35   ` Daniel Rosenberg
2020-02-08  1:35   ` [f2fs-dev] " Daniel Rosenberg via Linux-f2fs-devel
2020-02-12  5:59   ` Eric Biggers
2020-02-12  5:59     ` Eric Biggers
2020-02-12  5:59     ` [f2fs-dev] " Eric Biggers
2020-02-08  1:35 ` [PATCH v7 8/8] ext4: Optimize match for casefolded encrypted dirs Daniel Rosenberg
2020-02-08  1:35   ` Daniel Rosenberg
2020-02-08  1:35   ` [f2fs-dev] " Daniel Rosenberg via Linux-f2fs-devel
2020-02-12  6:12 ` [PATCH v7 0/8] Support fof Casefolding and Encryption Eric Biggers
2020-02-12  6:12   ` Eric Biggers
2020-02-12  6:12   ` [f2fs-dev] " Eric Biggers
2020-02-13  0:01   ` Daniel Rosenberg
2020-02-13  0:01     ` Daniel Rosenberg
2020-02-13  0:01     ` [f2fs-dev] " Daniel Rosenberg via Linux-f2fs-devel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200212065734.GA157327@sol.localdomain \
    --to=ebiggers@kernel.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=chao@kernel.org \
    --cc=corbet@lwn.net \
    --cc=drosen@google.com \
    --cc=jaegeuk@kernel.org \
    --cc=kernel-team@android.com \
    --cc=krisman@collabora.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fscrypt@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mtd@lists.infradead.org \
    --cc=richard@nod.at \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.