* [PATCH v6 0/4]
@ 2018-01-22 2:04 Deepa Dinamani
2018-01-22 2:04 ` [PATCH v6 1/4] vfs: Add file timestamp range support Deepa Dinamani
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Deepa Dinamani @ 2018-01-22 2:04 UTC (permalink / raw)
To: viro, tytso, adilger.kernel, linux-ext4
Cc: torvalds, linux-fsdevel, linux-kernel, arnd, y2038
The series is aimed at adding support to maintain individual
timestamp ranges for filesystems. This helps futimens, utimensat
and utimes syscalls to conform to POSIX defined behavior when
the time being set is outside of the corresponding filesystem's
supported limits.
The series was developed with discussions and guidance from
Arnd Bergmann.
The original thread is at https://lkml.org/lkml/2016/11/2/294
I will be submitting follow up kernel patches to update all
filesystems.
Currently ext4 is the only filesystem that reflects correct limits.
The branch is available at
https://github.com/deepa-hub/vfs.git refs/heads/range
Changes since v5:
* Dropped y2038-specific changes
Changes since v4:
* Added documentation for boot param
Changes since v3:
* Remove redundant initializations in libfs.c
* Change early_param to __setup similar to other root mount options.
* Fix documentation warning
Changes since v2:
* Introduce early boot param override for checks.
* Drop afs patch for timestamp limits.
Changes since v1:
* return EROFS on mount errors
* fix mtime copy/paste error in utimes
Deepa Dinamani (4):
vfs: Add file timestamp range support
ext4: Initialize timestamps limits
vfs: Add timestamp_truncate() api
utimes: Clamp the timestamps before update
fs/ext4/ext4.h | 4 ++++
fs/ext4/super.c | 7 ++++++-
fs/inode.c | 32 +++++++++++++++++++++++++++++++-
fs/super.c | 2 ++
fs/utimes.c | 17 +++++++++++++----
include/linux/fs.h | 3 +++
include/linux/time64.h | 2 ++
7 files changed, 61 insertions(+), 6 deletions(-)
--
2.14.1
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v6 1/4] vfs: Add file timestamp range support
2018-01-22 2:04 [PATCH v6 0/4] Deepa Dinamani
@ 2018-01-22 2:04 ` Deepa Dinamani
2018-01-22 2:04 ` [PATCH v6 2/4] ext4: Initialize timestamps limits Deepa Dinamani
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Deepa Dinamani @ 2018-01-22 2:04 UTC (permalink / raw)
To: viro, tytso, adilger.kernel, linux-ext4
Cc: torvalds, linux-fsdevel, linux-kernel, arnd, y2038
Add fields to the superblock to track the min and max
timestamps supported by filesystems.
Initially, when a superblock is allocated, initialize
it to the max and min values the fields can hold.
Individual filesystems override these to match their
actual limits.
Pseudo filesystems are assumed to always support the
min and max allowable values for the fields.
Note that the time ranges are saved in type time64_t
rather than time_t.
This is required because if we save ranges in time_t
then we would not be able to save timestamp ranges
for files that support timestamps beyond y2038.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
---
fs/super.c | 2 ++
include/linux/fs.h | 3 +++
include/linux/time64.h | 2 ++
3 files changed, 7 insertions(+)
diff --git a/fs/super.c b/fs/super.c
index 672538ca9831..9e0c97e54e46 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -244,6 +244,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
s->s_maxbytes = MAX_NON_LFS;
s->s_op = &default_op;
s->s_time_gran = 1000000000;
+ s->s_time_min = TIME64_MIN;
+ s->s_time_max = TIME64_MAX;
s->cleancache_poolid = CLEANCACHE_NO_POOL;
s->s_shrink.seeks = DEFAULT_SEEKS;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f9d01c0951a8..406f3de71c22 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1379,6 +1379,9 @@ struct super_block {
/* Granularity of c/m/atime in ns.
Cannot be worse than a second */
u32 s_time_gran;
+ /* Time limits for c/m/atime in seconds. */
+ time64_t s_time_min;
+ time64_t s_time_max;
/*
* The next field is for VFS *only*. No filesystems have any business
diff --git a/include/linux/time64.h b/include/linux/time64.h
index 93d39499838e..76ed46db7a7f 100644
--- a/include/linux/time64.h
+++ b/include/linux/time64.h
@@ -36,6 +36,8 @@ struct itimerspec64 {
/* Located here for timespec[64]_valid_strict */
#define TIME64_MAX ((s64)~((u64)1 << 63))
+#define TIME64_MIN (-TIME64_MAX - 1)
+
#define KTIME_MAX ((s64)~((u64)1 << 63))
#define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)
--
2.14.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v6 2/4] ext4: Initialize timestamps limits
2018-01-22 2:04 [PATCH v6 0/4] Deepa Dinamani
2018-01-22 2:04 ` [PATCH v6 1/4] vfs: Add file timestamp range support Deepa Dinamani
@ 2018-01-22 2:04 ` Deepa Dinamani
2018-01-22 2:04 ` [PATCH v6 3/4] vfs: Add timestamp_truncate() api Deepa Dinamani
2018-01-22 2:04 ` [PATCH v6 4/4] utimes: Clamp the timestamps before update Deepa Dinamani
3 siblings, 0 replies; 12+ messages in thread
From: Deepa Dinamani @ 2018-01-22 2:04 UTC (permalink / raw)
To: viro, tytso, adilger.kernel, linux-ext4
Cc: torvalds, linux-fsdevel, linux-kernel, arnd, y2038
ext4 has different overflow limits for max filesystem
timestamps based on the extra bytes available.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
---
fs/ext4/ext4.h | 4 ++++
fs/ext4/super.c | 7 ++++++-
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3241475a1733..fe4d7a168664 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1602,6 +1602,10 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
#define EXT4_GOOD_OLD_INODE_SIZE 128
+#define EXT4_EXTRA_TIMESTAMP_MAX (((s64)1 << 34) - 1 + S32_MIN)
+#define EXT4_NON_EXTRA_TIMESTAMP_MAX S32_MAX
+#define EXT4_TIMESTAMP_MIN S32_MIN
+
/*
* Feature set definitions
*/
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0843ebfeace1..7c2b227aa319 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3768,8 +3768,13 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
sbi->s_inode_size);
goto failed_mount;
}
- if (sbi->s_inode_size > EXT4_GOOD_OLD_INODE_SIZE)
+ if (sbi->s_inode_size > EXT4_GOOD_OLD_INODE_SIZE) {
sb->s_time_gran = 1 << (EXT4_EPOCH_BITS - 2);
+ sb->s_time_max = EXT4_EXTRA_TIMESTAMP_MAX;
+ } else
+ sb->s_time_max = EXT4_NON_EXTRA_TIMESTAMP_MAX;
+
+ sb->s_time_min = EXT4_TIMESTAMP_MIN;
}
sbi->s_desc_size = le16_to_cpu(es->s_desc_size);
--
2.14.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v6 3/4] vfs: Add timestamp_truncate() api
2018-01-22 2:04 [PATCH v6 0/4] Deepa Dinamani
2018-01-22 2:04 ` [PATCH v6 1/4] vfs: Add file timestamp range support Deepa Dinamani
2018-01-22 2:04 ` [PATCH v6 2/4] ext4: Initialize timestamps limits Deepa Dinamani
@ 2018-01-22 2:04 ` Deepa Dinamani
2018-01-22 20:25 ` Linus Torvalds
2018-01-22 2:04 ` [PATCH v6 4/4] utimes: Clamp the timestamps before update Deepa Dinamani
3 siblings, 1 reply; 12+ messages in thread
From: Deepa Dinamani @ 2018-01-22 2:04 UTC (permalink / raw)
To: viro, tytso, adilger.kernel, linux-ext4
Cc: torvalds, linux-fsdevel, linux-kernel, arnd, y2038
timespec_trunc() function is used to truncate a
filesystem timestamp to the right granularity.
But, the function does not clamp tv_sec part of the
timestamps according to the filesystem timestamp limits.
Also, timespec_trunc() is exclusively used for filesystem
timestamps. Move the api to be part of vfs.
The replacement api: timestamp_truncate() also alters the
signature of the function to accommodate filesystem
timestamp clamping according to flesystem limits.
Note that the clamp_t macro is used for clamping here as vfs
is not yet using struct timespec64 internally. This is
required for compilation purposes.
Also note that clamp won't do the right thing for timestamps
beyond 2038 on 32-bit machines until the vfs uses timespec64.
After the vfs is transitioned to use timespec64 for timestamps,
clamp_t() can be replaced by clamp().
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
---
fs/inode.c | 32 +++++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)
diff --git a/fs/inode.c b/fs/inode.c
index ef362364d396..0e2820ade554 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2107,6 +2107,36 @@ void inode_nohighmem(struct inode *inode)
}
EXPORT_SYMBOL(inode_nohighmem);
+/**
+ * timestamp_truncate - Truncate timespec to a granularity
+ * @t: Timespec
+ * @inode: inode being updated
+ *
+ * Truncate a timespec to the granularity supported by the fs
+ * containing the inode. Always rounds down. gran must
+ * not be 0 nor greater than a second (NSEC_PER_SEC, or 10^9 ns).
+ */
+struct timespec timestamp_truncate(struct timespec t, struct inode *inode)
+{
+ struct super_block *sb = inode->i_sb;
+ unsigned int gran = sb->s_time_gran;
+
+ t.tv_sec = clamp_t(time64_t, t.tv_sec, sb->s_time_min, sb->s_time_max);
+
+ /* Avoid division in the common cases 1 ns and 1 s. */
+ if (gran == 1) {
+ /* nothing */
+ } else if (gran == NSEC_PER_SEC) {
+ t.tv_nsec = 0;
+ } else if (gran > 1 && gran < NSEC_PER_SEC) {
+ t.tv_nsec -= t.tv_nsec % gran;
+ } else {
+ WARN(1, "illegal file time granularity: %u", gran);
+ }
+ return t;
+}
+EXPORT_SYMBOL(timestamp_truncate);
+
/**
* current_time - Return FS time
* @inode: inode.
@@ -2126,6 +2156,6 @@ struct timespec current_time(struct inode *inode)
return now;
}
- return timespec_trunc(now, inode->i_sb->s_time_gran);
+ return timestamp_truncate(now, inode);
}
EXPORT_SYMBOL(current_time);
--
2.14.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v6 4/4] utimes: Clamp the timestamps before update
2018-01-22 2:04 [PATCH v6 0/4] Deepa Dinamani
` (2 preceding siblings ...)
2018-01-22 2:04 ` [PATCH v6 3/4] vfs: Add timestamp_truncate() api Deepa Dinamani
@ 2018-01-22 2:04 ` Deepa Dinamani
3 siblings, 0 replies; 12+ messages in thread
From: Deepa Dinamani @ 2018-01-22 2:04 UTC (permalink / raw)
To: viro, tytso, adilger.kernel, linux-ext4
Cc: torvalds, linux-fsdevel, linux-kernel, arnd, y2038
POSIX.1 section for futimens, utimensat and utimes says:
The file's relevant timestamp shall be set to the
greatest value supported by the file system that is
not greater than the specified time.
Clamp the timestamps accordingly before assignment.
Note that the clamp_t macro is used for clamping here as vfs
is not yet using struct timespec64 internally. This is
required for compilation purposes.
Also note that clamp won't do the right thing for timestamps
beyond 2038 on 32-bit machines until the vfs uses timespec64.
After the vfs is transitioned to use timespec64 for timestamps,
clamp_t() can be replaced by clamp().
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
---
fs/utimes.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/fs/utimes.c b/fs/utimes.c
index e4b3d7c2c9f5..82fdcc3284b9 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -50,6 +50,7 @@ static int utimes_common(const struct path *path, struct timespec64 *times)
int error;
struct iattr newattrs;
struct inode *inode = path->dentry->d_inode;
+ struct super_block *sb = inode->i_sb;
struct inode *delegated_inode = NULL;
error = mnt_want_write(path->mnt);
@@ -65,16 +66,24 @@ static int utimes_common(const struct path *path, struct timespec64 *times)
if (times[0].tv_nsec == UTIME_OMIT)
newattrs.ia_valid &= ~ATTR_ATIME;
else if (times[0].tv_nsec != UTIME_NOW) {
- newattrs.ia_atime.tv_sec = times[0].tv_sec;
- newattrs.ia_atime.tv_nsec = times[0].tv_nsec;
+ newattrs.ia_atime.tv_sec =
+ clamp_t(time64_t, times[0].tv_sec, sb->s_time_min, sb->s_time_max);
+ if (times[0].tv_sec >= sb->s_time_max)
+ newattrs.ia_atime.tv_nsec = 0;
+ else
+ newattrs.ia_atime.tv_nsec = times[0].tv_nsec;
newattrs.ia_valid |= ATTR_ATIME_SET;
}
if (times[1].tv_nsec == UTIME_OMIT)
newattrs.ia_valid &= ~ATTR_MTIME;
else if (times[1].tv_nsec != UTIME_NOW) {
- newattrs.ia_mtime.tv_sec = times[1].tv_sec;
- newattrs.ia_mtime.tv_nsec = times[1].tv_nsec;
+ newattrs.ia_mtime.tv_sec =
+ clamp_t(time64_t, times[1].tv_sec, sb->s_time_min, sb->s_time_max);
+ if (times[1].tv_sec >= sb->s_time_max)
+ newattrs.ia_mtime.tv_nsec = 0;
+ else
+ newattrs.ia_mtime.tv_nsec = times[1].tv_nsec;
newattrs.ia_valid |= ATTR_MTIME_SET;
}
/*
--
2.14.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v6 3/4] vfs: Add timestamp_truncate() api
2018-01-22 2:04 ` [PATCH v6 3/4] vfs: Add timestamp_truncate() api Deepa Dinamani
@ 2018-01-22 20:25 ` Linus Torvalds
2018-01-23 16:25 ` Deepa Dinamani
0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2018-01-22 20:25 UTC (permalink / raw)
To: Deepa Dinamani
Cc: Al Viro, Theodore Ts'o, adilger.kernel, linux-ext4,
linux-fsdevel, Linux Kernel Mailing List, Arnd Bergmann,
y2038 Mailman List
On Sun, Jan 21, 2018 at 6:04 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
> + t.tv_nsec -= t.tv_nsec % gran;
This doesn't actuall ywork if tv_nsec is negative.
Which may not be an issue in most cases, but did somebody check
utimensat() or whatever?
> + WARN(1, "illegal file time granularity: %u", gran);
.. small nit: we generally should use 'invalid' rather than 'illegal'.
No cops will hunt you down for things like this.
Linus
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v6 3/4] vfs: Add timestamp_truncate() api
2018-01-22 20:25 ` Linus Torvalds
@ 2018-01-23 16:25 ` Deepa Dinamani
2018-01-24 11:56 ` Arnd Bergmann
0 siblings, 1 reply; 12+ messages in thread
From: Deepa Dinamani @ 2018-01-23 16:25 UTC (permalink / raw)
To: Linus Torvalds
Cc: Al Viro, Theodore Ts'o, Andreas Dilger, linux-ext4,
linux-fsdevel, Linux Kernel Mailing List, Arnd Bergmann,
y2038 Mailman List
On Mon, Jan 22, 2018 at 12:25 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, Jan 21, 2018 at 6:04 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>> + t.tv_nsec -= t.tv_nsec % gran;
>
> This doesn't actuall ywork if tv_nsec is negative.
Right.
> Which may not be an issue in most cases, but did somebody check
> utimensat() or whatever?
I checked POSIX again. There is no mention of tv_nsec being positive
always for utimes.
And, the long term plan is to replace all the callers of
timespec_trunc() to use this new api instead for filesystems.
So this will need to be fixed. I will fix this and post an update.
>> + WARN(1, "illegal file time granularity: %u", gran);
>
> .. small nit: we generally should use 'invalid' rather than 'illegal'.
Will update this as well.
Thanks,
Deepa
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v6 3/4] vfs: Add timestamp_truncate() api
2018-01-23 16:25 ` Deepa Dinamani
@ 2018-01-24 11:56 ` Arnd Bergmann
2018-01-24 17:47 ` Deepa Dinamani
2018-01-24 18:00 ` Linus Torvalds
0 siblings, 2 replies; 12+ messages in thread
From: Arnd Bergmann @ 2018-01-24 11:56 UTC (permalink / raw)
To: Deepa Dinamani
Cc: Linus Torvalds, Al Viro, Theodore Ts'o, Andreas Dilger,
linux-ext4, linux-fsdevel, Linux Kernel Mailing List,
y2038 Mailman List
On Tue, Jan 23, 2018 at 5:25 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
> On Mon, Jan 22, 2018 at 12:25 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> On Sun, Jan 21, 2018 at 6:04 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>>> + t.tv_nsec -= t.tv_nsec % gran;
>>
>> This doesn't actuall ywork if tv_nsec is negative.
>
> Right.
>
>> Which may not be an issue in most cases, but did somebody check
>> utimensat() or whatever?
>
> I checked POSIX again. There is no mention of tv_nsec being positive
> always for utimes.
> And, the long term plan is to replace all the callers of
> timespec_trunc() to use this new api instead for filesystems.
> So this will need to be fixed. I will fix this and post an update.
I found this on
http://pubs.opengroup.org/onlinepubs/9699919799/functions/utimes.html
ERRORS
These functions shall fail if:
...
[EINVAL]
Either of the times argument structures specified a tv_nsec value that was
neither UTIME_NOW nor UTIME_OMIT, and was a value less than zero or
greater than or equal to 1000 million.
which is the same as the Linux man page and what the kernel actually
does for all the syscalls. The POSIX description seems a bit ambiguous
to whether it also expects or allows EINVAL for utimes() with a tv_usec
over 1000000 microseconds, or if it just applies to the utimensat and
futimens(). Older descriptions that only explain utimes() don't mention
the range check on tv_usec either.
Arnd
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v6 3/4] vfs: Add timestamp_truncate() api
2018-01-24 11:56 ` Arnd Bergmann
@ 2018-01-24 17:47 ` Deepa Dinamani
2018-01-24 18:00 ` Linus Torvalds
1 sibling, 0 replies; 12+ messages in thread
From: Deepa Dinamani @ 2018-01-24 17:47 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Linus Torvalds, Al Viro, Theodore Ts'o, Andreas Dilger,
linux-ext4, linux-fsdevel, Linux Kernel Mailing List,
y2038 Mailman List
On Wed, Jan 24, 2018 at 3:56 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Jan 23, 2018 at 5:25 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>> On Mon, Jan 22, 2018 at 12:25 PM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>> On Sun, Jan 21, 2018 at 6:04 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>>>> + t.tv_nsec -= t.tv_nsec % gran;
>>>
>>> This doesn't actuall ywork if tv_nsec is negative.
>>
>> Right.
>>
>>> Which may not be an issue in most cases, but did somebody check
>>> utimensat() or whatever?
>>
>> I checked POSIX again. There is no mention of tv_nsec being positive
>> always for utimes.
>> And, the long term plan is to replace all the callers of
>> timespec_trunc() to use this new api instead for filesystems.
>> So this will need to be fixed. I will fix this and post an update.
>
> I found this on
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/utimes.html
>
> ERRORS
> These functions shall fail if:
> ...
> [EINVAL]
> Either of the times argument structures specified a tv_nsec value that was
> neither UTIME_NOW nor UTIME_OMIT, and was a value less than zero or
> greater than or equal to 1000 million.
>
> which is the same as the Linux man page and what the kernel actually
> does for all the syscalls. The POSIX description seems a bit ambiguous
> to whether it also expects or allows EINVAL for utimes() with a tv_usec
> over 1000000 microseconds, or if it just applies to the utimensat and
> futimens(). Older descriptions that only explain utimes() don't mention
> the range check on tv_usec either.
Right. This is in keeping with the kernel implementation of the
corresponding syscalls.
But, this timespec_truncate() is being called from current_time() and
will be extended to other calls.
C99 says
"When integers are divided, the result of the / operator is the
algebraic quotient with any fractional part discarded (often called
"truncation toward zero"). If the quotient a/b is representable, the
expression (a/b)*b + a%b shall equal a."
Also, we are already checking for gran being non-zero and in the
nanoseconds range.
So I think the right answer here is to add a comment that the function
expects timespec to be normalized, and the functions calling it can
take care of validation.
-Deepa
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v6 3/4] vfs: Add timestamp_truncate() api
2018-01-24 11:56 ` Arnd Bergmann
2018-01-24 17:47 ` Deepa Dinamani
@ 2018-01-24 18:00 ` Linus Torvalds
2018-01-30 16:52 ` Arnd Bergmann
1 sibling, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2018-01-24 18:00 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Deepa Dinamani, Al Viro, Theodore Ts'o, Andreas Dilger,
linux-ext4, linux-fsdevel, Linux Kernel Mailing List,
y2038 Mailman List
On Wed, Jan 24, 2018 at 3:56 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Jan 23, 2018 at 5:25 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>>
>> I checked POSIX again. There is no mention of tv_nsec being positive
>> always for utimes.
>> And, the long term plan is to replace all the callers of
>> timespec_trunc() to use this new api instead for filesystems.
>> So this will need to be fixed. I will fix this and post an update.
>
> I found this on
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/utimes.html
>
> ERRORS
> These functions shall fail if:
> ...
> [EINVAL]
> Either of the times argument structures specified a tv_nsec value that was
> neither UTIME_NOW nor UTIME_OMIT, and was a value less than zero or
> greater than or equal to 1000 million.
The thing is, we shouldn't check the standards, we should check what
we actually _do_.
And what we actually _do_ is to have a tv_nsec that is of type "long",
and while we do have a
timespec64_valid(const struct timespec64 *ts)
and fs/utimes.c has a 'nsec_valid()' that apparently the utimes*
family of system calls all do end up using, I'm more worried about
something where we don't.
Because I'm almost certain that we've had cases where we just
normalize things after-the-fact.
This all likely isn't a big deal, but it does end up worrying me if we
then _depend_ on that granularity thing actually giving the proper
granularity even for odd inputs. If they can happen.
Maybe we don't care?
Linus
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v6 3/4] vfs: Add timestamp_truncate() api
2018-01-24 18:00 ` Linus Torvalds
@ 2018-01-30 16:52 ` Arnd Bergmann
0 siblings, 0 replies; 12+ messages in thread
From: Arnd Bergmann @ 2018-01-30 16:52 UTC (permalink / raw)
To: Linus Torvalds
Cc: Deepa Dinamani, Al Viro, Theodore Ts'o, Andreas Dilger,
linux-ext4, linux-fsdevel, Linux Kernel Mailing List,
y2038 Mailman List
On Wed, Jan 24, 2018 at 7:00 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Jan 24, 2018 at 3:56 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Tue, Jan 23, 2018 at 5:25 PM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>>>
>>> I checked POSIX again. There is no mention of tv_nsec being positive
>>> always for utimes.
>>> And, the long term plan is to replace all the callers of
>>> timespec_trunc() to use this new api instead for filesystems.
>>> So this will need to be fixed. I will fix this and post an update.
>>
>> I found this on
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/utimes.html
>>
>> ERRORS
>> These functions shall fail if:
>> ...
>> [EINVAL]
>> Either of the times argument structures specified a tv_nsec value that was
>> neither UTIME_NOW nor UTIME_OMIT, and was a value less than zero or
>> greater than or equal to 1000 million.
>
> The thing is, we shouldn't check the standards, we should check what
> we actually _do_.
The issue for tv_sec is that we don't do anything interesting
at the moment, and that's bad.
- The Linux man page says we return -EINVAL for an out-of-range
tv_sec, but the VFS code at the moment ends up running into an
integer overflow, resulting in an arbitrary (i.e. file system specific)
tv_sec when we read back a number that was set out of range.
- POSIX (IEEE Std 1003.1-2008) appears to say we should cap the
tv_sec to the maximum supported value to update the inode /and/
return -EINVAL, which seems better than what we do today, but
would be surprising behavior, as -EINVAL usually indicates that we
did not do anything.
My best guess is that this is not intentional in the spec and should
not be done like that.
- Deepa's patch implements a third option, which is to cap to the
maximum (or minimum for times before the fs specific epoch)
and keep always returning success for utimensat() etc.
This seems the most reasonable behavior.
Between these three, we really need to make a decision.
> And what we actually _do_ is to have a tv_nsec that is of type "long",
> and while we do have a
>
> timespec64_valid(const struct timespec64 *ts)
>
> and fs/utimes.c has a 'nsec_valid()' that apparently the utimes*
> family of system calls all do end up using, I'm more worried about
> something where we don't.
>
> Because I'm almost certain that we've had cases where we just
> normalize things after-the-fact.
>
> This all likely isn't a big deal, but it does end up worrying me if we
> then _depend_ on that granularity thing actually giving the proper
> granularity even for odd inputs. If they can happen.
>
> Maybe we don't care?
This part seems easy, while there are two aspects here, I think they
each have just one answer:
- truncating the nanoseconds in the in-memory inode to the resolution
supported by the file system is currently done by Linux (since 2.6.10).
This behavior matches the Linux and POSIX documentation and
is sensible, so there is no reason to change it. Before 2.6.10, we could
end up with a timestamp moving backwards when an inode got
evicted from the icache and loaded back from disk.
- the range nsec validation is currently correct, I double-checked
the relevant corner cases. We have to be careful when we introduce
the 64-bit time_t based system call to make sure we can deal
with glibc using 32-bit padding in the upper bits. For 64-bit user
space, we must keep returning -EINVAL when those bits are
nonzero, but for 32-bit tasks (compat or native), the current plan
is to ignore the padding and instead take only the lower 32-bit
before performing the range check. Deepa has posted patches
for this in the past, but that's a differnent series.
Arnd
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH V6 0/4]
@ 2011-05-26 19:29 Shirley Ma
0 siblings, 0 replies; 12+ messages in thread
From: Shirley Ma @ 2011-05-26 19:29 UTC (permalink / raw)
To: David Miller, mst, Eric Dumazet, Avi Kivity, Arnd Bergmann
Cc: netdev, kvm, linux-kernel
This patchset add supports for TX zero-copy between guest and host
kernel through vhost. It significantly reduces CPU utilization on the
local host on which the guest is located (It reduced about 50% CPU usage
for single stream test on the host, while 4K message size BW has
increased about 50%). The patchset is based on previous submission and
comments from the community regarding when/how to handle guest kernel
buffers to be released. This is the simplest approach I can think of
after comparing with several other solutions.
This patchset has integrated V3 review comments from community:
1. Add more comments on how to use device ZEROCOPY flag;
2. Change device ZEROCOPY to available bit 31
3. Fix skb header linear allocation when virtio_net GSO is not enabled
It has integrated V4 review comments from MST and Sridhar:
1. In vhost, using socket poll wake up for outstanding DMAs
2. Add detailed comments for vhost_zerocopy_signal_used call
3. Add sleep in vhost shutting down instead of busy-wait for outstanding
DMAs.
4. Copy small packets, don't do zero-copy callback in mavtap, mark it's
DMA done in vhost
5. change zerocopy to bool in macvtap.
It integrates V5 review comments from MST and
Michał Mirosław <mirqus@gmail.com>
1. Prevent userspace apps from holding skb userspace buffers by copying
userspace buffers to kernel in skb_clone, skb_copy, pskb_copy,
pskb_expand_head.
2. It is also used HIGHDMA, SG feature bits to enable ZEROCOPY to remove
the dependency of a new feature bit, we can add it later when new
feature bit is available.
This patchset includes:
1/4: Add a new sock zero-copy flag, SOCK_ZEROCOPY;
2/4: Add a new struct skb_ubuf_info in skb_share_info for userspace
buffers release callback when lower device DMA has done for that skb,
which is the last reference count gone; Or whenever skb_clone, skb_copy,
pskb_copy, pskb_expand_head get call from tcpdump, filtering, these userspace
buffers will be copied into kernel ... we don't want userspace apps to hold
userspace buffers too long.
3/4: Add vhost zero-copy callback in vhost when skb last refcnt is gone;
add vhost_zerocopy_signal_used to notify guest to release TX skb
buffers.
4/4: Add macvtap zero-copy in lower device when sending packet is
greater than 256 bytes.
The patchset is built against most recent net-next linux 2.6.39-rc7. It
has passed netperf/netserver multiple streams stress test, tcpdump
suspended test, dynamically SG change test.
Single TCP_STREAM 120 secs test results over ixgbe 10Gb NIC results:
Message BW(Gb/s)qemu-kvm (NumCPU)vhost-net(NumCPU) PerfTop irq/s
4K 7408.57 92.1% 22.6% 1229
4K(Orig)4913.17 118.1% 84.1% 2086
8K 9129.90 89.3% 23.3% 1141
8K(Orig)7094.55 115.9% 84.7% 2157
16K 9178.81 89.1% 23.3% 1139
16K(Orig)8927.1 118.7% 83.4% 2262
64K 9171.43 88.4% 24.9% 1253
64K(Orig)9085.85 115.9% 82.4% 2229
For message size less or equal than 2K, there is a known KVM guest TX
overrun issue. With this zero-copy patch, the issue becomes more severe,
guest io_exits has tripled than before, so the performance is not good.
Once the TX overrun problem has been addressed, I will retest the small
message size performance.
drivers/net/macvtap.c | 132 ++++++++++++++++++++++++++++++++++++---
drivers/vhost/net.c | 44 +++++++++++++-
drivers/vhost/vhost.c | 49 +++++++++++++++
drivers/vhost/vhost.h | 13 ++++
include/linux/netdevice.h | 10 +++
include/linux/skbuff.h | 26 ++++++++
include/net/sock.h | 1 +
net/core/skbuff.c | 81 ++++++++++++++++++++++++-
8 files changed, 345 insertions(+), 17 deletions(-)
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2018-01-30 16:52 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-22 2:04 [PATCH v6 0/4] Deepa Dinamani
2018-01-22 2:04 ` [PATCH v6 1/4] vfs: Add file timestamp range support Deepa Dinamani
2018-01-22 2:04 ` [PATCH v6 2/4] ext4: Initialize timestamps limits Deepa Dinamani
2018-01-22 2:04 ` [PATCH v6 3/4] vfs: Add timestamp_truncate() api Deepa Dinamani
2018-01-22 20:25 ` Linus Torvalds
2018-01-23 16:25 ` Deepa Dinamani
2018-01-24 11:56 ` Arnd Bergmann
2018-01-24 17:47 ` Deepa Dinamani
2018-01-24 18:00 ` Linus Torvalds
2018-01-30 16:52 ` Arnd Bergmann
2018-01-22 2:04 ` [PATCH v6 4/4] utimes: Clamp the timestamps before update Deepa Dinamani
-- strict thread matches above, loose matches on Subject: below --
2011-05-26 19:29 [PATCH V6 0/4] Shirley Ma
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).