On Mar 15, 2018, at 11:51 AM, Andiry Xu <jix024@eng.ucsd.edu> wrote:
> 
> On Thu, Mar 15, 2018 at 2:05 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Thu, Mar 15, 2018 at 7:11 AM, Andiry Xu <jix024@eng.ucsd.edu> wrote:
>>> On Wed, Mar 14, 2018 at 9:54 PM, Darrick J. Wong
>>> <darrick.wong@oracle.com> wrote:
>>>> On Sat, Mar 10, 2018 at 10:17:44AM -0800, Andiry Xu wrote:
>> 
>>>>> +     /* s_mtime and s_wtime should be together and their order should not be
>>>>> +      * changed. we use an 8 byte write to update both of them atomically
>>>>> +      */
>>>>> +     __le32          s_mtime;                /* mount time */
>>>>> +     __le32          s_wtime;                /* write time */
>>>> 
>>>> Hmmm, 32-bit timestamps?  2038 isn't that far away...
>>>> 
>>> 
>>> I will try fixing this in the next version.
>> 
>> I would also recommend adding nanosecond-resolution timestamps.
>> In theory, a signed 64-bit nanosecond field is sufficient for each timestamp
>> (it's good for several hundred years), but the more common format uses
>> 64-bit seconds and 32-bit nanoseconds in other file systems.
>> 
>> Unfortunately, it looks like you will have to come up with a more
>> sophisticated update method than the one above. Even if you leave out the
>> nanoseconds, you can't easily rely on a 16-byte atomic update across
>> architectures to deal with the two 64-bit timestamps. For the superblock
>> fields, you might be able to get away with using second resolution, and
>> then encoding the timestamps as a signed 64-bit 'mkfs time' along with two
>> unsigned 32-bit times added on top, which gives you a range of 136 years
>> to mount a file system after its creation.
>> 
> 
> I will take a look at other file systems.
> 
> Superblock mtime is not a big problem as it is updated rarely. 64-bit
> seconds and 32-bit nanoseconds make the inode and log entry bigger,
> and updating file->atime cannot be done with a single 64-bit update.
> That may be annoying, and it would need journaling.

If the 64-bit atomicity were really a performance issue, you could do
something like:

	/* seconds is the signed 64-bit count; nanoseconds < 10^9 fits in 32 bits */
	__u32	time_high = (__u64)seconds >> 32;
	__u64	time_low = ((__u64)seconds << 32) | nanoseconds;

and then you only need to update time_high with a journal operation if it
has changed from the current time_high value (about once every 136 years),
and time_low can be set atomically with a single 64-bit write.  The check
costs a few extra cycles each time (the branch can be hidden with an
unlikely()) compared with just setting both, but that is a win if it avoids
other CPU or IO overhead.
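
To spell out the update path described above, here is a rough userspace
sketch. It is not taken from any NOVA code: struct packed_time and the
packed_time_*() helpers are made-up names, and the plain stores only stand
in for the journaled update of time_high and the atomic 64-bit write of
time_low that a real file system would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout; the field names follow the snippet above. */
struct packed_time {
	uint32_t time_high;	/* bits 32..63 of the seconds count */
	uint64_t time_low;	/* (low 32 bits of seconds << 32) | nanoseconds */
};

/*
 * Store a timestamp.  Returns true when time_high had to change, i.e.
 * when the caller would need the slow (journaled) path.  In the common
 * case only the single 64-bit time_low word is written.
 */
static bool packed_time_set(struct packed_time *t, int64_t seconds,
			    uint32_t nanoseconds)
{
	uint32_t high = (uint32_t)((uint64_t)seconds >> 32);
	uint64_t low = ((uint64_t)seconds << 32) | nanoseconds;
	bool high_changed = (high != t->time_high);

	assert(nanoseconds < 1000000000u);

	if (high_changed)		/* rare: roughly once per 2^32 seconds */
		t->time_high = high;	/* stand-in for the journaled update */
	t->time_low = low;		/* stand-in for the atomic 64-bit write */
	return high_changed;
}

/* Reassemble the full 64-bit seconds count from both fields. */
static int64_t packed_time_seconds(const struct packed_time *t)
{
	return (int64_t)(((uint64_t)t->time_high << 32) | (t->time_low >> 32));
}

/* The nanoseconds live in the low 32 bits of time_low. */
static uint32_t packed_time_nsec(const struct packed_time *t)
{
	return (uint32_t)(t->time_low & 0xffffffffu);
}
```

In kernel code the high_changed test is where the unlikely() would go; it
is left out here so the sketch builds with a plain C compiler.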

Cheers, Andreas