linux-kernel.vger.kernel.org archive mirror
From: "Yan, Zheng" <ukernel@gmail.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>,
	John Stultz <john.stultz@linaro.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Al Viro <viro@zeniv.linux.org.uk>,
	gregkh <gregkh@linuxfoundation.org>,
	"Dilger, Andreas" <andreas.dilger@intel.com>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Chris Mason <clm@fb.com>, David Miller <davem@davemloft.net>,
	David Sterba <dsterba@suse.com>,
	Evgeniy Dushistov <dushistov@mail.ru>,
	Eric Paris <eparis@redhat.com>, Jaegeuk Kim <jaegeuk@kernel.org>,
	Josef Bacik <jbacik@fb.com>,
	Jeff Layton <jlayton@poochiereds.net>,
	James Simmons <jsimmons@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	"Drokin, Oleg" <oleg.drokin@intel.com>,
	Paul Moore <paul@paul-moore.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	yuchao0@huawei.com, ceph-devel <ceph-devel@vger.kernel.org>,
	devel@driverdev.osuosl.org, linux-audit@redhat.com,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	linux-cifs@vger.kernel.org, "Linux F2FS DEV,
	Mailing List"  <linux-f2fs-devel@lists.sourceforge.net>,
	Linux FS-devel Mailing List <linux-fsdevel@vger.kernel.org>,
	linux-mtd <linux-mtd@lists.infradead.org>,
	LSM List <linux-security-module@vger.kernel.org>,
	Lustre Development List <lustre-devel@lists.lustre.org>,
	Networking <netdev@vger.kernel.org>,
	samba-technical@lists.samba.org,
	y2038 Mailman List <y2038@lists.linaro.org>
Subject: Re: [PATCH 04/12] fs: ceph: CURRENT_TIME with ktime_get_real_ts()
Date: Fri, 2 Jun 2017 18:10:49 +0800
Message-ID: <CAAM7YA=BJHFLz_Ud==7L=yCzxj+1sjctc68tvpNogPhiSxxELQ@mail.gmail.com>
In-Reply-To: <CAK8P3a1TwkVRCwDWqgYE8Z3wZT2+v6xayq=dVeW8KaHmO0W4cg@mail.gmail.com>

On Fri, Jun 2, 2017 at 5:45 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Fri, Jun 2, 2017 at 4:09 AM, Yan, Zheng <ukernel@gmail.com> wrote:
>> On Fri, Jun 2, 2017 at 8:57 AM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>>> On Thu, Jun 1, 2017 at 5:36 PM, John Stultz <john.stultz@linaro.org> wrote:
>>>> On Thu, Jun 1, 2017 at 5:26 PM, Yan, Zheng <ukernel@gmail.com> wrote:
>>>>> On Thu, Jun 1, 2017 at 6:22 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>>>> On Thu, Jun 1, 2017 at 11:56 AM, Yan, Zheng <ukernel@gmail.com> wrote:
>>>>>>> On Sat, Apr 8, 2017 at 8:57 AM, Deepa Dinamani <deepa.kernel@gmail.com> wrote:
>>>>>>
>>>>>>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>>>>>>>> index 517838b..77204da 100644
>>>>>>>> --- a/drivers/block/rbd.c
>>>>>>>> +++ b/drivers/block/rbd.c
>>>>>>>> @@ -1922,7 +1922,7 @@ static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
>>>>>>>>  {
>>>>>>>>         struct ceph_osd_request *osd_req = obj_request->osd_req;
>>>>>>>>
>>>>>>>> -       osd_req->r_mtime = CURRENT_TIME;
>>>>>>>> +       ktime_get_real_ts(&osd_req->r_mtime);
>>>>>>>>         osd_req->r_data_offset = obj_request->offset;
>>>>>>>>  }
>>>>>>>>
>>>>>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>>>>>> index c681762..1d3fa90 100644
>>>>>>>> --- a/fs/ceph/mds_client.c
>>>>>>>> +++ b/fs/ceph/mds_client.c
>>>>>>>> @@ -1666,6 +1666,7 @@ struct ceph_mds_request *
>>>>>>>>  ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode)
>>>>>>>>  {
>>>>>>>>         struct ceph_mds_request *req = kzalloc(sizeof(*req), GFP_NOFS);
>>>>>>>> +       struct timespec ts;
>>>>>>>>
>>>>>>>>         if (!req)
>>>>>>>>                 return ERR_PTR(-ENOMEM);
>>>>>>>> @@ -1684,7 +1685,8 @@ ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode)
>>>>>>>>         init_completion(&req->r_safe_completion);
>>>>>>>>         INIT_LIST_HEAD(&req->r_unsafe_item);
>>>>>>>>
>>>>>>>> -       req->r_stamp = current_fs_time(mdsc->fsc->sb);
>>>>>>>> +       ktime_get_real_ts(&ts);
>>>>>>>> +       req->r_stamp = timespec_trunc(ts, mdsc->fsc->sb->s_time_gran);
>>>>>>>
>>>>>>> This change causes our kernel_untar_tar test case to fail (inode's
>>>>>>> ctime goes back). The reason is that there is time drift between the
>>>>>>> time stamps obtained from ktime_get_real_ts() and current_time(). We need to
>>>>>>> revert this change until current_time() uses ktime_get_real_ts()
>>>>>>> internally.
>>>>>>
>>>>>> Hmm, the change was not supposed to have a user-visible effect, so
>>>>>> something has gone wrong, but I don't immediately see how it
>>>>>> relates to what you observe.
>>>>>>
>>>>>> ktime_get_real_ts() and current_time() use the same time base; there
>>>>>> is no drift, but there is a difference in resolution, as the latter uses
>>>>>> the time stamp of the last jiffies update, which may be up to one jiffy
>>>>>> (10ms) behind the exact time we put in the request stamps here.
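To make the resolution point concrete, here is a minimal userspace sketch (not kernel code) that models a fine-grained read, analogous to ktime_get_real_ts() reading the clocksource, and a coarse read, analogous to current_kernel_time() returning the time cached at the last tick. TICK_NS assumes HZ=100, i.e. the 10 ms jiffy mentioned above; the function names and the nanosecond counter are hypothetical. It shows how a coarse stamp taken after a fine one can still sort earlier, which matches the "ctime goes back" symptom.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: HZ=100, so one jiffy (tick) is 10 ms. */
#define TICK_NS 10000000ULL

/* Hypothetical fine-grained read: returns the exact current time,
 * analogous to ktime_get_real_ts() reading the clocksource. */
static uint64_t read_fine(uint64_t now_ns)
{
	return now_ns;
}

/* Hypothetical coarse read: returns the time cached at the last tick,
 * analogous to current_kernel_time() or CLOCK_REALTIME_COARSE. */
static uint64_t read_coarse(uint64_t now_ns)
{
	return now_ns - (now_ns % TICK_NS);
}

/* A fine stamp at t = 1.004 s, then a coarse stamp at t = 1.006 s:
 * the later coarse stamp reports the 1.000 s tick and sorts earlier. */
static void demo_backwards_ctime(void)
{
	uint64_t r_stamp = read_fine(1004000000ULL);
	uint64_t ctime = read_coarse(1006000000ULL);

	assert(ctime == 1000000000ULL);
	assert(ctime < r_stamp);
}
```

If both stamps come from the coarse clock, the later one can never sort earlier, but the two still differ whenever a tick falls between the reads.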
>>>>>>
>>>>>> Do you still see problems if you use current_kernel_time() instead of
>>>>>> ktime_get_real_ts()?
>>>>>
>>>>> The problem disappears after using current_kernel_time().
>>>>>
>>>>> https://github.com/ceph/ceph-client/commit/2e0f648da23167034a3cf1500bc90ec60aef2417
>>>>
>>>> From the commit above:
>>>> "It seems there is time drift between ktime_get_real_ts() and
>>>> current_kernel_time()"
>>>>
>>>> It's more of a granularity difference. current_kernel_time() returns
>>>> the cached time at the last tick, whereas ktime_get_real_ts() reads
>>>> the clocksource hardware and returns the immediate time.
>>>>
>>>> Filesystems usually use the cached time (similar to
>>>> CLOCK_REALTIME_COARSE), for performance reasons, as touching the
>>>> clocksource takes time.
>>>
>>> Alternatively, it would be best for this code also to use current_time().
>>> I had suggested this in one of the previous versions of the patch.
>>> The implementation of current_time() will change when we switch vfs to
>>> use 64 bit time. This will prevent such errors from happening again.
>>> But this also means more code reordering in these modules
>>> to get a reference to the inode.
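For reference, current_time() at this point took the tick-cached time and rounded it down to the superblock's s_time_gran with timespec_trunc(), the same call the patch above uses. A userspace sketch of that truncation step follows; the struct and function names are hypothetical, and where the kernel warns on an invalid granularity this sketch simply asserts.

```c
#include <assert.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for struct timespec. */
struct sketch_timespec {
	long long tv_sec;
	long tv_nsec;
};

/* Round tv_nsec down to a multiple of the filesystem granularity,
 * following the logic of the kernel's timespec_trunc() of this era. */
static struct sketch_timespec trunc_to_gran(struct sketch_timespec t,
					    unsigned int gran)
{
	/* The kernel warns on an illegal granularity; the sketch asserts. */
	assert(gran >= 1 && (long)gran <= SKETCH_NSEC_PER_SEC);

	if ((long)gran == SKETCH_NSEC_PER_SEC)
		t.tv_nsec = 0;                        /* whole seconds only */
	else if (gran > 1)
		t.tv_nsec -= t.tv_nsec % (long)gran;  /* drop sub-gran digits */
	/* gran == 1: nanosecond resolution, nothing to do */
	return t;
}
```

With gran = 1000 (the 1000 ns granularity noted later in the thread for ceph_timespec), 123456789 ns becomes 123456000 ns. Truncation only rounds down, so it never reorders two stamps from the same clock; it does nothing about a fine-versus-coarse mismatch.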
>>>
>>
>> I took a look. It's quite inconvenient to use current_time(). I
>> prefer to temporarily use current_kernel_time().
>
> I've looked at the code some more and I think there is another angle to it:
> In your test case, 'tar' calls into the utimes syscall (or a member of its
> family), which sets the i_ctime field in the inode to the current time
> (using current_time()), then calls __ceph_setattr(), which creates
> a mds client request, and ceph_mdsc_create_request() takes another
> time stamp and stores it in r_stamp.
>
> We then store the first timestamp (only) in the in-memory inode, and the
> second time stamp in the request. Depending on the state of the inode,
> we may also set the ctime to a third timestamp we again take using
> current_time().
>
> The mtime and atime from user space get passed correctly through
> union ceph_mds_request_args->setattr and are kept in sync between
> the in-memory inode and the persistent inode data, but the ctime
> in the inode never makes it to the lower protocol levels and instead
> we use the r_stamp field that got set a little earlier or a little later.
>
> I believe the bug you see is the result of the two timestamps
> currently being almost guaranteed to be different in the latest
> kernels.
> Changing r_stamp to use current_kernel_time() will make it the
> same value most of the time (as it was before Deepa's patch),
> but when the timer interrupt happens between the timestamps,
> the two are still different, it's just much harder to hit.
>
> I think the proper solution should be to change __ceph_setattr()
> in a way that has req->r_stamp always synchronized with i_ctime.
> If we copy i_ctime to r_stamp, that will also take care of the
> future issues with the planned changes to current_time().
>
I already have a patch
https://github.com/ceph/ceph-client/commit/24f54cd18e195a002ee3d2ab50dbc952fd9f82af
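Arnd's suggestion to copy i_ctime into r_stamp can be illustrated with a minimal sketch. The structs, the fake clock and both functions are hypothetical stand-ins, not the real struct inode, struct ceph_mds_request or __ceph_setattr(), and nothing here claims to reproduce the linked commit. The point is only the difference between reading the clock twice and reading it once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel's struct inode / ceph_mds_request. */
struct sketch_inode {
	uint64_t i_ctime_ns;
};

struct sketch_request {
	uint64_t r_stamp_ns;
};

/* Hypothetical clock: every read returns a time 1.5 us later than the last. */
static uint64_t fake_clock_ns = 1000000000ULL;

static uint64_t sketch_now(void)
{
	fake_clock_ns += 1500;
	return fake_clock_ns;
}

/* Pattern described in the thread: inode ctime and request stamp come
 * from two separate clock reads, so they generally differ. */
static void setattr_two_reads(struct sketch_inode *inode,
			      struct sketch_request *req)
{
	inode->i_ctime_ns = sketch_now();
	req->r_stamp_ns = sketch_now();
}

/* Suggested pattern: read the clock once for the inode and copy that
 * value into the request, so client and MDS agree on ctime. */
static void setattr_one_read(struct sketch_inode *inode,
			     struct sketch_request *req)
{
	inode->i_ctime_ns = sketch_now();
	req->r_stamp_ns = inode->i_ctime_ns;
}
```

Copying the value, rather than switching to a different clock, keeps the two fields in step even if the implementation of current_time() changes later.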


> The part I don't understand is what else r_stamp (i.e. the time
> stamp in ceph_msg_data with type==
> CEPH_MSG_CLIENT_REQUEST) is used for, other than setting
> ctime in CEPH_MDS_OP_SETATTR.
>
> Will this be used to update the stored i_ctime for other operations
> too? If so, we would need to synchronize it with the in-memory
> i_ctime for all operations that do this.
>

Yes, the MDS uses it to update the ctime of modified inodes. For example,
when handling mkdir, the MDS sets the ctime of both the parent inode and
the new inode to r_stamp.
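A hypothetical sketch of what that means for mkdir (illustrative names only, not the real ceph-mds code): one client-supplied r_stamp becomes the ctime of every inode the operation touches, so a client that had set the parent's in-memory ctime from a separate, later clock read would see that ctime move backwards once the MDS value arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for inode metadata and the client request. */
struct sketch_inode {
	uint64_t ctime_ns;
};

struct sketch_req {
	uint64_t r_stamp_ns;
};

/* Hypothetical MDS-side mkdir: one client-supplied stamp is used as the
 * ctime of every inode the operation modifies. */
static void mds_mkdir_sketch(const struct sketch_req *req,
			     struct sketch_inode *parent,
			     struct sketch_inode *child)
{
	parent->ctime_ns = req->r_stamp_ns;
	child->ctime_ns = req->r_stamp_ns;
}

/* True if the authoritative value is older than what the client cached. */
static bool ctime_went_back(uint64_t cached_ns, uint64_t authoritative_ns)
{
	return authoritative_ns < cached_ns;
}
```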

Regards
Yan, Zheng

> Semi-related side note: I see that the granularity for ceph_timespec
> is 1000 nanoseconds, so the value is always a multiple of 1000.
> If the full 32-bit data gets stored, we could use it to store the
> epoch number in the future:
>
> static inline void ceph_decode_timespec(struct timespec64 *ts,
>                                         const struct ceph_timespec *tv)
> {
>         u32 ns_epoch = le32_to_cpu(tv->tv_nsec);
>         u32 epoch = ns_epoch % 1000;
>
>         /* tv_sec is traditionally interpreted as unsigned
>          * with time ranges 1970-2106, we extend
>          * it to 1970-138069 */
>         ts->tv_sec = (u64)le32_to_cpu(tv->tv_sec) +
>                             (u64)epoch << 32;
>         ts->tv_nsec = ns_epoch - epoch;
> }
>
>       Arnd
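The decode proposal above implies a matching encode step. Below is a userspace sketch of both directions, assuming host byte order (the cpu_to_le32()/le32_to_cpu() conversions are omitted) and using hypothetical struct and function names; it is not an existing Ceph helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wire format, in host byte order: real code would
 * apply cpu_to_le32()/le32_to_cpu() to both fields. */
struct sketch_ceph_timespec {
	uint32_t tv_sec;
	uint32_t tv_nsec;
};

/* Hypothetical stand-in for struct timespec64. */
struct sketch_ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Encode: low 32 bits of seconds go in tv_sec; the upper bits (the
 * "epoch", 0..999) go in the low decimal digits of tv_nsec, which are
 * always zero at 1000 ns granularity. */
static void sketch_encode_timespec(struct sketch_ceph_timespec *tv,
				   const struct sketch_ts64 *ts)
{
	uint64_t sec = (uint64_t)ts->tv_sec;
	uint32_t epoch = (uint32_t)(sec >> 32);

	assert(ts->tv_sec >= 0 && epoch < 1000);          /* representable range */
	assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);
	assert(ts->tv_nsec % 1000 == 0);                  /* 1000 ns granularity */

	tv->tv_sec = (uint32_t)sec;
	tv->tv_nsec = (uint32_t)ts->tv_nsec + epoch;
}

/* Decode: same arithmetic as the ceph_decode_timespec() proposal above. */
static void sketch_decode_timespec(struct sketch_ts64 *ts,
				   const struct sketch_ceph_timespec *tv)
{
	uint32_t ns_epoch = tv->tv_nsec;
	uint32_t epoch = ns_epoch % 1000;

	ts->tv_sec = (int64_t)((uint64_t)tv->tv_sec + ((uint64_t)epoch << 32));
	ts->tv_nsec = (long)(ns_epoch - epoch);
}
```

A stamp before 2106 encodes with epoch 0, so its bytes match the current format; that only helps if, as noted above, the full 32-bit tv_nsec field is actually stored.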

Thread overview: 36+ messages
2017-04-08  0:56 [PATCH 00/12] Delete CURRENT_TIME, CURRENT_TIME_SEC and current_fs_time Deepa Dinamani
2017-04-08  0:56 ` [PATCH 01/12] fs: f2fs: Use ktime_get_real_seconds for sit_info times Deepa Dinamani
2017-04-08  0:57 ` [PATCH 02/12] trace: Make trace_hwlat timestamp y2038 safe Deepa Dinamani
2017-04-08  1:30   ` Steven Rostedt
2017-04-08  2:16     ` Deepa Dinamani
2017-04-08  0:57 ` [PATCH 03/12] fs: cifs: Replace CURRENT_TIME by other appropriate apis Deepa Dinamani
2017-04-08  0:57 ` [PATCH 04/12] fs: ceph: CURRENT_TIME with ktime_get_real_ts() Deepa Dinamani
2017-06-01  9:56   ` Yan, Zheng
2017-06-01 10:22     ` Arnd Bergmann
2017-06-02  0:26       ` Yan, Zheng
2017-06-02  0:36         ` John Stultz
2017-06-02  0:57           ` Deepa Dinamani
2017-06-02  2:09             ` Yan, Zheng
2017-06-02  9:45               ` Arnd Bergmann
2017-06-02 10:10                 ` Yan, Zheng [this message]
2017-06-02 10:51                   ` Arnd Bergmann
2017-06-02 11:18                     ` Yan, Zheng
2017-06-02 11:33                       ` Arnd Bergmann
2017-06-02 12:18                         ` Yan, Zheng
2017-06-02 14:18                           ` Arnd Bergmann
2017-06-04  7:54                             ` Yan, Zheng
2017-06-02  0:35       ` Yan, Zheng
2017-04-08  0:57 ` [PATCH 05/12] fs: ufs: Use ktime_get_real_ts64() for birthtime Deepa Dinamani
2017-04-08  0:57 ` [PATCH 06/12] audit: Use timespec64 to represent audit timestamps Deepa Dinamani
2017-04-08 12:40   ` Paul Moore
2017-04-08 17:58     ` Deepa Dinamani
2017-04-11 20:12       ` Paul Moore
2017-04-15 10:29       ` Arnd Bergmann
2017-04-08  0:57 ` [PATCH 07/12] fs: btrfs: Use ktime_get_real_ts for root ctime Deepa Dinamani
2017-04-12 13:42   ` David Sterba
2017-04-08  0:57 ` [PATCH 08/12] fs: ubifs: Replace CURRENT_TIME_SEC with current_time Deepa Dinamani
2017-04-08  0:57 ` [PATCH 09/12] lustre: Replace CURRENT_TIME macro Deepa Dinamani
2017-04-08  0:57 ` [PATCH 10/12] apparmorfs: Replace CURRENT_TIME with current_time() Deepa Dinamani
2017-06-02  7:29   ` John Johansen
2017-04-08  0:57 ` [PATCH 11/12] time: Delete CURRENT_TIME_SEC and CURRENT_TIME Deepa Dinamani
2017-04-08  0:57 ` [PATCH 12/12] time: Delete current_fs_time() function Deepa Dinamani
