From: "Li, Hao" <lihao2018.fnst@cn.fujitsu.com>
To: Ira Weiny <ira.weiny@intel.com>, Yasunori Goto <y-goto@fujitsu.com>
Cc: Dave Chinner <david@fromorbit.com>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>
Subject: Re: Can we change the S_DAX flag immediately on XFS without dropping caches?
Date: Tue, 18 Aug 2020 17:16:25 +0800
Message-ID: <ba98b77e-a806-048a-a0dc-ca585677daf3@cn.fujitsu.com>
In-Reply-To: <20200807170858.GU1573827@iweiny-DESK2.sc.intel.com>


On 2020/8/8 1:09, Ira Weiny wrote:
> On Fri, Jul 31, 2020 at 06:59:32PM +0900, Yasunori Goto wrote:
>> On 2020/07/30 8:21, Dave Chinner wrote:
>>> On Wed, Jul 29, 2020 at 11:23:21AM +0900, Yasunori Goto wrote:
>>>> Hi,
>>>>
>>>> On 2020/07/28 11:20, Dave Chinner wrote:
>>>>> On Tue, Jul 28, 2020 at 02:00:08AM +0000, Li, Hao wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have noticed that we have to drop caches to make a change to the S_DAX
>>>>>> flag take effect after using chattr +x to turn on DAX for an existing
>>>>>> regular file. The related function is xfs_diflags_to_iflags, whose
>>>>>> second parameter determines whether we should set S_DAX immediately.
>>>>> Yup, as documented in Documentation/filesystems/dax.txt. Specifically:
>>>>>
>>>>>    6. When changing the S_DAX policy via toggling the persistent FS_XFLAG_DAX flag,
>>>>>       the change in behaviour for existing regular files may not occur
>>>>>       immediately.  If the change must take effect immediately, the administrator
>>>>>       needs to:
>>>>>
>>>>>       a) stop the application so there are no active references to the data set
>>>>>          the policy change will affect
>>>>>
>>>>>       b) evict the data set from kernel caches so it will be re-instantiated when
>>>>>          the application is restarted. This can be achieved by:
>>>>>
>>>>>          i. drop-caches
>>>>>          ii. a filesystem unmount and mount cycle
>>>>>          iii. a system reboot
>>>>>
>>>>>> I can't figure out why we do this. Is this because the page caches in
>>>>>> address_space->i_pages are hard to deal with?
>>>>> Because of unfixable races in the page fault path that prevent
>>>>> changing the caching behaviour of the inode while concurrent access
>>>>> is possible. The only way to guarantee races can't happen is to
>>>>> cycle the inode out of cache.
>>>> I understand why the drop_cache operation is necessary. Thanks.
>>>>
>>>> BTW, even if a normal user becomes able to change the DAX flag for an inode,
>>>> the drop_cache operation still requires root permission, right?
>>> Step back for a minute and explain why you want to be able to change
>>> the DAX mode of a file -as a user-.
>>
>> For example, there are 2 containers running on a system, named
>> container A and container B, and the host gives FS-DAX files to each
>> container.
>> If the user of container A would like to turn DAX off for tuning, then he
>> will stop his application
>> and change the DAX flag, but the change may not take effect.
>>
>> Then he will "need" to ask the host operator to execute drop_cache, and the
>> operator does it.
>> As a result, not only container A but also container B is affected by
>> drop_cache.
>>
>> Especially if this is a multi-tenant container system, I think this is
>> not acceptable.
>>
>> I think there are probably 2 problems:
>> 1) drop_cache requires root permission.
>> 2) drop_cache has too wide an effect.
>>
>>>> So, if the kernel had a feature that lets a normal user drop the cache for
>>>> "an inode" with
>>>> his own permissions, I think it would ease the above limitation, and
>>>> we would like to try to implement it soon.
>>> No, drop_caches is not going to be made available to users. That
>>> makes it a trivial system-wide DoS vector.
>> The current drop_cache feature tries to drop ALL of the caches (page cache
>> and/or slab cache).
>> So I agree that a normal user should not be able to drop all of them.
>>
>> But my intention was to drop the cache of only the ONE file whose DAX flag
>> was changed (and, if possible, to drop only the inode cache).
>> Do you mean that would still be a weakness against DoS attacks?
>> If so, I should give up on solving problem 1) at least.
> FWIW changing the on-disk flag automatically flags the inode to be dropped as
> soon as all references are gone.
>
> See:
>
> 2c567af418e3 fs: Introduce DCACHE_DONTCACHE
> dae2f8ed7992 fs: Lift XFS_IDONTCACHE to the VFS layer
Hi,

I find that DCACHE_DONTCACHE doesn't work as expected in all cases.
If DCACHE_REFERENCED is not set, dput() can drop the inode successfully as
soon as all references are gone. By contrast, if DCACHE_REFERENCED is set,
dput() only decrements the dentry's reference count and doesn't evict the
inode.

Example 1:

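echo abcdefg > test.txt              # create the file; dentry and inode get cached
echo 3 > /proc/sys/vm/drop_caches    # as root: drop page cache, dentries and inodes
xfs_io -c 'chattr +x' test.txt       # set the persistent DAX flag on the file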

In this example, the DAX policy takes effect immediately: we don't need
to drop caches after chattr.
In this case, DCACHE_REFERENCED is not set, and DCACHE_DONTCACHE
causes the inode to be dropped as expected.

Example 2:

echo abcdefg > test.txt              # create the file; dentry and inode get cached
xfs_io -c 'chattr +x' test.txt       # set the persistent DAX flag, no drop_caches first

In this example, we must drop caches after chattr to make the DAX policy
take effect. This is because DCACHE_REFERENCED is set, so fast_dput()
returns true and retain_dentry() never gets a chance to check
DCACHE_DONTCACHE.
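
To make the difference between the two examples concrete, here is a
simplified userspace model of the decision described above. It is only a
sketch, not the kernel code: the SKETCH_* flag values and the sketch_*
function names are made up for illustration, and the real fast_dput() looks
at more dentry state than the single flag modelled here.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; the kernel's real values differ. */
#define SKETCH_REFERENCED 0x1u	/* stands in for DCACHE_REFERENCED */
#define SKETCH_DONTCACHE  0x2u	/* stands in for DCACHE_DONTCACHE */

/*
 * Fast path as described in this message: after the last reference is
 * dropped, a dentry marked REFERENCED is simply left in the cache.
 */
static inline bool sketch_fast_dput(unsigned int flags)
{
	return (flags & SKETCH_REFERENCED) != 0;
}

/* Slow-path check: a dentry marked DONTCACHE must not be retained. */
static inline bool sketch_retain_dentry(unsigned int flags)
{
	return (flags & SKETCH_DONTCACHE) == 0;
}

/* Returns true if the dentry (and so its inode) stays cached after dput(). */
static inline bool sketch_stays_cached(unsigned int flags)
{
	if (sketch_fast_dput(flags))
		return true;	/* DONTCACHE is never examined on this path */
	return sketch_retain_dentry(flags);
}
```

With only SKETCH_DONTCACHE set (Example 1), sketch_stays_cached() returns
false and the inode can be evicted. With both flags set (Example 2), it
returns true, so the new DAX policy is not picked up until caches are dropped.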

If this is the intended behavior, I don't understand what
DCACHE_DONTCACHE is needed for.

Regards,
Hao Li
>
> But from a user's perspective you just don't know when that will happen.  The
> system just can't guarantee it.  The best the user can do is stop taking
> references to the file and close all references, and periodically check the
> state.  But this will take a reference so...  Kind of a catch-22 here...  :-(
>
> Ira
>
>>
>> Thanks,
>>
>>> Cheers,
>>>
>>> Dave.
>> -- 
>> Yasunori Goto
>>
>
