All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
@ 2022-08-26 13:03 Shiyang Ruan
  2022-08-26 21:35 ` Dan Williams
  2022-09-02 10:35 ` [PATCH v8 0/3] " Shiyang Ruan
  0 siblings, 2 replies; 19+ messages in thread
From: Shiyang Ruan @ 2022-08-26 13:03 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, nvdimm, linux-mm, linux-fsdevel
  Cc: djwong, dan.j.williams, david, hch, jane.chu

This patch is inspired by Dan's "mm, dax, pmem: Introduce
dev_pagemap_failure()"[1].  With the help of dax_holder and
->notify_failure() mechanism, the pmem driver is able to ask filesystem
(or mapped device) on it to unmap all files in use and notify processes
who are using those files.

Call trace:
trigger unbind
  -> unbind_store()
   -> ... (skip)
    -> devres_release_all()
     -> kill_dax()
      -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
       -> xfs_dax_notify_failure()

Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
event.  So do not shutdown filesystem directly if something not
supported, or if failure range includes metadata area.  Make sure all
files and processes are handled correctly.

==
Changes since v6:
   1. Rebase on 6.0-rc2 and Darrick's patch[2].

Changes since v5:
   1. Renamed MF_MEM_REMOVE to MF_MEM_PRE_REMOVE
   2. hold s_umount before sync_filesystem()
   3. do sync_filesystem() after SB_BORN check
   4. Rebased on next-20220714

[1]: 
https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
[2]: https://lore.kernel.org/linux-xfs/Yv5wIa2crHioYeRr@magnolia/

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
  drivers/dax/super.c         |  3 ++-
  fs/xfs/xfs_notify_failure.c | 15 +++++++++++++++
  include/linux/mm.h          |  1 +
  3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 9b5e2a5eb0ae..cf9a64563fbe 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
  		return;
   	if (dax_dev->holder_data != NULL)
-		dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
+		dax_holder_notify_failure(dax_dev, 0, U64_MAX,
+				MF_MEM_PRE_REMOVE);
   	clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
  	synchronize_srcu(&dax_srcu);
diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
index 65d5eb20878e..a9769f17e998 100644
--- a/fs/xfs/xfs_notify_failure.c
+++ b/fs/xfs/xfs_notify_failure.c
@@ -77,6 +77,9 @@ xfs_dax_failure_fn(
   	if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
  	    (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
+		/* Do not shutdown so early when device is to be removed */
+		if (notify->mf_flags & MF_MEM_PRE_REMOVE)
+			return 0;
  		notify->want_shutdown = true;
  		return 0;
  	}
@@ -182,12 +185,22 @@ xfs_dax_notify_failure(
  	struct xfs_mount	*mp = dax_holder(dax_dev);
  	u64			ddev_start;
  	u64			ddev_end;
+	int			error;
   	if (!(mp->m_sb.sb_flags & SB_BORN)) {
  		xfs_warn(mp, "filesystem is not ready for notify_failure()!");
  		return -EIO;
  	}
  +	if (mf_flags & MF_MEM_PRE_REMOVE) {
+		xfs_info(mp, "device is about to be removed!");
+		down_write(&mp->m_super->s_umount);
+		error = sync_filesystem(mp->m_super);
+		up_write(&mp->m_super->s_umount);
+		if (error)
+			return error;
+	}
+
  	if (mp->m_rtdev_targp && mp->m_rtdev_targp->bt_daxdev == dax_dev) {
  		xfs_warn(mp,
  			 "notify_failure() not supported on realtime device!");
@@ -196,6 +209,8 @@ xfs_dax_notify_failure(
   	if (mp->m_logdev_targp && mp->m_logdev_targp->bt_daxdev == dax_dev &&
  	    mp->m_logdev_targp != mp->m_ddev_targp) {
+		if (mf_flags & MF_MEM_PRE_REMOVE)
+			return 0;
  		xfs_err(mp, "ondisk log corrupt, shutting down fs!");
  		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK);
  		return -EFSCORRUPTED;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 982f2607180b..2c7c132e6512 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3176,6 +3176,7 @@ enum mf_flags {
  	MF_UNPOISON = 1 << 4,
  	MF_SW_SIMULATED = 1 << 5,
  	MF_NO_RETRY = 1 << 6,
+	MF_MEM_PRE_REMOVE = 1 << 7,
  };
  int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
  		      unsigned long count, int mf_flags);
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* RE: [PATCH v7] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
  2022-08-26 13:03 [PATCH v7] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Shiyang Ruan
@ 2022-08-26 21:35 ` Dan Williams
  2022-08-29 10:02   ` Shiyang Ruan
  2022-09-02 10:35 ` [PATCH v8 0/3] " Shiyang Ruan
  1 sibling, 1 reply; 19+ messages in thread
From: Dan Williams @ 2022-08-26 21:35 UTC (permalink / raw)
  To: Shiyang Ruan, linux-kernel, linux-xfs, nvdimm, linux-mm, linux-fsdevel
  Cc: djwong, dan.j.williams, david, hch, jane.chu

Shiyang Ruan wrote:
> This patch is inspired by Dan's "mm, dax, pmem: Introduce
> dev_pagemap_failure()"[1].  With the help of dax_holder and
> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> (or mapped device) on it to unmap all files in use and notify processes
> who are using those files.
> 
> Call trace:
> trigger unbind
>   -> unbind_store()
>    -> ... (skip)
>     -> devres_release_all()
>      -> kill_dax()
>       -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
>        -> xfs_dax_notify_failure()
> 
> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> event.  So do not shutdown filesystem directly if something not
> supported, or if failure range includes metadata area.  Make sure all
> files and processes are handled correctly.
> 
> ==
> Changes since v6:
>    1. Rebase on 6.0-rc2 and Darrick's patch[2].
> 
> Changes since v5:
>    1. Renamed MF_MEM_REMOVE to MF_MEM_PRE_REMOVE
>    2. hold s_umount before sync_filesystem()
>    3. do sync_filesystem() after SB_BORN check
>    4. Rebased on next-20220714
> 
> [1]: 
> https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
> [2]: https://lore.kernel.org/linux-xfs/Yv5wIa2crHioYeRr@magnolia/
> 
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> ---
>   drivers/dax/super.c         |  3 ++-
>   fs/xfs/xfs_notify_failure.c | 15 +++++++++++++++
>   include/linux/mm.h          |  1 +
>   3 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 9b5e2a5eb0ae..cf9a64563fbe 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
>   		return;
>    	if (dax_dev->holder_data != NULL)
> -		dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
> +		dax_holder_notify_failure(dax_dev, 0, U64_MAX,
> +				MF_MEM_PRE_REMOVE);
>    	clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
>   	synchronize_srcu(&dax_srcu);
> diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
> index 65d5eb20878e..a9769f17e998 100644
> --- a/fs/xfs/xfs_notify_failure.c
> +++ b/fs/xfs/xfs_notify_failure.c
> @@ -77,6 +77,9 @@ xfs_dax_failure_fn(
>    	if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
>   	    (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
> +		/* Do not shutdown so early when device is to be removed */
> +		if (notify->mf_flags & MF_MEM_PRE_REMOVE)
> +			return 0;
>   		notify->want_shutdown = true;
>   		return 0;
>   	}
> @@ -182,12 +185,22 @@ xfs_dax_notify_failure(
>   	struct xfs_mount	*mp = dax_holder(dax_dev);
>   	u64			ddev_start;
>   	u64			ddev_end;
> +	int			error;
>    	if (!(mp->m_sb.sb_flags & SB_BORN)) {

How are you testing the SB_BORN interactions? I have a fix for this
pending here:

https://lore.kernel.org/nvdimm/166153428094.2758201.7936572520826540019.stgit@dwillia2-xfh.jf.intel.com/

>   		xfs_warn(mp, "filesystem is not ready for notify_failure()!");
>   		return -EIO;
>   	}
>   +	if (mf_flags & MF_MEM_PRE_REMOVE) {

It appears this patch is corrupted here. I confirmed that b4 sees the
same when trying to apply it.

> +		xfs_info(mp, "device is about to be removed!");
> +		down_write(&mp->m_super->s_umount);
> +		error = sync_filesystem(mp->m_super);

This syncs to make data persistent, but for DAX this also needs to get
invalidate all current DAX mappings. I do not see that in these changes.

> +		up_write(&mp->m_super->s_umount);
> +		if (error)
> +			return error;
> +	}
> +
>   	if (mp->m_rtdev_targp && mp->m_rtdev_targp->bt_daxdev == dax_dev) {
>   		xfs_warn(mp,
>   			 "notify_failure() not supported on realtime device!");
> @@ -196,6 +209,8 @@ xfs_dax_notify_failure(
>    	if (mp->m_logdev_targp && mp->m_logdev_targp->bt_daxdev == dax_dev &&
>   	    mp->m_logdev_targp != mp->m_ddev_targp) {
> +		if (mf_flags & MF_MEM_PRE_REMOVE)
> +			return 0;
>   		xfs_err(mp, "ondisk log corrupt, shutting down fs!");
>   		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK);
>   		return -EFSCORRUPTED;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 982f2607180b..2c7c132e6512 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3176,6 +3176,7 @@ enum mf_flags {
>   	MF_UNPOISON = 1 << 4,
>   	MF_SW_SIMULATED = 1 << 5,
>   	MF_NO_RETRY = 1 << 6,
> +	MF_MEM_PRE_REMOVE = 1 << 7,
>   };
>   int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
>   		      unsigned long count, int mf_flags);
> -- 
> 2.37.2
> 
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
  2022-08-26 21:35 ` Dan Williams
@ 2022-08-29 10:02   ` Shiyang Ruan
  2022-08-29 14:49     ` Darrick J. Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Shiyang Ruan @ 2022-08-29 10:02 UTC (permalink / raw)
  To: Dan Williams, linux-kernel, linux-xfs, nvdimm, linux-mm, linux-fsdevel
  Cc: djwong, david, hch, jane.chu



在 2022/8/27 5:35, Dan Williams 写道:
> Shiyang Ruan wrote:
>> This patch is inspired by Dan's "mm, dax, pmem: Introduce
>> dev_pagemap_failure()"[1].  With the help of dax_holder and
>> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
>> (or mapped device) on it to unmap all files in use and notify processes
>> who are using those files.
>>
>> Call trace:
>> trigger unbind
>>    -> unbind_store()
>>     -> ... (skip)
>>      -> devres_release_all()
>>       -> kill_dax()
>>        -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
>>         -> xfs_dax_notify_failure()
>>
>> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
>> event.  So do not shutdown filesystem directly if something not
>> supported, or if failure range includes metadata area.  Make sure all
>> files and processes are handled correctly.
>>
>> ==
>> Changes since v6:
>>     1. Rebase on 6.0-rc2 and Darrick's patch[2].
>>
>> Changes since v5:
>>     1. Renamed MF_MEM_REMOVE to MF_MEM_PRE_REMOVE
>>     2. hold s_umount before sync_filesystem()
>>     3. do sync_filesystem() after SB_BORN check
>>     4. Rebased on next-20220714
>>
>> [1]:
>> https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
>> [2]: https://lore.kernel.org/linux-xfs/Yv5wIa2crHioYeRr@magnolia/
>>
>> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
>> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
>> ---
>>    drivers/dax/super.c         |  3 ++-
>>    fs/xfs/xfs_notify_failure.c | 15 +++++++++++++++
>>    include/linux/mm.h          |  1 +
>>    3 files changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
>> index 9b5e2a5eb0ae..cf9a64563fbe 100644
>> --- a/drivers/dax/super.c
>> +++ b/drivers/dax/super.c
>> @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
>>    		return;
>>     	if (dax_dev->holder_data != NULL)
>> -		dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
>> +		dax_holder_notify_failure(dax_dev, 0, U64_MAX,
>> +				MF_MEM_PRE_REMOVE);
>>     	clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
>>    	synchronize_srcu(&dax_srcu);
>> diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
>> index 65d5eb20878e..a9769f17e998 100644
>> --- a/fs/xfs/xfs_notify_failure.c
>> +++ b/fs/xfs/xfs_notify_failure.c
>> @@ -77,6 +77,9 @@ xfs_dax_failure_fn(
>>     	if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
>>    	    (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
>> +		/* Do not shutdown so early when device is to be removed */
>> +		if (notify->mf_flags & MF_MEM_PRE_REMOVE)
>> +			return 0;
>>    		notify->want_shutdown = true;
>>    		return 0;
>>    	}
>> @@ -182,12 +185,22 @@ xfs_dax_notify_failure(
>>    	struct xfs_mount	*mp = dax_holder(dax_dev);
>>    	u64			ddev_start;
>>    	u64			ddev_end;
>> +	int			error;
>>     	if (!(mp->m_sb.sb_flags & SB_BORN)) {
> 
> How are you testing the SB_BORN interactions? I have a fix for this
> pending here:
> 
> https://lore.kernel.org/nvdimm/166153428094.2758201.7936572520826540019.stgit@dwillia2-xfh.jf.intel.com/

That was my mistake.  Yes, it should be mp->m_super->s_flags.

(I remember my testcase did pass in my dev version, but now that seems 
impossible.  I think something was wrong when I did the test.)

> 
>>    		xfs_warn(mp, "filesystem is not ready for notify_failure()!");
>>    		return -EIO;
>>    	}
>>    +	if (mf_flags & MF_MEM_PRE_REMOVE) {
> 
> It appears this patch is corrupted here. I confirmed that b4 sees the
> same when trying to apply it.

Can't this patch be applied?  It is based on 6.0-rc2 + Darrick's patch. 
It's also ok to rebase on 6.0-rc3 + Darrick's patch.

> 
>> +		xfs_info(mp, "device is about to be removed!");
>> +		down_write(&mp->m_super->s_umount);
>> +		error = sync_filesystem(mp->m_super);
> 
> This syncs to make data persistent, but for DAX this also needs to get
> invalidate all current DAX mappings. I do not see that in these changes.

I'll add it.


--
Thanks,
Ruan.

> 
>> +		up_write(&mp->m_super->s_umount);
>> +		if (error)
>> +			return error;
>> +	}
>> +
>>    	if (mp->m_rtdev_targp && mp->m_rtdev_targp->bt_daxdev == dax_dev) {
>>    		xfs_warn(mp,
>>    			 "notify_failure() not supported on realtime device!");
>> @@ -196,6 +209,8 @@ xfs_dax_notify_failure(
>>     	if (mp->m_logdev_targp && mp->m_logdev_targp->bt_daxdev == dax_dev &&
>>    	    mp->m_logdev_targp != mp->m_ddev_targp) {
>> +		if (mf_flags & MF_MEM_PRE_REMOVE)
>> +			return 0;
>>    		xfs_err(mp, "ondisk log corrupt, shutting down fs!");
>>    		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK);
>>    		return -EFSCORRUPTED;
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 982f2607180b..2c7c132e6512 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -3176,6 +3176,7 @@ enum mf_flags {
>>    	MF_UNPOISON = 1 << 4,
>>    	MF_SW_SIMULATED = 1 << 5,
>>    	MF_NO_RETRY = 1 << 6,
>> +	MF_MEM_PRE_REMOVE = 1 << 7,
>>    };
>>    int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
>>    		      unsigned long count, int mf_flags);
>> -- 
>> 2.37.2
>>
>>
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
  2022-08-29 10:02   ` Shiyang Ruan
@ 2022-08-29 14:49     ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2022-08-29 14:49 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: Dan Williams, linux-kernel, linux-xfs, nvdimm, linux-mm,
	linux-fsdevel, david, hch, jane.chu

On Mon, Aug 29, 2022 at 06:02:11PM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2022/8/27 5:35, Dan Williams 写道:
> > Shiyang Ruan wrote:
> > > This patch is inspired by Dan's "mm, dax, pmem: Introduce
> > > dev_pagemap_failure()"[1].  With the help of dax_holder and
> > > ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> > > (or mapped device) on it to unmap all files in use and notify processes
> > > who are using those files.
> > > 
> > > Call trace:
> > > trigger unbind
> > >    -> unbind_store()
> > >     -> ... (skip)
> > >      -> devres_release_all()
> > >       -> kill_dax()
> > >        -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
> > >         -> xfs_dax_notify_failure()
> > > 
> > > Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> > > event.  So do not shutdown filesystem directly if something not
> > > supported, or if failure range includes metadata area.  Make sure all
> > > files and processes are handled correctly.
> > > 
> > > ==
> > > Changes since v6:
> > >     1. Rebase on 6.0-rc2 and Darrick's patch[2].
> > > 
> > > Changes since v5:
> > >     1. Renamed MF_MEM_REMOVE to MF_MEM_PRE_REMOVE
> > >     2. hold s_umount before sync_filesystem()
> > >     3. do sync_filesystem() after SB_BORN check
> > >     4. Rebased on next-20220714
> > > 
> > > [1]:
> > > https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
> > > [2]: https://lore.kernel.org/linux-xfs/Yv5wIa2crHioYeRr@magnolia/
> > > 
> > > Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> > > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > >    drivers/dax/super.c         |  3 ++-
> > >    fs/xfs/xfs_notify_failure.c | 15 +++++++++++++++
> > >    include/linux/mm.h          |  1 +
> > >    3 files changed, 18 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > > index 9b5e2a5eb0ae..cf9a64563fbe 100644
> > > --- a/drivers/dax/super.c
> > > +++ b/drivers/dax/super.c
> > > @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
> > >    		return;
> > >     	if (dax_dev->holder_data != NULL)
> > > -		dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
> > > +		dax_holder_notify_failure(dax_dev, 0, U64_MAX,
> > > +				MF_MEM_PRE_REMOVE);
> > >     	clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
> > >    	synchronize_srcu(&dax_srcu);
> > > diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
> > > index 65d5eb20878e..a9769f17e998 100644
> > > --- a/fs/xfs/xfs_notify_failure.c
> > > +++ b/fs/xfs/xfs_notify_failure.c
> > > @@ -77,6 +77,9 @@ xfs_dax_failure_fn(
> > >     	if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
> > >    	    (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
> > > +		/* Do not shutdown so early when device is to be removed */
> > > +		if (notify->mf_flags & MF_MEM_PRE_REMOVE)
> > > +			return 0;
> > >    		notify->want_shutdown = true;
> > >    		return 0;
> > >    	}
> > > @@ -182,12 +185,22 @@ xfs_dax_notify_failure(
> > >    	struct xfs_mount	*mp = dax_holder(dax_dev);
> > >    	u64			ddev_start;
> > >    	u64			ddev_end;
> > > +	int			error;
> > >     	if (!(mp->m_sb.sb_flags & SB_BORN)) {
> > 
> > How are you testing the SB_BORN interactions? I have a fix for this
> > pending here:
> > 
> > https://lore.kernel.org/nvdimm/166153428094.2758201.7936572520826540019.stgit@dwillia2-xfh.jf.intel.com/
> 
> That was my mistake.  Yes, it should be mp->m_super->s_flags.
> 
> (I remember my testcase did pass in my dev version, but now that seems
> impossible.  I think something was wrong when I did the test.)
> 
> > 
> > >    		xfs_warn(mp, "filesystem is not ready for notify_failure()!");
> > >    		return -EIO;
> > >    	}
> > >    +	if (mf_flags & MF_MEM_PRE_REMOVE) {
> > 
> > It appears this patch is corrupted here. I confirmed that b4 sees the
> > same when trying to apply it.
> 
> Can't this patch be applied?  It is based on 6.0-rc2 + Darrick's patch. It's
> also ok to rebase on 6.0-rc3 + Darrick's patch.
> 
> > 
> > > +		xfs_info(mp, "device is about to be removed!");
> > > +		down_write(&mp->m_super->s_umount);
> > > +		error = sync_filesystem(mp->m_super);
> > 
> > This syncs to make data persistent, but for DAX this also needs to get
> > invalidate all current DAX mappings. I do not see that in these changes.
> 
> I'll add it.

Are you guys going to pick up [1]?

--D

[1] https://lore.kernel.org/linux-xfs/Yv5wIa2crHioYeRr@magnolia/

> 
> --
> Thanks,
> Ruan.
> 
> > 
> > > +		up_write(&mp->m_super->s_umount);
> > > +		if (error)
> > > +			return error;
> > > +	}
> > > +
> > >    	if (mp->m_rtdev_targp && mp->m_rtdev_targp->bt_daxdev == dax_dev) {
> > >    		xfs_warn(mp,
> > >    			 "notify_failure() not supported on realtime device!");
> > > @@ -196,6 +209,8 @@ xfs_dax_notify_failure(
> > >     	if (mp->m_logdev_targp && mp->m_logdev_targp->bt_daxdev == dax_dev &&
> > >    	    mp->m_logdev_targp != mp->m_ddev_targp) {
> > > +		if (mf_flags & MF_MEM_PRE_REMOVE)
> > > +			return 0;
> > >    		xfs_err(mp, "ondisk log corrupt, shutting down fs!");
> > >    		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK);
> > >    		return -EFSCORRUPTED;
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index 982f2607180b..2c7c132e6512 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -3176,6 +3176,7 @@ enum mf_flags {
> > >    	MF_UNPOISON = 1 << 4,
> > >    	MF_SW_SIMULATED = 1 << 5,
> > >    	MF_NO_RETRY = 1 << 6,
> > > +	MF_MEM_PRE_REMOVE = 1 << 7,
> > >    };
> > >    int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
> > >    		      unsigned long count, int mf_flags);
> > > -- 
> > > 2.37.2
> > > 
> > > 
> > 
> > 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v8 0/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
  2022-08-26 13:03 [PATCH v7] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Shiyang Ruan
  2022-08-26 21:35 ` Dan Williams
@ 2022-09-02 10:35 ` Shiyang Ruan
  2022-09-02 10:35   ` [PATCH 1/3] xfs: fix the calculation of length and end Shiyang Ruan
                     ` (3 more replies)
  1 sibling, 4 replies; 19+ messages in thread
From: Shiyang Ruan @ 2022-09-02 10:35 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, nvdimm, linux-mm, linux-fsdevel
  Cc: djwong, dan.j.williams, david, hch, jane.chu

Changes since v7:
  1. Add P1 to fix calculation mistake
  2. Add P2 to move drop_pagecache_sb() to super.c for xfs to use
  3. P3: Add invalidate all mappings after sync.
  4. P3: Set offset&len to be start&length of device when it is to be removed.
  5. Rebase on 6.0-rc3 + Darrick's patch[1] + Dan's patch[2].

Changes since v6:
  1. Rebase on 6.0-rc2 and Darrick's patch[1].

[1]: https://lore.kernel.org/linux-xfs/Yv5wIa2crHioYeRr@magnolia/
[2]: https://lore.kernel.org/linux-xfs/166153426798.2758201.15108211981034512993.stgit@dwillia2-xfh.jf.intel.com/

Shiyang Ruan (3):
  xfs: fix the calculation of length and end
  fs: move drop_pagecache_sb() for others to use
  mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

 drivers/dax/super.c         |  3 ++-
 fs/drop_caches.c            | 33 ---------------------------------
 fs/super.c                  | 34 ++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_notify_failure.c | 31 +++++++++++++++++++++++++++----
 include/linux/fs.h          |  1 +
 include/linux/mm.h          |  1 +
 6 files changed, 65 insertions(+), 38 deletions(-)

-- 
2.37.2


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/3] xfs: fix the calculation of length and end
  2022-09-02 10:35 ` [PATCH v8 0/3] " Shiyang Ruan
@ 2022-09-02 10:35   ` Shiyang Ruan
  2022-09-14 18:13     ` Darrick J. Wong
  2022-09-02 10:36   ` [PATCH 2/3] fs: move drop_pagecache_sb() for others to use Shiyang Ruan
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Shiyang Ruan @ 2022-09-02 10:35 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, nvdimm, linux-mm, linux-fsdevel
  Cc: djwong, dan.j.williams, david, hch, jane.chu

The end should be start + length - 1.  Also fix the calculation of the
length when seeking for intersection of notify range and device.

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
 fs/xfs/xfs_notify_failure.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
index c4078d0ec108..3830f908e215 100644
--- a/fs/xfs/xfs_notify_failure.c
+++ b/fs/xfs/xfs_notify_failure.c
@@ -114,7 +114,7 @@ xfs_dax_notify_ddev_failure(
 	int			error = 0;
 	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, daddr);
 	xfs_agnumber_t		agno = XFS_FSB_TO_AGNO(mp, fsbno);
-	xfs_fsblock_t		end_fsbno = XFS_DADDR_TO_FSB(mp, daddr + bblen);
+	xfs_fsblock_t		end_fsbno = XFS_DADDR_TO_FSB(mp, daddr + bblen - 1);
 	xfs_agnumber_t		end_agno = XFS_FSB_TO_AGNO(mp, end_fsbno);
 
 	error = xfs_trans_alloc_empty(mp, &tp);
@@ -210,7 +210,7 @@ xfs_dax_notify_failure(
 	ddev_end = ddev_start + bdev_nr_bytes(mp->m_ddev_targp->bt_bdev) - 1;
 
 	/* Ignore the range out of filesystem area */
-	if (offset + len < ddev_start)
+	if (offset + len - 1 < ddev_start)
 		return -ENXIO;
 	if (offset > ddev_end)
 		return -ENXIO;
@@ -222,8 +222,8 @@ xfs_dax_notify_failure(
 		len -= ddev_start - offset;
 		offset = 0;
 	}
-	if (offset + len > ddev_end)
-		len -= ddev_end - offset;
+	if (offset + len - 1 > ddev_end)
+		len -= offset + len - 1 - ddev_end;
 
 	return xfs_dax_notify_ddev_failure(mp, BTOBB(offset), BTOBB(len),
 			mf_flags);
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/3] fs: move drop_pagecache_sb() for others to use
  2022-09-02 10:35 ` [PATCH v8 0/3] " Shiyang Ruan
  2022-09-02 10:35   ` [PATCH 1/3] xfs: fix the calculation of length and end Shiyang Ruan
@ 2022-09-02 10:36   ` Shiyang Ruan
  2022-09-14 18:16     ` Darrick J. Wong
  2022-09-02 10:36   ` [PATCH 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Shiyang Ruan
  2022-09-07  9:46   ` [PATCH v8 0/3] " Shiyang Ruan
  3 siblings, 1 reply; 19+ messages in thread
From: Shiyang Ruan @ 2022-09-02 10:36 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, nvdimm, linux-mm, linux-fsdevel
  Cc: djwong, dan.j.williams, david, hch, jane.chu

xfs_notify_failure requires a method to invalidate all mappings.
drop_pagecache_sb() can do this but it is a static function and only
build with CONFIG_SYSCTL.  Now, move it to super.c and make it available
for others.

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
 fs/drop_caches.c   | 33 ---------------------------------
 fs/super.c         | 34 ++++++++++++++++++++++++++++++++++
 include/linux/fs.h |  1 +
 3 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index e619c31b6bd9..5c8406076f9b 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -3,7 +3,6 @@
  * Implement the manual drop-all-pagecache function
  */
 
-#include <linux/pagemap.h>
 #include <linux/kernel.h>
 #include <linux/mm.h>
 #include <linux/fs.h>
@@ -15,38 +14,6 @@
 /* A global variable is a bit ugly, but it keeps the code simple */
 int sysctl_drop_caches;
 
-static void drop_pagecache_sb(struct super_block *sb, void *unused)
-{
-	struct inode *inode, *toput_inode = NULL;
-
-	spin_lock(&sb->s_inode_list_lock);
-	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
-		spin_lock(&inode->i_lock);
-		/*
-		 * We must skip inodes in unusual state. We may also skip
-		 * inodes without pages but we deliberately won't in case
-		 * we need to reschedule to avoid softlockups.
-		 */
-		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
-		    (mapping_empty(inode->i_mapping) && !need_resched())) {
-			spin_unlock(&inode->i_lock);
-			continue;
-		}
-		__iget(inode);
-		spin_unlock(&inode->i_lock);
-		spin_unlock(&sb->s_inode_list_lock);
-
-		invalidate_mapping_pages(inode->i_mapping, 0, -1);
-		iput(toput_inode);
-		toput_inode = inode;
-
-		cond_resched();
-		spin_lock(&sb->s_inode_list_lock);
-	}
-	spin_unlock(&sb->s_inode_list_lock);
-	iput(toput_inode);
-}
-
 int drop_caches_sysctl_handler(struct ctl_table *table, int write,
 		void *buffer, size_t *length, loff_t *ppos)
 {
diff --git a/fs/super.c b/fs/super.c
index 734ed584a946..bdf53dbe834c 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -36,6 +36,7 @@
 #include <linux/lockdep.h>
 #include <linux/user_namespace.h>
 #include <linux/fs_context.h>
+#include <linux/pagemap.h>
 #include <uapi/linux/mount.h>
 #include "internal.h"
 
@@ -677,6 +678,39 @@ void drop_super_exclusive(struct super_block *sb)
 }
 EXPORT_SYMBOL(drop_super_exclusive);
 
+void drop_pagecache_sb(struct super_block *sb, void *unused)
+{
+	struct inode *inode, *toput_inode = NULL;
+
+	spin_lock(&sb->s_inode_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		spin_lock(&inode->i_lock);
+		/*
+		 * We must skip inodes in unusual state. We may also skip
+		 * inodes without pages but we deliberately won't in case
+		 * we need to reschedule to avoid softlockups.
+		 */
+		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
+		    (mapping_empty(inode->i_mapping) && !need_resched())) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&sb->s_inode_list_lock);
+
+		invalidate_mapping_pages(inode->i_mapping, 0, -1);
+		iput(toput_inode);
+		toput_inode = inode;
+
+		cond_resched();
+		spin_lock(&sb->s_inode_list_lock);
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+	iput(toput_inode);
+}
+EXPORT_SYMBOL(drop_pagecache_sb);
+
 static void __iterate_supers(void (*f)(struct super_block *))
 {
 	struct super_block *sb, *p = NULL;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9eced4cc286e..5ded28c0d2c9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3292,6 +3292,7 @@ extern struct super_block *get_super(struct block_device *);
 extern struct super_block *get_active_super(struct block_device *bdev);
 extern void drop_super(struct super_block *sb);
 extern void drop_super_exclusive(struct super_block *sb);
+void drop_pagecache_sb(struct super_block *sb, void *unused);
 extern void iterate_supers(void (*)(struct super_block *, void *), void *);
 extern void iterate_supers_type(struct file_system_type *,
 			        void (*)(struct super_block *, void *), void *);
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
  2022-09-02 10:35 ` [PATCH v8 0/3] " Shiyang Ruan
  2022-09-02 10:35   ` [PATCH 1/3] xfs: fix the calculation of length and end Shiyang Ruan
  2022-09-02 10:36   ` [PATCH 2/3] fs: move drop_pagecache_sb() for others to use Shiyang Ruan
@ 2022-09-02 10:36   ` Shiyang Ruan
  2022-09-14 18:15     ` Darrick J. Wong
  2022-09-20  2:45     ` Dave Chinner
  2022-09-07  9:46   ` [PATCH v8 0/3] " Shiyang Ruan
  3 siblings, 2 replies; 19+ messages in thread
From: Shiyang Ruan @ 2022-09-02 10:36 UTC (permalink / raw)
  To: linux-kernel, linux-xfs, nvdimm, linux-mm, linux-fsdevel
  Cc: djwong, dan.j.williams, david, hch, jane.chu

This patch is inspired by Dan's "mm, dax, pmem: Introduce
dev_pagemap_failure()"[1].  With the help of dax_holder and
->notify_failure() mechanism, the pmem driver is able to ask filesystem
(or mapped device) on it to unmap all files in use and notify processes
who are using those files.

Call trace:
trigger unbind
 -> unbind_store()
  -> ... (skip)
   -> devres_release_all()   # was pmem driver ->remove() in v1
    -> kill_dax()
     -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
      -> xfs_dax_notify_failure()

Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
event.  So do not shutdown filesystem directly if something not
supported, or if failure range includes metadata area.  Make sure all
files and processes are handled correctly.

[1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
 drivers/dax/super.c         |  3 ++-
 fs/xfs/xfs_notify_failure.c | 23 +++++++++++++++++++++++
 include/linux/mm.h          |  1 +
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 9b5e2a5eb0ae..cf9a64563fbe 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
 		return;
 
 	if (dax_dev->holder_data != NULL)
-		dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
+		dax_holder_notify_failure(dax_dev, 0, U64_MAX,
+				MF_MEM_PRE_REMOVE);
 
 	clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
 	synchronize_srcu(&dax_srcu);
diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
index 3830f908e215..5e04ba7fa403 100644
--- a/fs/xfs/xfs_notify_failure.c
+++ b/fs/xfs/xfs_notify_failure.c
@@ -22,6 +22,7 @@
 
 #include <linux/mm.h>
 #include <linux/dax.h>
+#include <linux/fs.h>
 
 struct xfs_failure_info {
 	xfs_agblock_t		startblock;
@@ -77,6 +78,9 @@ xfs_dax_failure_fn(
 
 	if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
 	    (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
+		/* The device is about to be removed.  Not a really failure. */
+		if (notify->mf_flags & MF_MEM_PRE_REMOVE)
+			return 0;
 		notify->want_shutdown = true;
 		return 0;
 	}
@@ -182,12 +186,23 @@ xfs_dax_notify_failure(
 	struct xfs_mount	*mp = dax_holder(dax_dev);
 	u64			ddev_start;
 	u64			ddev_end;
+	int			error;
 
 	if (!(mp->m_super->s_flags & SB_BORN)) {
 		xfs_warn(mp, "filesystem is not ready for notify_failure()!");
 		return -EIO;
 	}
 
+	if (mf_flags & MF_MEM_PRE_REMOVE) {
+		xfs_info(mp, "device is about to be removed!");
+		down_write(&mp->m_super->s_umount);
+		error = sync_filesystem(mp->m_super);
+		drop_pagecache_sb(mp->m_super, NULL);
+		up_write(&mp->m_super->s_umount);
+		if (error)
+			return error;
+	}
+
 	if (mp->m_rtdev_targp && mp->m_rtdev_targp->bt_daxdev == dax_dev) {
 		xfs_debug(mp,
 			 "notify_failure() not supported on realtime device!");
@@ -196,6 +211,8 @@ xfs_dax_notify_failure(
 
 	if (mp->m_logdev_targp && mp->m_logdev_targp->bt_daxdev == dax_dev &&
 	    mp->m_logdev_targp != mp->m_ddev_targp) {
+		if (mf_flags & MF_MEM_PRE_REMOVE)
+			return 0;
 		xfs_err(mp, "ondisk log corrupt, shutting down fs!");
 		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK);
 		return -EFSCORRUPTED;
@@ -209,6 +226,12 @@ xfs_dax_notify_failure(
 	ddev_start = mp->m_ddev_targp->bt_dax_part_off;
 	ddev_end = ddev_start + bdev_nr_bytes(mp->m_ddev_targp->bt_bdev) - 1;
 
+	/* Notify failure on the whole device */
+	if (offset == 0 && len == U64_MAX) {
+		offset = ddev_start;
+		len = bdev_nr_bytes(mp->m_ddev_targp->bt_bdev);
+	}
+
 	/* Ignore the range out of filesystem area */
 	if (offset + len - 1 < ddev_start)
 		return -ENXIO;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 21f8b27bd9fd..9122a1c57dd2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3183,6 +3183,7 @@ enum mf_flags {
 	MF_UNPOISON = 1 << 4,
 	MF_SW_SIMULATED = 1 << 5,
 	MF_NO_RETRY = 1 << 6,
+	MF_MEM_PRE_REMOVE = 1 << 7,
 };
 int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
 		      unsigned long count, int mf_flags);
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v8 0/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
  2022-09-02 10:35 ` [PATCH v8 0/3] " Shiyang Ruan
                     ` (2 preceding siblings ...)
  2022-09-02 10:36   ` [PATCH 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Shiyang Ruan
@ 2022-09-07  9:46   ` Shiyang Ruan
  2022-09-14 18:09     ` Darrick J. Wong
  3 siblings, 1 reply; 19+ messages in thread
From: Shiyang Ruan @ 2022-09-07  9:46 UTC (permalink / raw)
  To: dan.j.williams, djwong
  Cc: linux-fsdevel, linux-mm, nvdimm, linux-xfs, linux-kernel, hch,
	david, jane.chu

ping

在 2022/9/2 18:35, Shiyang Ruan 写道:
> Changes since v7:
>    1. Add P1 to fix calculation mistake
>    2. Add P2 to move drop_pagecache_sb() to super.c for xfs to use
>    3. P3: Add invalidate all mappings after sync.
>    4. P3: Set offset&len to be start&length of device when it is to be removed.
>    5. Rebase on 6.0-rc3 + Darrick's patch[1] + Dan's patch[2].
> 
> Changes since v6:
>    1. Rebase on 6.0-rc2 and Darrick's patch[1].
> 
> [1]: https://lore.kernel.org/linux-xfs/Yv5wIa2crHioYeRr@magnolia/
> [2]: https://lore.kernel.org/linux-xfs/166153426798.2758201.15108211981034512993.stgit@dwillia2-xfh.jf.intel.com/
> 
> Shiyang Ruan (3):
>    xfs: fix the calculation of length and end
>    fs: move drop_pagecache_sb() for others to use
>    mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
> 
>   drivers/dax/super.c         |  3 ++-
>   fs/drop_caches.c            | 33 ---------------------------------
>   fs/super.c                  | 34 ++++++++++++++++++++++++++++++++++
>   fs/xfs/xfs_notify_failure.c | 31 +++++++++++++++++++++++++++----
>   include/linux/fs.h          |  1 +
>   include/linux/mm.h          |  1 +
>   6 files changed, 65 insertions(+), 38 deletions(-)
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v8 0/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
  2022-09-07  9:46   ` [PATCH v8 0/3] " Shiyang Ruan
@ 2022-09-14 18:09     ` Darrick J. Wong
  2022-09-14 18:15       ` Darrick J. Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2022-09-14 18:09 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: dan.j.williams, linux-fsdevel, linux-mm, nvdimm, linux-xfs,
	linux-kernel, hch, david, jane.chu

On Wed, Sep 07, 2022 at 05:46:00PM +0800, Shiyang Ruan wrote:
> ping
> 
> 在 2022/9/2 18:35, Shiyang Ruan 写道:
> > Changes since v7:
> >    1. Add P1 to fix calculation mistake
> >    2. Add P2 to move drop_pagecache_sb() to super.c for xfs to use
> >    3. P3: Add invalidate all mappings after sync.
> >    4. P3: Set offset&len to be start&length of device when it is to be removed.
> >    5. Rebase on 6.0-rc3 + Darrick's patch[1] + Dan's patch[2].
> > 
> > Changes since v6:
> >    1. Rebase on 6.0-rc2 and Darrick's patch[1].
> > 
> > [1]: https://lore.kernel.org/linux-xfs/Yv5wIa2crHioYeRr@magnolia/
> > [2]: https://lore.kernel.org/linux-xfs/166153426798.2758201.15108211981034512993.stgit@dwillia2-xfh.jf.intel.com/

Just out of curiosity, is it your (or djbw's) intent to send all these
as bugfixes for 6.0 via akpm like all the other dax fixen?

--D

> > 
> > Shiyang Ruan (3):
> >    xfs: fix the calculation of length and end
> >    fs: move drop_pagecache_sb() for others to use
> >    mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
> > 
> >   drivers/dax/super.c         |  3 ++-
> >   fs/drop_caches.c            | 33 ---------------------------------
> >   fs/super.c                  | 34 ++++++++++++++++++++++++++++++++++
> >   fs/xfs/xfs_notify_failure.c | 31 +++++++++++++++++++++++++++----
> >   include/linux/fs.h          |  1 +
> >   include/linux/mm.h          |  1 +
> >   6 files changed, 65 insertions(+), 38 deletions(-)
> > 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] xfs: fix the calculation of length and end
  2022-09-02 10:35   ` [PATCH 1/3] xfs: fix the calculation of length and end Shiyang Ruan
@ 2022-09-14 18:13     ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2022-09-14 18:13 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: linux-kernel, linux-xfs, nvdimm, linux-mm, linux-fsdevel,
	dan.j.williams, david, hch, jane.chu

On Fri, Sep 02, 2022 at 10:35:59AM +0000, Shiyang Ruan wrote:
> The end should be start + length - 1.  Also fix the calculation of the
> length when seeking for intersection of notify range and device.
> 
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>

Looks correct to me,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_notify_failure.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
> index c4078d0ec108..3830f908e215 100644
> --- a/fs/xfs/xfs_notify_failure.c
> +++ b/fs/xfs/xfs_notify_failure.c
> @@ -114,7 +114,7 @@ xfs_dax_notify_ddev_failure(
>  	int			error = 0;
>  	xfs_fsblock_t		fsbno = XFS_DADDR_TO_FSB(mp, daddr);
>  	xfs_agnumber_t		agno = XFS_FSB_TO_AGNO(mp, fsbno);
> -	xfs_fsblock_t		end_fsbno = XFS_DADDR_TO_FSB(mp, daddr + bblen);
> +	xfs_fsblock_t		end_fsbno = XFS_DADDR_TO_FSB(mp, daddr + bblen - 1);
>  	xfs_agnumber_t		end_agno = XFS_FSB_TO_AGNO(mp, end_fsbno);
>  
>  	error = xfs_trans_alloc_empty(mp, &tp);
> @@ -210,7 +210,7 @@ xfs_dax_notify_failure(
>  	ddev_end = ddev_start + bdev_nr_bytes(mp->m_ddev_targp->bt_bdev) - 1;
>  
>  	/* Ignore the range out of filesystem area */
> -	if (offset + len < ddev_start)
> +	if (offset + len - 1 < ddev_start)
>  		return -ENXIO;
>  	if (offset > ddev_end)
>  		return -ENXIO;
> @@ -222,8 +222,8 @@ xfs_dax_notify_failure(
>  		len -= ddev_start - offset;
>  		offset = 0;
>  	}
> -	if (offset + len > ddev_end)
> -		len -= ddev_end - offset;
> +	if (offset + len - 1 > ddev_end)
> +		len -= offset + len - 1 - ddev_end;
>  
>  	return xfs_dax_notify_ddev_failure(mp, BTOBB(offset), BTOBB(len),
>  			mf_flags);
> -- 
> 2.37.2
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
  2022-09-02 10:36   ` [PATCH 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Shiyang Ruan
@ 2022-09-14 18:15     ` Darrick J. Wong
  2022-09-15  1:49       ` Shiyang Ruan
  2022-09-20  2:45     ` Dave Chinner
  1 sibling, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2022-09-14 18:15 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: linux-kernel, linux-xfs, nvdimm, linux-mm, linux-fsdevel,
	dan.j.williams, david, hch, jane.chu

On Fri, Sep 02, 2022 at 10:36:01AM +0000, Shiyang Ruan wrote:
> This patch is inspired by Dan's "mm, dax, pmem: Introduce
> dev_pagemap_failure()"[1].  With the help of dax_holder and
> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> (or mapped device) on it to unmap all files in use and notify processes
> who are using those files.
> 
> Call trace:
> trigger unbind
>  -> unbind_store()
>   -> ... (skip)
>    -> devres_release_all()   # was pmem driver ->remove() in v1
>     -> kill_dax()
>      -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
>       -> xfs_dax_notify_failure()
> 
> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> event.  So do not shutdown filesystem directly if something not
> supported, or if failure range includes metadata area.  Make sure all
> files and processes are handled correctly.
> 
> [1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
> 
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> ---
>  drivers/dax/super.c         |  3 ++-
>  fs/xfs/xfs_notify_failure.c | 23 +++++++++++++++++++++++
>  include/linux/mm.h          |  1 +
>  3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 9b5e2a5eb0ae..cf9a64563fbe 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
>  		return;
>  
>  	if (dax_dev->holder_data != NULL)
> -		dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
> +		dax_holder_notify_failure(dax_dev, 0, U64_MAX,
> +				MF_MEM_PRE_REMOVE);
>  
>  	clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
>  	synchronize_srcu(&dax_srcu);
> diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
> index 3830f908e215..5e04ba7fa403 100644
> --- a/fs/xfs/xfs_notify_failure.c
> +++ b/fs/xfs/xfs_notify_failure.c
> @@ -22,6 +22,7 @@
>  
>  #include <linux/mm.h>
>  #include <linux/dax.h>
> +#include <linux/fs.h>
>  
>  struct xfs_failure_info {
>  	xfs_agblock_t		startblock;
> @@ -77,6 +78,9 @@ xfs_dax_failure_fn(
>  
>  	if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
>  	    (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
> +		/* The device is about to be removed.  Not a really failure. */
> +		if (notify->mf_flags & MF_MEM_PRE_REMOVE)
> +			return 0;
>  		notify->want_shutdown = true;
>  		return 0;
>  	}
> @@ -182,12 +186,23 @@ xfs_dax_notify_failure(
>  	struct xfs_mount	*mp = dax_holder(dax_dev);
>  	u64			ddev_start;
>  	u64			ddev_end;
> +	int			error;
>  
>  	if (!(mp->m_super->s_flags & SB_BORN)) {
>  		xfs_warn(mp, "filesystem is not ready for notify_failure()!");
>  		return -EIO;
>  	}
>  
> +	if (mf_flags & MF_MEM_PRE_REMOVE) {
> +		xfs_info(mp, "device is about to be removed!");
> +		down_write(&mp->m_super->s_umount);
> +		error = sync_filesystem(mp->m_super);
> +		drop_pagecache_sb(mp->m_super, NULL);
> +		up_write(&mp->m_super->s_umount);
> +		if (error)
> +			return error;
> +	}
> +
>  	if (mp->m_rtdev_targp && mp->m_rtdev_targp->bt_daxdev == dax_dev) {
>  		xfs_debug(mp,
>  			 "notify_failure() not supported on realtime device!");
> @@ -196,6 +211,8 @@ xfs_dax_notify_failure(
>  
>  	if (mp->m_logdev_targp && mp->m_logdev_targp->bt_daxdev == dax_dev &&
>  	    mp->m_logdev_targp != mp->m_ddev_targp) {
> +		if (mf_flags & MF_MEM_PRE_REMOVE)
> +			return 0;
>  		xfs_err(mp, "ondisk log corrupt, shutting down fs!");
>  		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK);
>  		return -EFSCORRUPTED;
> @@ -209,6 +226,12 @@ xfs_dax_notify_failure(
>  	ddev_start = mp->m_ddev_targp->bt_dax_part_off;
>  	ddev_end = ddev_start + bdev_nr_bytes(mp->m_ddev_targp->bt_bdev) - 1;
>  
> +	/* Notify failure on the whole device */
> +	if (offset == 0 && len == U64_MAX) {
> +		offset = ddev_start;
> +		len = bdev_nr_bytes(mp->m_ddev_targp->bt_bdev);
> +	}

I wonder, won't the trimming code below take care of this?

The rest of the patch looks ok to me.

--D

> +
>  	/* Ignore the range out of filesystem area */
>  	if (offset + len - 1 < ddev_start)
>  		return -ENXIO;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 21f8b27bd9fd..9122a1c57dd2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3183,6 +3183,7 @@ enum mf_flags {
>  	MF_UNPOISON = 1 << 4,
>  	MF_SW_SIMULATED = 1 << 5,
>  	MF_NO_RETRY = 1 << 6,
> +	MF_MEM_PRE_REMOVE = 1 << 7,
>  };
>  int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
>  		      unsigned long count, int mf_flags);
> -- 
> 2.37.2
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v8 0/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
  2022-09-14 18:09     ` Darrick J. Wong
@ 2022-09-14 18:15       ` Darrick J. Wong
  2022-09-15  2:56         ` Shiyang Ruan
  0 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2022-09-14 18:15 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: dan.j.williams, linux-fsdevel, linux-mm, nvdimm, linux-xfs,
	linux-kernel, hch, david, jane.chu

On Wed, Sep 14, 2022 at 11:09:23AM -0700, Darrick J. Wong wrote:
> On Wed, Sep 07, 2022 at 05:46:00PM +0800, Shiyang Ruan wrote:
> > ping
> > 
> > 在 2022/9/2 18:35, Shiyang Ruan 写道:
> > > Changes since v7:
> > >    1. Add P1 to fix calculation mistake
> > >    2. Add P2 to move drop_pagecache_sb() to super.c for xfs to use
> > >    3. P3: Add invalidate all mappings after sync.
> > >    4. P3: Set offset&len to be start&length of device when it is to be removed.
> > >    5. Rebase on 6.0-rc3 + Darrick's patch[1] + Dan's patch[2].
> > > 
> > > Changes since v6:
> > >    1. Rebase on 6.0-rc2 and Darrick's patch[1].
> > > 
> > > [1]: https://lore.kernel.org/linux-xfs/Yv5wIa2crHioYeRr@magnolia/
> > > [2]: https://lore.kernel.org/linux-xfs/166153426798.2758201.15108211981034512993.stgit@dwillia2-xfh.jf.intel.com/
> 
> Just out of curiosity, is it your (or djbw's) intent to send all these
> as bugfixes for 6.0 via akpm like all the other dax fixen?

Aha, this is 6.1 stuff, please ignore this question.

--D

> --D
> 
> > > 
> > > Shiyang Ruan (3):
> > >    xfs: fix the calculation of length and end
> > >    fs: move drop_pagecache_sb() for others to use
> > >    mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
> > > 
> > >   drivers/dax/super.c         |  3 ++-
> > >   fs/drop_caches.c            | 33 ---------------------------------
> > >   fs/super.c                  | 34 ++++++++++++++++++++++++++++++++++
> > >   fs/xfs/xfs_notify_failure.c | 31 +++++++++++++++++++++++++++----
> > >   include/linux/fs.h          |  1 +
> > >   include/linux/mm.h          |  1 +
> > >   6 files changed, 65 insertions(+), 38 deletions(-)
> > > 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] fs: move drop_pagecache_sb() for others to use
  2022-09-02 10:36   ` [PATCH 2/3] fs: move drop_pagecache_sb() for others to use Shiyang Ruan
@ 2022-09-14 18:16     ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2022-09-14 18:16 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: linux-kernel, linux-xfs, nvdimm, linux-mm, linux-fsdevel,
	dan.j.williams, david, hch, jane.chu

On Fri, Sep 02, 2022 at 10:36:00AM +0000, Shiyang Ruan wrote:
> xfs_notify_failure requires a method to invalidate all mappings.
> drop_pagecache_sb() can do this but it is a static function and only
> build with CONFIG_SYSCTL.  Now, move it to super.c and make it available
> for others.
> 
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> ---
>  fs/drop_caches.c   | 33 ---------------------------------
>  fs/super.c         | 34 ++++++++++++++++++++++++++++++++++
>  include/linux/fs.h |  1 +
>  3 files changed, 35 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> index e619c31b6bd9..5c8406076f9b 100644
> --- a/fs/drop_caches.c
> +++ b/fs/drop_caches.c
> @@ -3,7 +3,6 @@
>   * Implement the manual drop-all-pagecache function
>   */
>  
> -#include <linux/pagemap.h>
>  #include <linux/kernel.h>
>  #include <linux/mm.h>
>  #include <linux/fs.h>
> @@ -15,38 +14,6 @@
>  /* A global variable is a bit ugly, but it keeps the code simple */
>  int sysctl_drop_caches;
>  
> -static void drop_pagecache_sb(struct super_block *sb, void *unused)
> -{
> -	struct inode *inode, *toput_inode = NULL;
> -
> -	spin_lock(&sb->s_inode_list_lock);
> -	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> -		spin_lock(&inode->i_lock);
> -		/*
> -		 * We must skip inodes in unusual state. We may also skip
> -		 * inodes without pages but we deliberately won't in case
> -		 * we need to reschedule to avoid softlockups.
> -		 */
> -		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
> -		    (mapping_empty(inode->i_mapping) && !need_resched())) {
> -			spin_unlock(&inode->i_lock);
> -			continue;
> -		}
> -		__iget(inode);
> -		spin_unlock(&inode->i_lock);
> -		spin_unlock(&sb->s_inode_list_lock);
> -
> -		invalidate_mapping_pages(inode->i_mapping, 0, -1);
> -		iput(toput_inode);
> -		toput_inode = inode;
> -
> -		cond_resched();
> -		spin_lock(&sb->s_inode_list_lock);
> -	}
> -	spin_unlock(&sb->s_inode_list_lock);
> -	iput(toput_inode);
> -}
> -
>  int drop_caches_sysctl_handler(struct ctl_table *table, int write,
>  		void *buffer, size_t *length, loff_t *ppos)
>  {
> diff --git a/fs/super.c b/fs/super.c
> index 734ed584a946..bdf53dbe834c 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -36,6 +36,7 @@
>  #include <linux/lockdep.h>
>  #include <linux/user_namespace.h>
>  #include <linux/fs_context.h>
> +#include <linux/pagemap.h>
>  #include <uapi/linux/mount.h>
>  #include "internal.h"
>  
> @@ -677,6 +678,39 @@ void drop_super_exclusive(struct super_block *sb)
>  }
>  EXPORT_SYMBOL(drop_super_exclusive);
>  
> +void drop_pagecache_sb(struct super_block *sb, void *unused)
> +{
> +	struct inode *inode, *toput_inode = NULL;
> +
> +	spin_lock(&sb->s_inode_list_lock);
> +	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		spin_lock(&inode->i_lock);
> +		/*
> +		 * We must skip inodes in unusual state. We may also skip
> +		 * inodes without pages but we deliberately won't in case
> +		 * we need to reschedule to avoid softlockups.
> +		 */
> +		if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
> +		    (mapping_empty(inode->i_mapping) && !need_resched())) {
> +			spin_unlock(&inode->i_lock);
> +			continue;
> +		}
> +		__iget(inode);
> +		spin_unlock(&inode->i_lock);
> +		spin_unlock(&sb->s_inode_list_lock);
> +
> +		invalidate_mapping_pages(inode->i_mapping, 0, -1);
> +		iput(toput_inode);
> +		toput_inode = inode;
> +
> +		cond_resched();
> +		spin_lock(&sb->s_inode_list_lock);
> +	}
> +	spin_unlock(&sb->s_inode_list_lock);
> +	iput(toput_inode);
> +}
> +EXPORT_SYMBOL(drop_pagecache_sb);

You might want to rename this "super_drop_pagecache" to fit with the
other functions that all have "super" in the name somewhere.

--D

> +
>  static void __iterate_supers(void (*f)(struct super_block *))
>  {
>  	struct super_block *sb, *p = NULL;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9eced4cc286e..5ded28c0d2c9 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3292,6 +3292,7 @@ extern struct super_block *get_super(struct block_device *);
>  extern struct super_block *get_active_super(struct block_device *bdev);
>  extern void drop_super(struct super_block *sb);
>  extern void drop_super_exclusive(struct super_block *sb);
> +void drop_pagecache_sb(struct super_block *sb, void *unused);
>  extern void iterate_supers(void (*)(struct super_block *, void *), void *);
>  extern void iterate_supers_type(struct file_system_type *,
>  			        void (*)(struct super_block *, void *), void *);
> -- 
> 2.37.2
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
  2022-09-14 18:15     ` Darrick J. Wong
@ 2022-09-15  1:49       ` Shiyang Ruan
  0 siblings, 0 replies; 19+ messages in thread
From: Shiyang Ruan @ 2022-09-15  1:49 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-kernel, linux-xfs, nvdimm, linux-mm, linux-fsdevel,
	dan.j.williams, david, hch, jane.chu



在 2022/9/15 2:15, Darrick J. Wong 写道:
> On Fri, Sep 02, 2022 at 10:36:01AM +0000, Shiyang Ruan wrote:
>> This patch is inspired by Dan's "mm, dax, pmem: Introduce
>> dev_pagemap_failure()"[1].  With the help of dax_holder and
>> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
>> (or mapped device) on it to unmap all files in use and notify processes
>> who are using those files.
>>
>> Call trace:
>> trigger unbind
>>   -> unbind_store()
>>    -> ... (skip)
>>     -> devres_release_all()   # was pmem driver ->remove() in v1
>>      -> kill_dax()
>>       -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
>>        -> xfs_dax_notify_failure()
>>
>> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
>> event.  So do not shutdown filesystem directly if something not
>> supported, or if failure range includes metadata area.  Make sure all
>> files and processes are handled correctly.
>>
>> [1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
>>
>> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
>> ---
>>   drivers/dax/super.c         |  3 ++-
>>   fs/xfs/xfs_notify_failure.c | 23 +++++++++++++++++++++++
>>   include/linux/mm.h          |  1 +
>>   3 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
>> index 9b5e2a5eb0ae..cf9a64563fbe 100644
>> --- a/drivers/dax/super.c
>> +++ b/drivers/dax/super.c
>> @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
>>   		return;
>>   
>>   	if (dax_dev->holder_data != NULL)
>> -		dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
>> +		dax_holder_notify_failure(dax_dev, 0, U64_MAX,
>> +				MF_MEM_PRE_REMOVE);
>>   
>>   	clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
>>   	synchronize_srcu(&dax_srcu);
>> diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
>> index 3830f908e215..5e04ba7fa403 100644
>> --- a/fs/xfs/xfs_notify_failure.c
>> +++ b/fs/xfs/xfs_notify_failure.c
>> @@ -22,6 +22,7 @@
>>   
>>   #include <linux/mm.h>
>>   #include <linux/dax.h>
>> +#include <linux/fs.h>
>>   
>>   struct xfs_failure_info {
>>   	xfs_agblock_t		startblock;
>> @@ -77,6 +78,9 @@ xfs_dax_failure_fn(
>>   
>>   	if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
>>   	    (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
>> +		/* The device is about to be removed.  Not a really failure. */
>> +		if (notify->mf_flags & MF_MEM_PRE_REMOVE)
>> +			return 0;
>>   		notify->want_shutdown = true;
>>   		return 0;
>>   	}
>> @@ -182,12 +186,23 @@ xfs_dax_notify_failure(
>>   	struct xfs_mount	*mp = dax_holder(dax_dev);
>>   	u64			ddev_start;
>>   	u64			ddev_end;
>> +	int			error;
>>   
>>   	if (!(mp->m_super->s_flags & SB_BORN)) {
>>   		xfs_warn(mp, "filesystem is not ready for notify_failure()!");
>>   		return -EIO;
>>   	}
>>   
>> +	if (mf_flags & MF_MEM_PRE_REMOVE) {
>> +		xfs_info(mp, "device is about to be removed!");
>> +		down_write(&mp->m_super->s_umount);
>> +		error = sync_filesystem(mp->m_super);
>> +		drop_pagecache_sb(mp->m_super, NULL);
>> +		up_write(&mp->m_super->s_umount);
>> +		if (error)
>> +			return error;
>> +	}
>> +
>>   	if (mp->m_rtdev_targp && mp->m_rtdev_targp->bt_daxdev == dax_dev) {
>>   		xfs_debug(mp,
>>   			 "notify_failure() not supported on realtime device!");
>> @@ -196,6 +211,8 @@ xfs_dax_notify_failure(
>>   
>>   	if (mp->m_logdev_targp && mp->m_logdev_targp->bt_daxdev == dax_dev &&
>>   	    mp->m_logdev_targp != mp->m_ddev_targp) {
>> +		if (mf_flags & MF_MEM_PRE_REMOVE)
>> +			return 0;
>>   		xfs_err(mp, "ondisk log corrupt, shutting down fs!");
>>   		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_ONDISK);
>>   		return -EFSCORRUPTED;
>> @@ -209,6 +226,12 @@ xfs_dax_notify_failure(
>>   	ddev_start = mp->m_ddev_targp->bt_dax_part_off;
>>   	ddev_end = ddev_start + bdev_nr_bytes(mp->m_ddev_targp->bt_bdev) - 1;
>>   
>> +	/* Notify failure on the whole device */
>> +	if (offset == 0 && len == U64_MAX) {
>> +		offset = ddev_start;
>> +		len = bdev_nr_bytes(mp->m_ddev_targp->bt_bdev);
>> +	}
> 
> I wonder, won't the trimming code below take care of this?

The len is U64_MAX, so 'offset + len - 1' will overflow.  That can't be 
handled correctly by the trimming code below.


--
Thanks,
Ruan.

> 
> The rest of the patch looks ok to me.
> 
> --D
> 
>> +
>>   	/* Ignore the range out of filesystem area */
>>   	if (offset + len - 1 < ddev_start)
>>   		return -ENXIO;
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 21f8b27bd9fd..9122a1c57dd2 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -3183,6 +3183,7 @@ enum mf_flags {
>>   	MF_UNPOISON = 1 << 4,
>>   	MF_SW_SIMULATED = 1 << 5,
>>   	MF_NO_RETRY = 1 << 6,
>> +	MF_MEM_PRE_REMOVE = 1 << 7,
>>   };
>>   int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
>>   		      unsigned long count, int mf_flags);
>> -- 
>> 2.37.2
>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v8 0/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
  2022-09-14 18:15       ` Darrick J. Wong
@ 2022-09-15  2:56         ` Shiyang Ruan
  2022-09-16 15:43           ` Darrick J. Wong
  0 siblings, 1 reply; 19+ messages in thread
From: Shiyang Ruan @ 2022-09-15  2:56 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: dan.j.williams, linux-fsdevel, linux-mm, nvdimm, linux-xfs,
	linux-kernel, hch, david, jane.chu



在 2022/9/15 2:15, Darrick J. Wong 写道:
> On Wed, Sep 14, 2022 at 11:09:23AM -0700, Darrick J. Wong wrote:
>> On Wed, Sep 07, 2022 at 05:46:00PM +0800, Shiyang Ruan wrote:
>>> ping
>>>
>>> 在 2022/9/2 18:35, Shiyang Ruan 写道:
>>>> Changes since v7:
>>>>     1. Add P1 to fix calculation mistake
>>>>     2. Add P2 to move drop_pagecache_sb() to super.c for xfs to use
>>>>     3. P3: Add invalidate all mappings after sync.
>>>>     4. P3: Set offset&len to be start&length of device when it is to be removed.
>>>>     5. Rebase on 6.0-rc3 + Darrick's patch[1] + Dan's patch[2].
>>>>
>>>> Changes since v6:
>>>>     1. Rebase on 6.0-rc2 and Darrick's patch[1].
>>>>
>>>> [1]: https://lore.kernel.org/linux-xfs/Yv5wIa2crHioYeRr@magnolia/
>>>> [2]: https://lore.kernel.org/linux-xfs/166153426798.2758201.15108211981034512993.stgit@dwillia2-xfh.jf.intel.com/
>>
>> Just out of curiosity, is it your (or djbw's) intent to send all these
>> as bugfixes for 6.0 via akpm like all the other dax fixen?
> 
> Aha, this is 6.1 stuff, please ignore this question.

Actually I hope these patches can be merged ASAP. (But it seems a bit 
late for 6.0 now.)

And do you know which/whose branch has picked up your patch[1]?  I 
cannot find it.


--
Thanks,
Ruan.

> 
> --D
> 
>> --D
>>
>>>>
>>>> Shiyang Ruan (3):
>>>>     xfs: fix the calculation of length and end
>>>>     fs: move drop_pagecache_sb() for others to use
>>>>     mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
>>>>
>>>>    drivers/dax/super.c         |  3 ++-
>>>>    fs/drop_caches.c            | 33 ---------------------------------
>>>>    fs/super.c                  | 34 ++++++++++++++++++++++++++++++++++
>>>>    fs/xfs/xfs_notify_failure.c | 31 +++++++++++++++++++++++++++----
>>>>    include/linux/fs.h          |  1 +
>>>>    include/linux/mm.h          |  1 +
>>>>    6 files changed, 65 insertions(+), 38 deletions(-)
>>>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v8 0/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
  2022-09-15  2:56         ` Shiyang Ruan
@ 2022-09-16 15:43           ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2022-09-16 15:43 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: dan.j.williams, linux-fsdevel, linux-mm, nvdimm, linux-xfs,
	linux-kernel, hch, david, jane.chu

On Thu, Sep 15, 2022 at 10:56:09AM +0800, Shiyang Ruan wrote:
> 
> 
> 在 2022/9/15 2:15, Darrick J. Wong 写道:
> > On Wed, Sep 14, 2022 at 11:09:23AM -0700, Darrick J. Wong wrote:
> > > On Wed, Sep 07, 2022 at 05:46:00PM +0800, Shiyang Ruan wrote:
> > > > ping
> > > > 
> > > > 在 2022/9/2 18:35, Shiyang Ruan 写道:
> > > > > Changes since v7:
> > > > >     1. Add P1 to fix calculation mistake
> > > > >     2. Add P2 to move drop_pagecache_sb() to super.c for xfs to use
> > > > >     3. P3: Add invalidate all mappings after sync.
> > > > >     4. P3: Set offset&len to be start&length of device when it is to be removed.
> > > > >     5. Rebase on 6.0-rc3 + Darrick's patch[1] + Dan's patch[2].
> > > > > 
> > > > > Changes since v6:
> > > > >     1. Rebase on 6.0-rc2 and Darrick's patch[1].
> > > > > 
> > > > > [1]: https://lore.kernel.org/linux-xfs/Yv5wIa2crHioYeRr@magnolia/
> > > > > [2]: https://lore.kernel.org/linux-xfs/166153426798.2758201.15108211981034512993.stgit@dwillia2-xfh.jf.intel.com/
> > > 
> > > Just out of curiosity, is it your (or djbw's) intent to send all these
> > > as bugfixes for 6.0 via akpm like all the other dax fixen?
> > 
> > Aha, this is 6.1 stuff, please ignore this question.
> 
> Actually I hope these patches can be merged ASAP. (But it seems a bit late
> for 6.0 now.)
> 
> And do you know which/whose branch has picked up your patch[1]?  I cannot
> find it.

It's not upstream, though the maintainer (Dave currently) reviewed it.
I don't know if he hasn't had time to put together a fixes branch or if
he's simply punting all the queued up stuff to 6.1.

(Dave?)

--D

> 
> --
> Thanks,
> Ruan.
> 
> > 
> > --D
> > 
> > > --D
> > > 
> > > > > 
> > > > > Shiyang Ruan (3):
> > > > >     xfs: fix the calculation of length and end
> > > > >     fs: move drop_pagecache_sb() for others to use
> > > > >     mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
> > > > > 
> > > > >    drivers/dax/super.c         |  3 ++-
> > > > >    fs/drop_caches.c            | 33 ---------------------------------
> > > > >    fs/super.c                  | 34 ++++++++++++++++++++++++++++++++++
> > > > >    fs/xfs/xfs_notify_failure.c | 31 +++++++++++++++++++++++++++----
> > > > >    include/linux/fs.h          |  1 +
> > > > >    include/linux/mm.h          |  1 +
> > > > >    6 files changed, 65 insertions(+), 38 deletions(-)
> > > > > 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
  2022-09-02 10:36   ` [PATCH 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Shiyang Ruan
  2022-09-14 18:15     ` Darrick J. Wong
@ 2022-09-20  2:45     ` Dave Chinner
  2022-09-21  0:58       ` Darrick J. Wong
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2022-09-20  2:45 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: linux-kernel, linux-xfs, nvdimm, linux-mm, linux-fsdevel, djwong,
	dan.j.williams, hch, jane.chu

On Fri, Sep 02, 2022 at 10:36:01AM +0000, Shiyang Ruan wrote:
> This patch is inspired by Dan's "mm, dax, pmem: Introduce
> dev_pagemap_failure()"[1].  With the help of dax_holder and
> ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> (or mapped device) on it to unmap all files in use and notify processes
> who are using those files.
> 
> Call trace:
> trigger unbind
>  -> unbind_store()
>   -> ... (skip)
>    -> devres_release_all()   # was pmem driver ->remove() in v1
>     -> kill_dax()
>      -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
>       -> xfs_dax_notify_failure()
> 
> Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> event.  So do not shutdown filesystem directly if something not
> supported, or if failure range includes metadata area.  Make sure all
> files and processes are handled correctly.
> 
> [1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
> 
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> ---
>  drivers/dax/super.c         |  3 ++-
>  fs/xfs/xfs_notify_failure.c | 23 +++++++++++++++++++++++
>  include/linux/mm.h          |  1 +
>  3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 9b5e2a5eb0ae..cf9a64563fbe 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
>  		return;
>  
>  	if (dax_dev->holder_data != NULL)
> -		dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
> +		dax_holder_notify_failure(dax_dev, 0, U64_MAX,
> +				MF_MEM_PRE_REMOVE);
>  
>  	clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
>  	synchronize_srcu(&dax_srcu);
> diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
> index 3830f908e215..5e04ba7fa403 100644
> --- a/fs/xfs/xfs_notify_failure.c
> +++ b/fs/xfs/xfs_notify_failure.c
> @@ -22,6 +22,7 @@
>  
>  #include <linux/mm.h>
>  #include <linux/dax.h>
> +#include <linux/fs.h>
>  
>  struct xfs_failure_info {
>  	xfs_agblock_t		startblock;
> @@ -77,6 +78,9 @@ xfs_dax_failure_fn(
>  
>  	if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
>  	    (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
> +		/* The device is about to be removed.  Not a really failure. */
> +		if (notify->mf_flags & MF_MEM_PRE_REMOVE)
> +			return 0;
>  		notify->want_shutdown = true;
>  		return 0;
>  	}
> @@ -182,12 +186,23 @@ xfs_dax_notify_failure(
>  	struct xfs_mount	*mp = dax_holder(dax_dev);
>  	u64			ddev_start;
>  	u64			ddev_end;
> +	int			error;
>  
>  	if (!(mp->m_super->s_flags & SB_BORN)) {
>  		xfs_warn(mp, "filesystem is not ready for notify_failure()!");
>  		return -EIO;
>  	}
>  
> +	if (mf_flags & MF_MEM_PRE_REMOVE) {
> +		xfs_info(mp, "device is about to be removed!");
> +		down_write(&mp->m_super->s_umount);
> +		error = sync_filesystem(mp->m_super);
> +		drop_pagecache_sb(mp->m_super, NULL);
> +		up_write(&mp->m_super->s_umount);
> +		if (error)
> +			return error;

If the device is about to go away unexpectedly, shouldn't this shut
down the filesystem after syncing it here?  If the filesystem has
been shut down, then everything will fail before removal finally
triggers, and the act of unmounting the filesystem post device
removal will clean up the page cache and all the other caches.

IOWs, I don't understand why the page cache is considered special
here (as opposed to, say, the inode or dentry caches), nor why we
aren't shutting down the filesystem directly after syncing it to
disk to ensure that we don't end up with applications losing data as
a result of racing with the removal....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind
  2022-09-20  2:45     ` Dave Chinner
@ 2022-09-21  0:58       ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2022-09-21  0:58 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Shiyang Ruan, linux-kernel, linux-xfs, nvdimm, linux-mm,
	linux-fsdevel, dan.j.williams, hch, jane.chu

On Tue, Sep 20, 2022 at 12:45:19PM +1000, Dave Chinner wrote:
> On Fri, Sep 02, 2022 at 10:36:01AM +0000, Shiyang Ruan wrote:
> > This patch is inspired by Dan's "mm, dax, pmem: Introduce
> > dev_pagemap_failure()"[1].  With the help of dax_holder and
> > ->notify_failure() mechanism, the pmem driver is able to ask filesystem
> > (or mapped device) on it to unmap all files in use and notify processes
> > who are using those files.
> > 
> > Call trace:
> > trigger unbind
> >  -> unbind_store()
> >   -> ... (skip)
> >    -> devres_release_all()   # was pmem driver ->remove() in v1
> >     -> kill_dax()
> >      -> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
> >       -> xfs_dax_notify_failure()
> > 
> > Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
> > event.  So do not shutdown filesystem directly if something not
> > supported, or if failure range includes metadata area.  Make sure all
> > files and processes are handled correctly.
> > 
> > [1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
> > 
> > Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> > ---
> >  drivers/dax/super.c         |  3 ++-
> >  fs/xfs/xfs_notify_failure.c | 23 +++++++++++++++++++++++
> >  include/linux/mm.h          |  1 +
> >  3 files changed, 26 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > index 9b5e2a5eb0ae..cf9a64563fbe 100644
> > --- a/drivers/dax/super.c
> > +++ b/drivers/dax/super.c
> > @@ -323,7 +323,8 @@ void kill_dax(struct dax_device *dax_dev)
> >  		return;
> >  
> >  	if (dax_dev->holder_data != NULL)
> > -		dax_holder_notify_failure(dax_dev, 0, U64_MAX, 0);
> > +		dax_holder_notify_failure(dax_dev, 0, U64_MAX,
> > +				MF_MEM_PRE_REMOVE);
> >  
> >  	clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
> >  	synchronize_srcu(&dax_srcu);
> > diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
> > index 3830f908e215..5e04ba7fa403 100644
> > --- a/fs/xfs/xfs_notify_failure.c
> > +++ b/fs/xfs/xfs_notify_failure.c
> > @@ -22,6 +22,7 @@
> >  
> >  #include <linux/mm.h>
> >  #include <linux/dax.h>
> > +#include <linux/fs.h>
> >  
> >  struct xfs_failure_info {
> >  	xfs_agblock_t		startblock;
> > @@ -77,6 +78,9 @@ xfs_dax_failure_fn(
> >  
> >  	if (XFS_RMAP_NON_INODE_OWNER(rec->rm_owner) ||
> >  	    (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) {
> > +		/* The device is about to be removed.  Not a really failure. */
> > +		if (notify->mf_flags & MF_MEM_PRE_REMOVE)
> > +			return 0;
> >  		notify->want_shutdown = true;
> >  		return 0;
> >  	}
> > @@ -182,12 +186,23 @@ xfs_dax_notify_failure(
> >  	struct xfs_mount	*mp = dax_holder(dax_dev);
> >  	u64			ddev_start;
> >  	u64			ddev_end;
> > +	int			error;
> >  
> >  	if (!(mp->m_super->s_flags & SB_BORN)) {
> >  		xfs_warn(mp, "filesystem is not ready for notify_failure()!");
> >  		return -EIO;
> >  	}
> >  
> > +	if (mf_flags & MF_MEM_PRE_REMOVE) {
> > +		xfs_info(mp, "device is about to be removed!");
> > +		down_write(&mp->m_super->s_umount);
> > +		error = sync_filesystem(mp->m_super);
> > +		drop_pagecache_sb(mp->m_super, NULL);
> > +		up_write(&mp->m_super->s_umount);
> > +		if (error)
> > +			return error;
> 
> If the device is about to go away unexpectedly, shouldn't this shut
> down the filesystem after syncing it here?  If the filesystem has
> been shut down, then everything will fail before removal finally
> triggers, and the act of unmounting the filesystem post device
> removal will clean up the page cache and all the other caches.

IIRC they want to kill all the processes with MAP_SYNC mappings sooner
than whenever the admin gets around to unmounting the filesystem, which
is why PRE_REMOVE will then go walk the rmapbt to find processes to
shoot down.  I'm not sure, though, if drop_pagecache_sb only touches
DRAM page cache or if it'll shoot down fsdax mappings too?

> IOWs, I don't understand why the page cache is considered special
> here (as opposed to, say, the inode or dentry caches), nor why we
> aren't shutting down the filesystem directly after syncing it to
> disk to ensure that we don't end up with applications losing data as
> a result of racing with the removal....

But yeah, we might as well shut down the fs at the end of PRE_REMOVE
handling, if the rmap walk hasn't already done that.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-09-21  0:58 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-26 13:03 [PATCH v7] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Shiyang Ruan
2022-08-26 21:35 ` Dan Williams
2022-08-29 10:02   ` Shiyang Ruan
2022-08-29 14:49     ` Darrick J. Wong
2022-09-02 10:35 ` [PATCH v8 0/3] " Shiyang Ruan
2022-09-02 10:35   ` [PATCH 1/3] xfs: fix the calculation of length and end Shiyang Ruan
2022-09-14 18:13     ` Darrick J. Wong
2022-09-02 10:36   ` [PATCH 2/3] fs: move drop_pagecache_sb() for others to use Shiyang Ruan
2022-09-14 18:16     ` Darrick J. Wong
2022-09-02 10:36   ` [PATCH 3/3] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind Shiyang Ruan
2022-09-14 18:15     ` Darrick J. Wong
2022-09-15  1:49       ` Shiyang Ruan
2022-09-20  2:45     ` Dave Chinner
2022-09-21  0:58       ` Darrick J. Wong
2022-09-07  9:46   ` [PATCH v8 0/3] " Shiyang Ruan
2022-09-14 18:09     ` Darrick J. Wong
2022-09-14 18:15       ` Darrick J. Wong
2022-09-15  2:56         ` Shiyang Ruan
2022-09-16 15:43           ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.