All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Cc: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	nvdimm@lists.linux.dev, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, dan.j.williams@intel.com,
	david@fromorbit.com, hch@infradead.org, jane.chu@oracle.com
Subject: Re: [PATCH v7 2/8] dax: Introduce holder for dax_device
Date: Thu, 14 Oct 2021 11:00:25 -0700	[thread overview]
Message-ID: <20211014180025.GC24333@magnolia> (raw)
In-Reply-To: <20210924130959.2695749-3-ruansy.fnst@fujitsu.com>

On Fri, Sep 24, 2021 at 09:09:53PM +0800, Shiyang Ruan wrote:
> To easily track filesystem from a pmem device, we introduce a holder for
> dax_device structure, and also its operation.  This holder is used to
> remember who is using this dax_device:
>  - When it is the backend of a filesystem, the holder will be the
>    superblock of this filesystem.
>  - When this pmem device is one of the targets in a mapped device, the
>    holder will be this mapped device.  In this case, the mapped device
>    has its own dax_device and it will follow the first rule.  So that we
>    can finally track to the filesystem we needed.
> 
> The holder and holder_ops will be set when filesystem is being mounted,
> or an target device is being activated.
> 
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> ---
>  drivers/dax/super.c | 59 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dax.h | 29 ++++++++++++++++++++++
>  2 files changed, 88 insertions(+)
> 
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 48ce86501d93..7d4a11dcba90 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -23,7 +23,10 @@
>   * @cdev: optional character interface for "device dax"
>   * @host: optional name for lookups where the device path is not available
>   * @private: dax driver private data
> + * @holder_data: holder of a dax_device: could be filesystem or mapped device
>   * @flags: state and boolean properties
> + * @ops: operations for dax_device
> + * @holder_ops: operations for the inner holder
>   */
>  struct dax_device {
>  	struct hlist_node list;
> @@ -31,8 +34,10 @@ struct dax_device {
>  	struct cdev cdev;
>  	const char *host;
>  	void *private;
> +	void *holder_data;
>  	unsigned long flags;
>  	const struct dax_operations *ops;
> +	const struct dax_holder_operations *holder_ops;
>  };
>  
>  static dev_t dax_devt;
> @@ -374,6 +379,29 @@ int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
>  }
>  EXPORT_SYMBOL_GPL(dax_zero_page_range);
>  
> +int dax_holder_notify_failure(struct dax_device *dax_dev, loff_t offset,
> +			      size_t size, int flags)
> +{
> +	int rc;
> +
> +	dax_read_lock();
> +	if (!dax_alive(dax_dev)) {
> +		rc = -ENXIO;
> +		goto out;
> +	}
> +
> +	if (!dax_dev->holder_data) {
> +		rc = -EOPNOTSUPP;
> +		goto out;
> +	}
> +
> +	rc = dax_dev->holder_ops->notify_failure(dax_dev, offset, size, flags);

Shouldn't this check if dax_dev->holder_ops != NULL before dereferencing
it for the function call?  Imagine an implementation that wants to
attach a ->notify_failure function to a dax_device, maintains its own
lookup table, and decides that it doesn't need to set holder_data.

(Or, imagine someone who writes a garbage into holder_data and *boom*)

How does the locking work here?  If there's a media failure, we'll take
dax_rwsem and call ->notify_failure.  If the ->notify_failure function
wants to access the pmem to handle the error by calling back into the
dax code, will that cause nested locking on dax_rwsem?

Jumping ahead a bit, I think the rmap btree accesses that the xfs
implementation performs can cause xfs_buf(fer) cache IO, which would
trigger that if the buffers aren't already in memory, if I'm reading
this correctly?

> +out:
> +	dax_read_unlock();
> +	return rc;
> +}
> +EXPORT_SYMBOL_GPL(dax_holder_notify_failure);
> +
>  #ifdef CONFIG_ARCH_HAS_PMEM_API
>  void arch_wb_cache_pmem(void *addr, size_t size);
>  void dax_flush(struct dax_device *dax_dev, void *addr, size_t size)
> @@ -618,6 +646,37 @@ void put_dax(struct dax_device *dax_dev)
>  }
>  EXPORT_SYMBOL_GPL(put_dax);
>  
> +void dax_set_holder(struct dax_device *dax_dev, void *holder,
> +		const struct dax_holder_operations *ops)
> +{
> +	dax_write_lock();
> +	if (!dax_alive(dax_dev)) {
> +		dax_write_unlock();
> +		return;
> +	}
> +
> +	dax_dev->holder_data = holder;
> +	dax_dev->holder_ops = ops;
> +	dax_write_unlock();

I guess this means that the holder has to detach itself before anyone
calls kill_dax, or else a dead dax device ends up with a dangling
reference to the holder?

> +}
> +EXPORT_SYMBOL_GPL(dax_set_holder);
> +
> +void *dax_get_holder(struct dax_device *dax_dev)
> +{
> +	void *holder;
> +
> +	dax_read_lock();
> +	if (!dax_alive(dax_dev)) {
> +		dax_read_unlock();
> +		return NULL;
> +	}
> +
> +	holder = dax_dev->holder_data;
> +	dax_read_unlock();
> +	return holder;
> +}
> +EXPORT_SYMBOL_GPL(dax_get_holder);
> +
>  /**
>   * inode_dax: convert a public inode into its dax_dev
>   * @inode: An inode with i_cdev pointing to a dax_dev
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 097b3304f9b9..d273d59723cd 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -38,9 +38,24 @@ struct dax_operations {
>  	int (*zero_page_range)(struct dax_device *, pgoff_t, size_t);
>  };
>  
> +struct dax_holder_operations {
> +	/*
> +	 * notify_failure - notify memory failure into inner holder device
> +	 * @dax_dev: the dax device which contains the holder
> +	 * @offset: offset on this dax device where memory failure occurs
> +	 * @size: length of this memory failure event
> +	 * @flags: action flags for memory failure handler
> +	 */
> +	int (*notify_failure)(struct dax_device *dax_dev, loff_t offset,
> +			size_t size, int flags);

Shouldn't size be u64 or something?  Let's say that 8GB of your pmem go
bad, wouldn't you want a single call?  Though I guess the current
implementation only goes a single page at a time, doesn't it?

> +};
> +
>  extern struct attribute_group dax_attribute_group;
>  
>  #if IS_ENABLED(CONFIG_DAX)
> +void dax_set_holder(struct dax_device *dax_dev, void *holder,
> +		const struct dax_holder_operations *ops);
> +void *dax_get_holder(struct dax_device *dax_dev);
>  struct dax_device *alloc_dax(void *private, const char *host,
>  		const struct dax_operations *ops, unsigned long flags);
>  void put_dax(struct dax_device *dax_dev);
> @@ -70,6 +85,18 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
>  	return dax_synchronous(dax_dev);
>  }
>  #else
> +static inline struct dax_device *dax_get_by_host(const char *host)

Not sure why this is being added here?  AFAICT none of the patches call
this function...?

--D

> +{
> +	return NULL;
> +}
> +static inline void dax_set_holder(struct dax_device *dax_dev, void *holder,
> +		const struct dax_holder_operations *ops)
> +{
> +}
> +static inline void *dax_get_holder(struct dax_device *dax_dev)
> +{
> +	return NULL;
> +}
>  static inline struct dax_device *alloc_dax(void *private, const char *host,
>  		const struct dax_operations *ops, unsigned long flags)
>  {
> @@ -198,6 +225,8 @@ size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
>  		size_t bytes, struct iov_iter *i);
>  int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
>  			size_t nr_pages);
> +int dax_holder_notify_failure(struct dax_device *dax_dev, loff_t offset,
> +		size_t size, int flags);
>  void dax_flush(struct dax_device *dax_dev, void *addr, size_t size);
>  
>  ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
> -- 
> 2.33.0
> 
> 
> 

  reply	other threads:[~2021-10-14 18:00 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-24 13:09 [PATCH v7 0/8] [PATCH v7 0/8] fsdax: introduce fs query to support reflink Shiyang Ruan
2021-09-24 13:09 ` [PATCH v7 1/8] dax: Use rwsem for dax_{read,write}_lock() Shiyang Ruan
2021-10-14 17:48   ` Darrick J. Wong
2021-10-20  5:19     ` Shiyang Ruan
2021-10-15  6:30   ` Christoph Hellwig
2021-09-24 13:09 ` [PATCH v7 2/8] dax: Introduce holder for dax_device Shiyang Ruan
2021-10-14 18:00   ` Darrick J. Wong [this message]
2021-10-20  6:58     ` Shiyang Ruan
2021-09-24 13:09 ` [PATCH v7 3/8] mm: factor helpers for memory_failure_dev_pagemap Shiyang Ruan
2021-10-14 18:02   ` Darrick J. Wong
2021-10-15  6:33   ` Christoph Hellwig
2021-09-24 13:09 ` [PATCH v7 4/8] pagemap,pmem: Introduce ->memory_failure() Shiyang Ruan
2021-10-14 18:05   ` Darrick J. Wong
2021-10-20  5:25     ` Shiyang Ruan
2021-10-15  6:36   ` Christoph Hellwig
2021-09-24 13:09 ` [PATCH v7 5/8] fsdax: Introduce dax_lock_mapping_entry() Shiyang Ruan
2021-10-14 18:17   ` Darrick J. Wong
2021-09-24 13:09 ` [PATCH v7 6/8] mm: Introduce mf_dax_kill_procs() for fsdax case Shiyang Ruan
2021-10-14 19:32   ` Darrick J. Wong
2021-10-20  5:47     ` Shiyang Ruan
2021-09-24 13:09 ` [PATCH v7 7/8] xfs: Implement ->notify_failure() for XFS Shiyang Ruan
2021-10-14 19:21   ` Darrick J. Wong
2021-10-15  6:41   ` Christoph Hellwig
2021-09-24 13:09 ` [PATCH v7 8/8] fsdax: add exception for reflinked files Shiyang Ruan
2021-10-14 19:24   ` Darrick J. Wong
2021-10-15  6:38     ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211014180025.GC24333@magnolia \
    --to=djwong@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jane.chu@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=ruansy.fnst@fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.