All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Arnd Bergmann <arnd@arndb.de>, Dave Chinner <david@fromorbit.com>
Cc: <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	<linux-mm@kvack.org>, <linux-arch@vger.kernel.org>,
	Vishal Verma <vishal.l.verma@intel.com>,
	"Dave Jiang" <dave.jiang@intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Russell King" <linux@armlinux.org.uk>, <nvdimm@lists.linux.dev>,
	<linux-cxl@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,
	<dm-devel@lists.linux.dev>
Subject: Re: [RFC PATCH v3 2/4] dax: Check for data cache aliasing at runtime
Date: Fri, 2 Feb 2024 11:41:40 -0800	[thread overview]
Message-ID: <65bd457460fb1_719322942@dwillia2-mobl3.amr.corp.intel.com.notmuch> (raw)
In-Reply-To: <6bdf6085-101d-47ef-86f4-87936622345a@efficios.com>

Mathieu Desnoyers wrote:
> On 2024-02-02 12:37, Dan Williams wrote:
> > Mathieu Desnoyers wrote:
> [...]
> >>
> > 
> >> The alternative route I intend to take is to audit all callers
> >> of alloc_dax() and make sure they all save the alloc_dax() return
> >> value in a struct dax_device * local variable first for the sake
> >> of checking for IS_ERR(). This will leave the xyz->dax_dev pointer
> >> initialized to NULL in the error case and simplify the rest of
> >> error checking.
> > 
> > I could maybe get on board with that, but it needs a comment somewhere
> > about the asymmetric subtlety.
> 
> Is this "somewhere" at every alloc_dax() call site, or do you have
> something else in mind ?

At least kill_dax() should mention the asymmetry I think.

> 
> > 
> >>
> >>
> >>>    		return;
> >>>    
> >>>    	if (dax_dev->holder_data != NULL)
> >>> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> >>> index 4e8fdcb3f1c8..b69c9e442cf4 100644
> >>> --- a/drivers/nvdimm/pmem.c
> >>> +++ b/drivers/nvdimm/pmem.c
> >>> @@ -560,17 +560,19 @@ static int pmem_attach_disk(struct device *dev,
> >>>    	dax_dev = alloc_dax(pmem, &pmem_dax_ops);
> >>>    	if (IS_ERR(dax_dev)) {
> >>>    		rc = PTR_ERR(dax_dev);
> >>> -		goto out;
> >>> +		if (rc != -EOPNOTSUPP)
> >>> +			goto out;
> >>
> >> If I compare the before / after this change, if previously
> >> pmem_attach_disk() was called in a configuration with FS_DAX=n, it would
> >> result in a NULL pointer dereference.
> > 
> > No, alloc_dax() only returns NULL CONFIG_DAX=n case, not the
> > CONFIG_FS_DAX=n case.
> 
> Indeed, I was wrong there.
> 
> > So that means that pmem devices on ARM have been
> > possible without FS_DAX. So, in order for alloc_dax() returning
> > ERR_PTR(-EOPNOTSUPP) to not regress pmem device availability this error
> > path needs to be changed.
> Good point. We're moving the depends on !(ARM || MIPS |PARC) from FS_DAX
> Kconfig to a runtime check in alloc_dax(), which is used whenever DAX=y,
> which includes configurations that had FS_DAX=n previously.
> 
> I'll change the error path in pmem_attack_disk to treat -EOPNOTSUPP
> alloc_dax() return value as non-fatal.
> 
> > 
> >> This would be an error handling fix all by itself. Do we really want
> >> to return successfully if dax is unsupported, or should we return
> >> an error here ?
> > 
> > Per above, there is no error handling fix, and pmem block device
> > available should not depend on alloc_dax() succeeding.
> 
> I agree on treating alloc_dax() failure as non-fatal. There is
> however one error handling fix to nvdimm/pmem which I plan to
> introduce as an initial patch before this change:
> 
>      nvdimm/pmem: Fix leak on dax_add_host() failure
>      
>      Fix a leak on dax_add_host() error, where "goto out_cleanup_dax" is done
>      before setting pmem->dax_dev, which therefore issues the two following
>      calls on NULL pointers:
>      
>      out_cleanup_dax:
>              kill_dax(pmem->dax_dev);
>              put_dax(pmem->dax_dev);
>      
>      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> 
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 4e8fdcb3f1c8..9fe358090720 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -566,12 +566,11 @@ static int pmem_attach_disk(struct device *dev,
>   	set_dax_nomc(dax_dev);
>   	if (is_nvdimm_sync(nd_region))
>   		set_dax_synchronous(dax_dev);
> +	pmem->dax_dev = dax_dev;
>   	rc = dax_add_host(dax_dev, disk);
>   	if (rc)
>   		goto out_cleanup_dax;
>   	dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
> -	pmem->dax_dev = dax_dev;
> -
>   	rc = device_add_disk(dev, disk, pmem_attribute_groups);
>   	if (rc)
>   		goto out_remove_host;

Yup, looks good.

> > The real question is what to do about device-dax. I *think* it is not
> > affected by cpu_dcache aliasing because it never accesses user mappings
> > through a kernel alias. I doubt device-dax is in use on these platforms,
> > but we might need another fixup for that if someone screams about the
> > alloc_dax() behavior change making them lose device-dax access.
> 
> By "device-dax", I understand you mean drivers/dax/Kconfig:DEV_DAX.
> 
> Based on your analysis, is alloc_dax() still the right spot where
> to place this runtime check ? Which call sites are responsible
> for invoking alloc_dax() for device-dax ?

That is in devm_create_dev_dax().

> If we know which call sites do not intend to use the kernel linear
> mapping, we could introduce a flag (or a new variant of the alloc_dax()
> API) that would either enforce or skip the check.

Hmmm, it looks like there is already a natural flag for that. If
alloc_dax() is passed a NULL operations pointer it means there are no
kernel usages of the aliased mapping. That actually fits rather nicely.

[..]
> >>> @@ -804,6 +808,15 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
> >>>    	if (!IS_ENABLED(CONFIG_FUSE_DAX))
> >>>    		return 0;
> >>>    
> >>> +	dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
> >>> +	if (IS_ERR(dax_dev)) {
> >>> +		int rc = PTR_ERR(dax_dev);
> >>> +
> >>> +		if (rc == -EOPNOTSUPP)
> >>> +			return 0;
> >>> +		return rc;
> >>> +	}
> >>
> >> What is gained by moving this allocation here ?
> > 
> > The gain is to fail early in virtio_fs_setup_dax() since the fundamental
> > dependency of alloc_dax() success is not met. For example why let the
> > setup progress to devm_memremap_pages() when alloc_dax() is going to
> > return ERR_PTR(-EOPNOTSUPP).
> 
> What I don't know is whether there is a dependency requiring to do
> devm_request_mem_region(), devm_kzalloc(), devm_memremap_pages()
> before calling alloc_dax() ?
> 
> Those 3 calls are used to populate:
> 
>          fs->window_phys_addr = (phys_addr_t) cache_reg.addr;
>          fs->window_len = (phys_addr_t) cache_reg.len;
> 
> and then alloc_dax() takes "fs" as private data parameter. So it's
> unclear to me whether we can swap the invocation order. I suspect
> that it is not an issue because it is only used to populate
> dax_dev->private, but I prefer to confirm this with you just to be
> on the safe side.

Thanks for that. All of those need to be done before the fs goes live
later in virtio_device_ready(), but before that point nothing should be
calling into virtio_fs_dax_ops, so as far as I can see it is safe to
change the order.

  reply	other threads:[~2024-02-02 19:41 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-31 16:25 [RFC PATCH v3 0/4] Introduce cpu_dcache_is_aliasing() to fix DAX regression Mathieu Desnoyers
2024-01-31 16:25 ` [RFC PATCH v3 1/4] dm: Treat alloc_dax failure as non-fatal Mathieu Desnoyers
2024-01-31 16:25 ` [RFC PATCH v3 2/4] dax: Check for data cache aliasing at runtime Mathieu Desnoyers
2024-01-31 21:02   ` Dan Williams
2024-01-31 21:39     ` Mathieu Desnoyers
2024-01-31 22:18       ` Dan Williams
2024-02-01 15:44         ` Mathieu Desnoyers
2024-02-02 14:40           ` Mathieu Desnoyers
2024-02-02 16:32           ` Mathieu Desnoyers
2024-02-02 17:37           ` Dan Williams
2024-02-02 19:29             ` Mathieu Desnoyers
2024-02-02 19:41               ` Dan Williams [this message]
2024-02-02 20:02                 ` Mathieu Desnoyers
2024-02-02 20:14                   ` Dan Williams
2024-02-02 20:18                     ` Mathieu Desnoyers
2024-01-31 16:25 ` [RFC PATCH v3 3/4] Introduce cpu_dcache_is_aliasing() across all architectures Mathieu Desnoyers
2024-01-31 17:17   ` Christoph Hellwig
2024-01-31 17:59     ` Mathieu Desnoyers
2024-01-31 20:42       ` Dan Williams
2024-01-31 16:25 ` [RFC PATCH v3 4/4] dax: Fix incorrect list of data cache aliasing architectures Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=65bd457460fb1_719322942@dwillia2-mobl3.amr.corp.intel.com.notmuch \
    --to=dan.j.williams@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=dave.jiang@intel.com \
    --cc=david@fromorbit.com \
    --cc=dm-devel@lists.linux.dev \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@armlinux.org.uk \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=torvalds@linux-foundation.org \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.